SlideShare uma empresa Scribd logo
1 de 21
Building A directed graph with mongodb MongoSF 5/24/2011 By Tony Tam @fehguy
Who is wordnik Word + Meaning Discovery Engine Clustered Application built with: Scala/Java/Jetty Only way in is via REST 19M API calls/day @ 7ms/query average Physical servers 72GB RAM, 8 core 4.3TB DAS We’re MongoDB users for ~1.5 yrs Used in master/slave 14B documents in MongoDB
Why a graph for words Technique to model network relationships Properties are dynamic Links are “arbitrary” Runtime performance Answers in < 5ms/request Routing functions based on goals “find most likely word for X” “find more common form of Y”
Why a graph for words Misspellings, abbreviations, texting, Twitter
More about graphs Different types of Graphs Decisions have huge impact on design + implementation Nodes (vertices) String and numeric properties Edges (links) Finite set of labeled edge types (~30) Multiple target nodes per edge Each potentially different weight Directed, non-symmetrical
Why build on Mongodb? Word Graph is core to Wordnik Many ways to build a graph Dedicated graph DBs Relational DBs MongoDB Document Storage Uber-flexible Successfully routes in < 5ms Long runway for scale-out Limit storage infrastructure components Easy to implement
Wordnik graph data model Nodes _id field holds name, object type Index at no extra cost Arbitrary number of properties Only two datatypes for us, String, Double Node type info in node ID (_id) na_corpusCount => Double sa_source => String
Wordnik graph data model Edges Destination(s) Weight Link Properties Stored in Mongo Arrays Array size is app limited Use $push, $pop
Access to mongo Mongo Access via DAO layer Limit queries to ones that work“well” ALL queries use index Find Node “cat” of type “word”: db.node.findOne({_id:"cat|word"}) Find Edge types for above: db.edge.find({_id:/^catword/},{_id:1}) Serialization/deserialization  Done “the old-fashioned way” BasicDBObject, BasicDBList faster than mappers for our use case
Query efficiency Max execution time is  f (ahops)
Routing, traversals, functions Typically find path from A to B Routes have costs Low cost or high probability Our use case is atypical LinkedIn vs. Maps Not from A to B More like “from A with 3 hops” This matters!
Performance + Scaling
Performance + scaling Query by index only Use regex syntax in restricted fashion Starts with only No look behind Case sensitive Boring? Fast? Sharding is a no-brainer What about ObjectId()?
Performance + scaling Horizontal? Vertical?  Both?  And when? Separate collections by edge type/object type Increases storage needs Collections all have padding, 30 collections => ~30x padding Sharding Use slick, built-in Mongo sharding Roll your own based on your data What does Wordnik do? Neither! (yet) 30M Nodes, 50M Edges One collection for nodes One collection for edges
Performance + scaling Selecting a shard key Done in application logic based on OUR data Depends on what you need
End result Solves Wordnik Graph infrastructure needs Store Word nodes with UGC, corpus, structured, analytical data Batch fetch Edges @ > 50k/second Find Edge + endpoints in 80mS  Powers our… Word Selection Canonicalization Misspelling “Did you mean” logic Classification + Matching Engine
Examples Misspellings Abbreviations Lemmatization
Examples Term normalization Find similar words Meaning normalization Find “more common” form
examples Applied Word Graph Recall: “Computers are stupid” English is complex Clustering + classification algorithms: Stink without consistent data “The” => “the” (duh) “geese” => “goose” (ok) Stink when they’re slow Graph + Clustering/Classification Just add data
MongoDB makes a Great graph back-end See more about Wordnik APIs: http://developer.wordnik.com Further Reading Migrating from MySQL to MongoDB http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik Maintaining your MongoDB Installation http://www.slideshare.net/fehguy/mongo-sv-tony-tam Source Code Mapping Benchmark https://github.com/fehguy/mongodb-benchmark-tools Wordnik OSS Tools  https://github.com/wordnik/wordnik-oss
MongoDB makes a Great graph back-end Questions?

Mais conteúdo relacionado

Mais procurados

GraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLGraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLSpark Summit
 
Event Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQEvent Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQAraf Karsh Hamid
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignCloudera, Inc.
 
Patroni: PostgreSQL HA in the cloud
Patroni: PostgreSQL HA in the cloudPatroni: PostgreSQL HA in the cloud
Patroni: PostgreSQL HA in the cloudLucio Grenzi
 
Using MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseUsing MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseChris Clarke
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaArvind Kumar G.S
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSAmazon Web Services
 
PostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication CheatsheetPostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication CheatsheetAlexey Lesovsky
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path ForwardAlluxio, Inc.
 
하둡 맵리듀스 훑어보기
하둡 맵리듀스 훑어보기하둡 맵리듀스 훑어보기
하둡 맵리듀스 훑어보기beom kyun choi
 
Neo4j 4.1 overview
Neo4j 4.1 overviewNeo4j 4.1 overview
Neo4j 4.1 overviewNeo4j
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark InternalsPietro Michiardi
 
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...HostedbyConfluent
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB
 

Mais procurados (20)

Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
GraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLGraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQL
 
Event Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQEvent Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQ
 
Introduction to Pig
Introduction to PigIntroduction to Pig
Introduction to Pig
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema Design
 
Patroni: PostgreSQL HA in the cloud
Patroni: PostgreSQL HA in the cloudPatroni: PostgreSQL HA in the cloud
Patroni: PostgreSQL HA in the cloud
 
Using MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseUsing MongoDB as a high performance graph database
Using MongoDB as a high performance graph database
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWS
 
DDS In Action Part II
DDS In Action Part IIDDS In Action Part II
DDS In Action Part II
 
PostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication CheatsheetPostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication Cheatsheet
 
Distributed computing with spark
Distributed computing with sparkDistributed computing with spark
Distributed computing with spark
 
Prometheus and Grafana
Prometheus and GrafanaPrometheus and Grafana
Prometheus and Grafana
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
 
하둡 맵리듀스 훑어보기
하둡 맵리듀스 훑어보기하둡 맵리듀스 훑어보기
하둡 맵리듀스 훑어보기
 
Neo4j 4.1 overview
Neo4j 4.1 overviewNeo4j 4.1 overview
Neo4j 4.1 overview
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark Internals
 
OrientDB
OrientDBOrientDB
OrientDB
 
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data Presentation
 

Semelhante a Building a Directed Graph with MongoDB

MongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDBMongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDBRick Copeland
 
Top MongoDB interview Questions and Answers
Top MongoDB interview Questions and AnswersTop MongoDB interview Questions and Answers
Top MongoDB interview Questions and Answersjeetendra mandal
 
MongoDB.local Sydney: An Introduction to Document Databases with MongoDB
MongoDB.local Sydney: An Introduction to Document Databases with MongoDBMongoDB.local Sydney: An Introduction to Document Databases with MongoDB
MongoDB.local Sydney: An Introduction to Document Databases with MongoDBMongoDB
 
Jumpstart: Building Your First MongoDB App
Jumpstart: Building Your First MongoDB AppJumpstart: Building Your First MongoDB App
Jumpstart: Building Your First MongoDB AppMongoDB
 
Techorama - Evolvable Application Development with MongoDB
Techorama  - Evolvable Application Development with MongoDBTechorama  - Evolvable Application Development with MongoDB
Techorama - Evolvable Application Development with MongoDBbwullems
 
MongoDB Knowledge Shareing
MongoDB Knowledge ShareingMongoDB Knowledge Shareing
MongoDB Knowledge ShareingPhilip Zhong
 
MongoDB Schema Design by Examples
MongoDB Schema Design by ExamplesMongoDB Schema Design by Examples
MongoDB Schema Design by ExamplesHadi Ariawan
 
MongoDB: An Introduction - July 2011
MongoDB:  An Introduction - July 2011MongoDB:  An Introduction - July 2011
MongoDB: An Introduction - July 2011Chris Westin
 
Node Js, AngularJs and Express Js Tutorial
Node Js, AngularJs and Express Js TutorialNode Js, AngularJs and Express Js Tutorial
Node Js, AngularJs and Express Js TutorialPHP Support
 
MongoDB_Sharan_Prakash_Babu
MongoDB_Sharan_Prakash_BabuMongoDB_Sharan_Prakash_Babu
MongoDB_Sharan_Prakash_BabuSharan
 
Elasticsearch vs MongoDB comparison
Elasticsearch vs MongoDB comparisonElasticsearch vs MongoDB comparison
Elasticsearch vs MongoDB comparisonjeetendra mandal
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDBMongoDB
 
Allura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeAllura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeRick Copeland
 
MongoDB 2.4 and spring data
MongoDB 2.4 and spring dataMongoDB 2.4 and spring data
MongoDB 2.4 and spring dataJimmy Ray
 
MongoDB: An Introduction - june-2011
MongoDB:  An Introduction - june-2011MongoDB:  An Introduction - june-2011
MongoDB: An Introduction - june-2011Chris Westin
 
Webinar: When to Use MongoDB
Webinar: When to Use MongoDBWebinar: When to Use MongoDB
Webinar: When to Use MongoDBMongoDB
 
Rapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and PythonRapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and PythonRick Copeland
 

Semelhante a Building a Directed Graph with MongoDB (20)

Open source Technology
Open source TechnologyOpen source Technology
Open source Technology
 
nodejs.pptx
nodejs.pptxnodejs.pptx
nodejs.pptx
 
MongoDb - Details on the POC
MongoDb - Details on the POCMongoDb - Details on the POC
MongoDb - Details on the POC
 
MongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDBMongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDB
 
Top MongoDB interview Questions and Answers
Top MongoDB interview Questions and AnswersTop MongoDB interview Questions and Answers
Top MongoDB interview Questions and Answers
 
MongoDB.local Sydney: An Introduction to Document Databases with MongoDB
MongoDB.local Sydney: An Introduction to Document Databases with MongoDBMongoDB.local Sydney: An Introduction to Document Databases with MongoDB
MongoDB.local Sydney: An Introduction to Document Databases with MongoDB
 
Jumpstart: Building Your First MongoDB App
Jumpstart: Building Your First MongoDB AppJumpstart: Building Your First MongoDB App
Jumpstart: Building Your First MongoDB App
 
Techorama - Evolvable Application Development with MongoDB
Techorama  - Evolvable Application Development with MongoDBTechorama  - Evolvable Application Development with MongoDB
Techorama - Evolvable Application Development with MongoDB
 
MongoDB Knowledge Shareing
MongoDB Knowledge ShareingMongoDB Knowledge Shareing
MongoDB Knowledge Shareing
 
MongoDB Schema Design by Examples
MongoDB Schema Design by ExamplesMongoDB Schema Design by Examples
MongoDB Schema Design by Examples
 
MongoDB: An Introduction - July 2011
MongoDB:  An Introduction - July 2011MongoDB:  An Introduction - July 2011
MongoDB: An Introduction - July 2011
 
Node Js, AngularJs and Express Js Tutorial
Node Js, AngularJs and Express Js TutorialNode Js, AngularJs and Express Js Tutorial
Node Js, AngularJs and Express Js Tutorial
 
MongoDB_Sharan_Prakash_Babu
MongoDB_Sharan_Prakash_BabuMongoDB_Sharan_Prakash_Babu
MongoDB_Sharan_Prakash_Babu
 
Elasticsearch vs MongoDB comparison
Elasticsearch vs MongoDB comparisonElasticsearch vs MongoDB comparison
Elasticsearch vs MongoDB comparison
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDB
 
Allura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeAllura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForge
 
MongoDB 2.4 and spring data
MongoDB 2.4 and spring dataMongoDB 2.4 and spring data
MongoDB 2.4 and spring data
 
MongoDB: An Introduction - june-2011
MongoDB:  An Introduction - june-2011MongoDB:  An Introduction - june-2011
MongoDB: An Introduction - june-2011
 
Webinar: When to Use MongoDB
Webinar: When to Use MongoDBWebinar: When to Use MongoDB
Webinar: When to Use MongoDB
 
Rapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and PythonRapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and Python
 

Mais de Tony Tam

A Tasty deep-dive into Open API Specification Links
A Tasty deep-dive into Open API Specification LinksA Tasty deep-dive into Open API Specification Links
A Tasty deep-dive into Open API Specification LinksTony Tam
 
API Design first with Swagger
API Design first with SwaggerAPI Design first with Swagger
API Design first with SwaggerTony Tam
 
Developing Faster with Swagger
Developing Faster with SwaggerDeveloping Faster with Swagger
Developing Faster with SwaggerTony Tam
 
Writer APIs in Java faster with Swagger Inflector
Writer APIs in Java faster with Swagger InflectorWriter APIs in Java faster with Swagger Inflector
Writer APIs in Java faster with Swagger InflectorTony Tam
 
Fastest to Mobile with Scalatra + Swagger
Fastest to Mobile with Scalatra + SwaggerFastest to Mobile with Scalatra + Swagger
Fastest to Mobile with Scalatra + SwaggerTony Tam
 
Swagger APIs for Humans and Robots (Gluecon)
Swagger APIs for Humans and Robots (Gluecon)Swagger APIs for Humans and Robots (Gluecon)
Swagger APIs for Humans and Robots (Gluecon)Tony Tam
 
Love your API with Swagger (Gluecon lightning talk)
Love your API with Swagger (Gluecon lightning talk)Love your API with Swagger (Gluecon lightning talk)
Love your API with Swagger (Gluecon lightning talk)Tony Tam
 
Swagger for-your-api
Swagger for-your-apiSwagger for-your-api
Swagger for-your-apiTony Tam
 
Swagger for startups
Swagger for startupsSwagger for startups
Swagger for startupsTony Tam
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQLTony Tam
 
System insight without Interference
System insight without InterferenceSystem insight without Interference
System insight without InterferenceTony Tam
 
Keeping MongoDB Data Safe
Keeping MongoDB Data SafeKeeping MongoDB Data Safe
Keeping MongoDB Data SafeTony Tam
 
Inside Wordnik's Architecture
Inside Wordnik's ArchitectureInside Wordnik's Architecture
Inside Wordnik's ArchitectureTony Tam
 
Scaling with swagger
Scaling with swaggerScaling with swagger
Scaling with swaggerTony Tam
 
Running MongoDB in the Cloud
Running MongoDB in the CloudRunning MongoDB in the Cloud
Running MongoDB in the CloudTony Tam
 
Scala & Swagger at Wordnik
Scala & Swagger at WordnikScala & Swagger at Wordnik
Scala & Swagger at WordnikTony Tam
 
Introducing Swagger
Introducing SwaggerIntroducing Swagger
Introducing SwaggerTony Tam
 
Why Wordnik went non-relational
Why Wordnik went non-relationalWhy Wordnik went non-relational
Why Wordnik went non-relationalTony Tam
 
Managing a MongoDB Deployment
Managing a MongoDB DeploymentManaging a MongoDB Deployment
Managing a MongoDB DeploymentTony Tam
 
Keeping the Lights On with MongoDB
Keeping the Lights On with MongoDBKeeping the Lights On with MongoDB
Keeping the Lights On with MongoDBTony Tam
 

Mais de Tony Tam (20)

A Tasty deep-dive into Open API Specification Links
A Tasty deep-dive into Open API Specification LinksA Tasty deep-dive into Open API Specification Links
A Tasty deep-dive into Open API Specification Links
 
API Design first with Swagger
API Design first with SwaggerAPI Design first with Swagger
API Design first with Swagger
 
Developing Faster with Swagger
Developing Faster with SwaggerDeveloping Faster with Swagger
Developing Faster with Swagger
 
Writer APIs in Java faster with Swagger Inflector
Writer APIs in Java faster with Swagger InflectorWriter APIs in Java faster with Swagger Inflector
Writer APIs in Java faster with Swagger Inflector
 
Fastest to Mobile with Scalatra + Swagger
Fastest to Mobile with Scalatra + SwaggerFastest to Mobile with Scalatra + Swagger
Fastest to Mobile with Scalatra + Swagger
 
Swagger APIs for Humans and Robots (Gluecon)
Swagger APIs for Humans and Robots (Gluecon)Swagger APIs for Humans and Robots (Gluecon)
Swagger APIs for Humans and Robots (Gluecon)
 
Love your API with Swagger (Gluecon lightning talk)
Love your API with Swagger (Gluecon lightning talk)Love your API with Swagger (Gluecon lightning talk)
Love your API with Swagger (Gluecon lightning talk)
 
Swagger for-your-api
Swagger for-your-apiSwagger for-your-api
Swagger for-your-api
 
Swagger for startups
Swagger for startupsSwagger for startups
Swagger for startups
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQL
 
System insight without Interference
System insight without InterferenceSystem insight without Interference
System insight without Interference
 
Keeping MongoDB Data Safe
Keeping MongoDB Data SafeKeeping MongoDB Data Safe
Keeping MongoDB Data Safe
 
Inside Wordnik's Architecture
Inside Wordnik's ArchitectureInside Wordnik's Architecture
Inside Wordnik's Architecture
 
Scaling with swagger
Scaling with swaggerScaling with swagger
Scaling with swagger
 
Running MongoDB in the Cloud
Running MongoDB in the CloudRunning MongoDB in the Cloud
Running MongoDB in the Cloud
 
Scala & Swagger at Wordnik
Scala & Swagger at WordnikScala & Swagger at Wordnik
Scala & Swagger at Wordnik
 
Introducing Swagger
Introducing SwaggerIntroducing Swagger
Introducing Swagger
 
Why Wordnik went non-relational
Why Wordnik went non-relationalWhy Wordnik went non-relational
Why Wordnik went non-relational
 
Managing a MongoDB Deployment
Managing a MongoDB DeploymentManaging a MongoDB Deployment
Managing a MongoDB Deployment
 
Keeping the Lights On with MongoDB
Keeping the Lights On with MongoDBKeeping the Lights On with MongoDB
Keeping the Lights On with MongoDB
 

Último

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 

Último (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 

Building a Directed Graph with MongoDB

  • 1. Building A directed graph with mongodb MongoSF 5/24/2011 By Tony Tam @fehguy
  • 2. Who is wordnik Word + Meaning Discovery Engine Clustered Application built with: Scala/Java/Jetty Only way in is via REST 19M API calls/day @ 7ms/query average Physical servers 72GB RAM, 8 core 4.3TB DAS We’re MongoDB users for ~1.5 yrs Used in master/slave 14B documents in MongoDB
  • 3. Why a graph for words Technique to model network relationships Properties are dynamic Links are “arbitrary” Runtime performance Answers in < 5ms/request Routing functions based on goals “find most likely word for X” “find more common form of Y”
  • 4. Why a graph for words Misspellings, abbreviations, texting, Twitter
  • 5. More about graphs Different types of Graphs Decisions have huge impact on design + implementation Nodes (vertices) String and numeric properties Edges (links) Finite set of labeled edge types (~30) Multiple target nodes per edge Each potentially different weight Directed, non-symmetrical
  • 6. Why build on Mongodb? Word Graph is core to Wordnik Many ways to build a graph Dedicated graph DBs Relational DBs MongoDB Document Storage Uber-flexible Successfully routes in < 5ms Long runway for scale-out Limit storage infrastructure components Easy to implement
  • 7. Wordnik graph data model Nodes _id field holds name, object type Index at no extra cost Arbitrary number of properties Only two datatypes for us, String, Double Node type info in node ID (_id) na_corpusCount => Double sa_source => String
  • 8. Wordnik graph data model Edges Destination(s) Weight Link Properties Stored in Mongo Arrays Array size is app limited Use $push, $pop
  • 9. Access to mongo Mongo Access via DAO layer Limit queries to ones that work“well” ALL queries use index Find Node “cat” of type “word”: db.node.findOne({_id:"cat|word"}) Find Edge types for above: db.edge.find({_id:/^catword/},{_id:1}) Serialization/deserialization Done “the old-fashioned way” BasicDBObject, BasicDBList faster than mappers for our use case
  • 10. Query efficiency Max execution time is f (ahops)
  • 11. Routing, traversals, functions Typically find path from A to B Routes have costs Low cost or high probability Our use case is atypical LinkedIn vs. Maps Not from A to B More like “from A with 3 hops” This matters!
  • 13. Performance + scaling Query by index only Use regex syntax in restricted fashion Starts with only No look behind Case sensitive Boring? Fast? Sharding is a no-brainer What about ObjectId()?
  • 14. Performance + scaling Horizontal? Vertical? Both? And when? Separate collections by edge type/object type Increases storage needs Collections all have padding, 30 collections => ~30x padding Sharding Use slick, built-in Mongo sharding Roll your own based on your data What does Wordnik do? Neither! (yet) 30M Nodes, 50M Edges One collection for nodes One collection for edges
  • 15. Performance + scaling Selecting a shard key Done in application logic based on OUR data Depends on what you need
  • 16. End result Solves Wordnik Graph infrastructure needs Store Word nodes with UGC, corpus, structured, analytical data Batch fetch Edges @ > 50k/second Find Edge + endpoints in 80mS Powers our… Word Selection Canonicalization Misspelling “Did you mean” logic Classification + Matching Engine
  • 18. Examples Term normalization Find similar words Meaning normalization Find “more common” form
  • 19. examples Applied Word Graph Recall: “Computers are stupid” English is complex Clustering + classification algorithms: Stink without consistent data “The” => “the” (duh) “geese” => “goose” (ok) Stink when they’re slow Graph + Clustering/Classification Just add data
  • 20. MongoDB makes a Great graph back-end See more about Wordnik APIs: http://developer.wordnik.com Further Reading Migrating from MySQL to MongoDB http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik Maintaining your MongoDB Installation http://www.slideshare.net/fehguy/mongo-sv-tony-tam Source Code Mapping Benchmark https://github.com/fehguy/mongodb-benchmark-tools Wordnik OSS Tools https://github.com/wordnik/wordnik-oss
  • 21. MongoDB makes a Great graph back-end Questions?