MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB


Presented by: John Barry, Senior Platform Developer, Apervita
Experience level: Deep dive

Graph databases are a natural representation for many forms of data, from social network relationships to music genres to medical vocabulary data. Apervita’s challenge is importing and housing vast medical ontologies and allowing efficient traversal of the graph, retrieving both parent and child nodes across a variety of relationships. In this session, we will discuss the characteristics of graph databases, an implementation strategy for MongoDB, and an approach for implementing transitive closure for efficient traversal of the graph with limited database trips.

COPYRIGHT © 2015. THIS INFORMATION IS THE PROPERTY OF APERVITA, INC. APERVITA IS A REGISTERED TRADEMARK OF APERVITA, INC.
Democratizing Health Analytics & Data

Building a Graph Database in Mongo

Agenda
• What is a Graph Database?
• Let’s Build One!
• Example MongoDB Implementation
• Pitfalls
• How Apervita Uses Graphs

Graph Databases
Elements:
• Edges
• Nodes
• Nodes and edges can also have properties
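As a minimal sketch of these elements, nodes and edges can be modeled as small records that each carry a free-form property map. The class names, the two city names, and the property values below are hypothetical and only serve to illustrate a node property (population) next to an edge property (distance).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    # Free-form properties attached to the node, e.g. a population count.
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    edge_type: str
    # Free-form properties attached to the edge, e.g. a distance.
    properties: dict = field(default_factory=dict)


# Hypothetical data: two cities (nodes) joined by one road (edge).
nodes = {
    "springfield": Node("springfield", {"population": 120_000}),
    "shelbyville": Node("shelbyville", {"population": 80_000}),
}
edges = [Edge("springfield", "shelbyville", "road", {"distance_km": 30})]


def neighbors(node_id, edge_type):
    """Return the targets of all edges of one type leaving node_id."""
    return [
        e.target
        for e in edges
        if e.source == node_id and e.edge_type == edge_type
    ]
```

Keeping the edge type on the edge record is what lets one graph hold several kinds of relationship side by side.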

More on Graphs
• Graphs can have multiple types of edges
• Multiple types of graphs
• Directed Acyclic Graphs are our focus
• Directed: relationships are one-way
• Acyclic: relationships never loop back to a higher (ancestor) node
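Whether a set of one-way relationships really is acyclic can be checked before loading it. The sketch below uses Kahn's topological-sort algorithm, a standard technique that is not taken from the talk; the function name and the (parent, child) tuple format are assumptions.

```python
from collections import deque


def is_acyclic(edges):
    """Return True if the directed graph given as (parent, child) pairs has no cycle."""
    children = {}
    indegree = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        indegree.setdefault(parent, 0)
        indegree[child] = indegree.get(child, 0) + 1

    # Start from nodes that nothing points to and peel the graph layer by layer.
    queue = deque(node for node, degree in indegree.items() if degree == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Any node left unvisited sits on a cycle or behind one.
    return visited == len(indegree)
```

For example, `is_acyclic([("a", "b"), ("b", "c")])` holds, while adding `("c", "a")` would make the check fail.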

Let’s build a Graf Database!
This is Steffi Graf
• Edges:
• Tournament Type
• Sponsorships
• Tournaments Won

[Diagram: Steffi Graf subgraph. Node labels: Grand Slam, French Open, US Open, Aus. Open, Wimbledon, Canon, Dunlop, Head, Adidas. Edge legend: Tourney Won, Tourney Type, Sponsorship.]

Graf is a Subgraph
• The implementation depends on the overall purpose of the graph
• Tracking tournaments and results?

Graf is a Subgraph
• Tracking tennis players and their results?
[Diagram callouts, one per build of this slide: Tournaments, Edges, Players]

Graf is a Subgraph
• Just tracking Steffi Graf?
• You’ll need a new relationship:
[Diagram labels: You, Stalkers]

Implementation of paths
• Multiple implementations of the graph exist:
• Parents only
• Children only
• Parents and Children
• Edges
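The four options above can be sketched as document shapes. Everything here is illustrative: the field names (`parents`, `children`, `parent`, `child`, `type`) and the "tournament" root node are assumptions, and plain dictionaries stand in for MongoDB collections so the example runs without a database. The `ancestors` helper counts one lookup per visited node to show how the number of round trips grows with depth.

```python
# Parents only: each node document lists its direct parents.
parents_only = {
    "tournament": {"_id": "tournament", "parents": []},
    "grand_slam": {"_id": "grand_slam", "parents": ["tournament"]},
    "wimbledon": {"_id": "wimbledon", "parents": ["grand_slam"]},
}

# Children only: each node document lists its direct children.
children_only = {
    "tournament": {"_id": "tournament", "children": ["grand_slam"]},
    "grand_slam": {"_id": "grand_slam", "children": ["wimbledon"]},
    "wimbledon": {"_id": "wimbledon", "children": []},
}

# Parents and children: both arrays are kept on every node document.
both = {
    node_id: {
        "_id": node_id,
        "parents": parents_only[node_id]["parents"],
        "children": children_only[node_id]["children"],
    }
    for node_id in parents_only
}

# Edges: one document per relationship, kept apart from the nodes.
edges = [
    {"parent": "tournament", "child": "grand_slam", "type": "tourney_type"},
    {"parent": "grand_slam", "child": "wimbledon", "type": "tourney_type"},
]


def ancestors(collection, node_id):
    """Walk upward; each dictionary lookup stands in for one database query."""
    found, seen, lookups = [], set(), 0
    frontier = [node_id]
    while frontier:
        doc = collection[frontier.pop()]
        lookups += 1
        for parent in doc["parents"]:
            if parent not in seen:
                seen.add(parent)
                found.append(parent)
                frontier.append(parent)
    return found, lookups
```

With this data, `ancestors(parents_only, "wimbledon")` needs three lookups for a three-level hierarchy, which is the traversal cost that grows as the graph gets deeper.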

Transitive Closure Mathematics
• Finding transitive closure with math
• Picard solved this in 1976 (not that Picard)
• But it’s polynomial time and the math is kind of hard
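To make "polynomial time" concrete, here is a sketch of transitive closure using Warshall's algorithm. This is a swapped-in textbook technique, not the 1976 result mentioned on the slide; the function name and input format are assumptions. The three nested loops over the nodes give cubic running time.

```python
def transitive_closure(nodes, edges):
    """Return every (ancestor, descendant) pair reachable via one or more edges."""
    reach = {a: {b: False for b in nodes} for a in nodes}
    for parent, child in edges:
        reach[parent][child] = True

    # Warshall's algorithm: allow each node k in turn as an intermediate stop.
    for k in nodes:
        for i in nodes:
            if reach[i][k]:
                for j in nodes:
                    if reach[k][j]:
                        reach[i][j] = True

    return {(i, j) for i in nodes for j in nodes if reach[i][j]}
```

For a chain a → b → c the closure adds the implied pair (a, c) to the two direct edges.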

Implementing Paths
• Solution:
• Implement paths in their own collection.
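One way to picture a separate paths collection is a set of documents that each hold one root-to-leaf path as an array. The sketch below is an assumption-laden illustration, not the speaker's implementation: the field names `relationship` and `path`, the `is_a` label, and the sample nodes are hypothetical, and a list of dictionaries stands in for the collection. In MongoDB, a filter of the form `{"path": node_id}` matches documents whose array contains that value, and an index on an array field is a multikey index, so `find_paths` stands in for a single indexed query.

```python
def build_paths(children, roots):
    """Enumerate every root-to-leaf path as one document per path."""
    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        kids = children.get(node, [])
        if not kids:
            paths.append({"relationship": "is_a", "path": prefix})
        for kid in kids:
            walk(kid, prefix)

    for root in roots:
        walk(root, [])
    return paths


def find_paths(path_docs, node_id):
    """Stand-in for one query on the path array, e.g. a filter like {"path": node_id}."""
    return [doc for doc in path_docs if node_id in doc["path"]]


def descendants(path_docs, node_id):
    """Post-process matching paths: everything after node_id is a descendant."""
    result = set()
    for doc in find_paths(path_docs, node_id):
        position = doc["path"].index(node_id)
        result.update(doc["path"][position + 1:])
    return result


def ancestors(path_docs, node_id):
    """Post-process matching paths: everything before node_id is an ancestor."""
    result = set()
    for doc in find_paths(path_docs, node_id):
        position = doc["path"].index(node_id)
        result.update(doc["path"][:position])
    return result


def is_reachable(path_docs, start, target):
    """Reachability reduces to a membership test on the descendants."""
    return target in descendants(path_docs, start)


# Hypothetical hierarchy for illustration.
children = {"tournament": ["grand_slam"], "grand_slam": ["wimbledon", "us_open"]}
paths = build_paths(children, ["tournament"])
```

Both directions of the traversal come from the same set of matching documents, so parents and children of a node can be retrieved in one trip and separated afterwards in application code.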

Considerations
• As graphs get larger, performance becomes an issue
• Extremely wide child models can break the document size
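A single MongoDB document is limited to 16 MB, so a node with a very large child array can outgrow one document. A possible workaround, sketched here under assumptions, is to spread the children over several smaller "pseudo subnode" documents. The `_id` scheme, the `node` and `children` field names, and the chunk size are all hypothetical.

```python
def split_wide_node(node_id, children, max_children):
    """Split one wide child list into several smaller chunk documents."""
    if max_children < 1:
        raise ValueError("max_children must be at least 1")
    docs = []
    for start in range(0, len(children), max_children):
        docs.append({
            "_id": f"{node_id}#{start // max_children}",  # hypothetical id scheme
            "node": node_id,
            "children": children[start:start + max_children],
        })
    return docs


def all_children(docs, node_id):
    """Reassemble the full child list from every chunk that belongs to node_id."""
    result = []
    for doc in docs:
        if doc["node"] == node_id:
            result.extend(doc["children"])
    return result
```

Reading the full child list then costs one query on the `node` field instead of one oversized document.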

How Apervita Uses Graphs
• Hospital Datasets are dynamic and varied
• Medical Vocabularies are by their nature graphs
• SNOMED, ICD-9, ICD-10, RxNorm
• The ability to code an algorithm to the “generic” and subsume for the specific is extremely powerful
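Coding to the generic concept and subsuming the specific ones can be sketched as expanding a concept into itself plus all of its descendants, then matching records against that set. Only the Diabetes Mellitus identifier (SNOMED ID 73211009) comes from the talk; the child labels are readable placeholders rather than real SNOMED identifiers, and the record layout and function names are assumptions.

```python
DIABETES_MELLITUS = "73211009"  # SNOMED ID cited in the talk

# Placeholder child labels, NOT real SNOMED identifiers.
is_a_children = {
    DIABETES_MELLITUS: ["gestational_diabetes", "diabetes_type_1", "diabetes_type_2"],
}


def subsumed_codes(children, concept):
    """Return the concept together with every descendant beneath it."""
    codes = {concept}
    stack = [concept]
    while stack:
        for kid in children.get(stack.pop(), []):
            if kid not in codes:
                codes.add(kid)
                stack.append(kid)
    return codes


def matching_patients(records, children, concept):
    """Select the records whose code is the concept or one of its descendants."""
    codes = subsumed_codes(children, concept)
    return [record["patient"] for record in records if record["code"] in codes]


# Hypothetical patient records.
records = [
    {"patient": "p1", "code": "diabetes_type_2"},
    {"patient": "p2", "code": "hypertension"},
    {"patient": "p3", "code": DIABETES_MELLITUS},
]
```

An algorithm written against the single generic code then picks up `p1` and `p3` without naming any of the specific diabetes codes.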

How Apervita Uses Graphs
Example of browsing RxNorm

In Summary
• Think about your data model
• Use MongoDB’s indexing power to have quick access to your paths for graph calculations
• Think about your overall expected graph size

Editor's Notes

Property examples: the distance between two cities (an edge property) and the population of a city (a node property).
Steffi held the #1 ranking for 377 weeks in total, longer than any other tennis player, male or female. She has won 22 Grand Slam singles titles, the most by any player, male or female, since the Open Era started in 1968. She has won a Golden Slam, winning all four Grand Slam tournaments AND the Olympic gold medal in a single year, and is the only player in the history of tennis to do so. And her name is a homonym for graph, which is how I ended up learning all the previous facts.
Each tournament in each year should get its own node: participants, winners, scores.
Tracking tennis players and their results? Tournaments may be independent nodes, one per tournament. The Years Won property can reside on the edge (as this is per player).
Tournaments can actually have the property of years won.
All of these have one major issue: walking the graph is computationally intensive and takes many queries to traverse the graph.
• One path list per relationship/unique path
• Indexing the path array leads to speedy results
• This will allow (with some post-processing): transitive closure
• Reachability becomes a trivial query
• Query example
• As graphs get larger, performance becomes an issue. Graph sizes: RxNorm 204,000; SNOMED 311,000; ICD-10-PCS 76,000
• Options: sharding on edge id + graph id; a collection per graph id
• Extremely wide child models can break the document size. Implementing pseudo subnodes can alleviate this pressure
Apervita allows medical authors to code to common datasets. Hospitals aren’t forced to conform their data, but can take advantage of our mapping tools to execute algorithms on their own data. Medical vocabularies are by their nature graphs: SNOMED, ICD-9, ICD-10, RxNorm. The ability to code an algorithm to the “generic” and subsume for the specific is extremely powerful, e.g. coding to Diabetes Mellitus (SNOMED ID 73211009) can capture all subnodes such as gestational diabetes, diabetes type II, and diabetes type I.
• Think about your data model: What are your nodes? What are your edges? What should be a property vs. a node?
• Use MongoDB’s indexing power to have quick access to your paths for graph calculations
• Think about your overall expected graph size: Does the width imply faux nodes? Should you shard? If there are multiple subgraphs, should you separate them into individual collections?
