SPONSORED BY THE NATIONAL CANCER INSTITUTE
And then there were 15 standards
Using Neo4j to harmonize data in cancer research
Todd Pihl, Ph.D.
Mark Jensen, Ph.D.
https://xkcd.com/927
Biological data is naturally a graph
Graph management by subject matter experts
Nodes
Edges
Property Defs
Props referenced here … and defined here
Entity names are the keys
Nodes at the ends, with direction
Other attributes specified
Constrain the data values to defined types
Model Description Files
https://github.com/CBIIT/bento-mdf
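A minimal sketch of the "referenced here … and defined here" idea, assuming an MDF-like layout with top-level `Nodes`, `Relationships`, and `PropDefinitions` sections. The model content (`case`, `sample`, and every property name) is hypothetical, and the model is written as a plain Python dict rather than loaded from a YAML file. The helper `check_model` is illustrative and not part of any Bento package; it reports properties that are referenced but never defined, and relationship ends that do not name a declared node.

```python
# Hypothetical MDF-like model, written as a dict instead of YAML.
MODEL = {
    "Nodes": {
        "case": {"Props": ["case_id", "species"]},
        "sample": {"Props": ["sample_id", "tissue"]},
    },
    "Relationships": {
        "of_case": {
            "Mul": "many_to_one",
            "Ends": [{"Src": "sample", "Dst": "case"}],
            "Props": [],
        },
    },
    "PropDefinitions": {
        "case_id": {"Type": "string"},
        "species": {"Type": ["human", "dog"]},  # a value set
        "sample_id": {"Type": "string"},
        # "tissue" is deliberately left undefined
    },
}


def check_model(model):
    """Return a list of human-readable problems found in the model."""
    problems = []
    nodes = model.get("Nodes", {})
    rels = model.get("Relationships", {})
    defined = set(model.get("PropDefinitions", {}))

    # Every property referenced by a node or relationship must be defined.
    for kind, section in (("node", nodes), ("relationship", rels)):
        for name, spec in section.items():
            for prop in spec.get("Props") or []:
                if prop not in defined:
                    problems.append(
                        f"{kind} '{name}' references undefined property '{prop}'"
                    )

    # Every relationship end must point at a declared node.
    for name, spec in rels.items():
        for end in spec.get("Ends", []):
            for role in ("Src", "Dst"):
                if end.get(role) not in nodes:
                    problems.append(
                        f"relationship '{name}' has unknown {role} node '{end.get(role)}'"
                    )
    return problems


if __name__ == "__main__":
    for problem in check_model(MODEL):
        print(problem)
```

Because entity names are the keys, a check like this needs only dictionary lookups; no separate identifier scheme is required.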
Bento Framework
Installing a Bento Data Sharing Platform on a Cloud Platform
[Diagram: local machine, GitHub, and cloud platform]
Local machine (Frontend, Backend, Neo4j):
• Clone files from GitHub
• Add test metadata to DB
• Edit UI config files
• View updates in real time
• Save updated files in bento-frontend
• Push to GitHub
GitHub: bento-frontend, bento-backend, bento-data-model
Cloud platform (AWS environment; Frontend, Backend, Neo4j):
• Pull updated files from GitHub
• Load data from a secure S3 bucket
• Data Sharing Platform
Cancer Research Data Commons (CRDC)
Cancer Data Aggregator
Aggregate by patient, sample, study, disease, tissue, etc.
[Diagram: data domains feeding the Cancer Research Data Commons: Clinical, Proteomics, Imaging, Genomics, Immuno-oncology, Animal Models, Cancer Biomarkers]
Data Standards Services
Cancer Data Aggregator (CDA)
• CDA Mission: Provide a single location to query across all CRDC data repositories
• API, Python library
• Currently contains data from Genomics, Proteomics and Imaging Data Commons
• Remaining CRDC data repositories in progress
• Released for CRDC production use on June 28th
• Documentation: https://cda.readthedocs.io/en/latest/
• The Examples page has many Python use cases
• CDA GitHub: https://github.com/CancerDataAggregator
• Swagger: https://cda.datacommons.cancer.gov/api/swagger-ui.html
• For the first time, CDA allows us to easily look across CRDC at how data are presented to users.
Houston, we have a problem
Example: Species
Are these fields really the same?
Models are for data, not vice versa.
CRDC is a federation of going concerns
• Each CRDC node has its own data systems, business processes, stakeholders, and users
• Each has its own purpose-built data model that enables data ingestion, query, and distribution.
• Each has large, ongoing inflows and outflows of data today.
• So a top-down, prescriptive approach to standardization is not feasible. (Believe us; we know.)
• Standardization emphasizing carrots instead of sticks:
• Access to the CDA is a benefit for any node wanting to extend the reach of its data.
• Approach data standardization as a practical mapping goal: “If you can place your model in the context of the CDA’s data maps, the CDA can query and serve your data”
• Approach standardization as an iterative process: “Start with a high priority set of metadata, and expand mapping over time.”
Graphs as a common language for expressing data models
Representing custom data models as graphs can provide:
• a unified context for managing data and semantics, and
• a framework for integrating data with minimal impact on repository operations.
Creating graph versions of many kinds of data models is possible, since many popular modeling approaches find natural expression in the Property Graph:
Property Graph    Relational Data                OWL/RDF
Node              Table rows                     Class
Property          Table columns/cells            Datatype Property
Relationship      Foreign keys/Linking tables    Object Property
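The relational column of the mapping can be sketched in a few lines of Python. This is an illustration only: the tables, the foreign-key declaration, and the relationship type `FROM_PATIENT` are hypothetical, and each row is assumed to carry an `id` column. Rows become nodes, columns become properties, and foreign keys become relationships.

```python
# Hypothetical relational rows: each table is a list of dicts.
TABLES = {
    "patient": [{"id": 1, "species": "human"}],
    "sample": [
        {"id": 10, "tissue": "lung", "patient_id": 1},
        {"id": 11, "tissue": "blood", "patient_id": 1},
    ],
}

# Hypothetical foreign keys: (table, column) -> (referenced table, relationship type).
FOREIGN_KEYS = {("sample", "patient_id"): ("patient", "FROM_PATIENT")}


def to_property_graph(tables, foreign_keys):
    """Map rows to nodes, non-foreign-key columns to properties,
    and foreign keys to relationships."""
    nodes, relationships = [], []
    for table, rows in tables.items():
        for row in rows:
            # Columns that are not foreign keys become node properties.
            props = {c: v for c, v in row.items() if (table, c) not in foreign_keys}
            nodes.append({"label": table, "key": (table, row["id"]), "props": props})
            # Each populated foreign key becomes a directed relationship.
            for (fk_table, column), (target, rel_type) in foreign_keys.items():
                if fk_table == table and row.get(column) is not None:
                    relationships.append(
                        {
                            "type": rel_type,
                            "src": (table, row["id"]),
                            "dst": (target, row[column]),
                        }
                    )
    return nodes, relationships


if __name__ == "__main__":
    graph_nodes, graph_rels = to_property_graph(TABLES, FOREIGN_KEYS)
    print(len(graph_nodes), "nodes,", len(graph_rels), "relationships")
```

The source tables are read but never modified, which is the sense in which a graph view can be built with minimal impact on repository operations.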
Model Description Format (MDF) - simple, iterative model recording and schematizing
MDF is a compact, human-readable (and computable) format for defining a property graph:
• Define Nodes
• Node Properties
• Define Relationships
• Relationship Properties
• Relationship Attributes
• Define and Describe Properties
• Property Attributes, including
• Allowable value types or sets
https://github.com/CBIIT/bento-mdf
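To illustrate "allowable value types or sets", here is a small sketch of checking one data record against property definitions. The definitions, the type names, and the `validate_record` helper are assumptions made for this example and are not the Bento loader's actual behavior: a `Type` is taken to be either a simple type name or a list that acts as a value set.

```python
# Hypothetical property definitions: a type name, or a list acting as a value set.
PROP_DEFINITIONS = {
    "age": {"Type": "integer"},
    "weight": {"Type": "number"},
    "species": {"Type": ["human", "dog", "mouse"]},
    "case_id": {"Type": "string"},
}

# Python types accepted for each simple type name (an assumption of this sketch).
SIMPLE_TYPES = {
    "integer": (int,),
    "number": (int, float),
    "string": (str,),
    "boolean": (bool,),
}


def validate_record(record, prop_definitions):
    """Return {property: reason} for every value that violates its definition."""
    errors = {}
    for prop, value in record.items():
        definition = prop_definitions.get(prop)
        if definition is None:
            errors[prop] = "property is not defined in the model"
            continue
        allowed = definition["Type"]
        if isinstance(allowed, list):
            # Value set: the value must be one of the listed terms.
            if value not in allowed:
                errors[prop] = f"{value!r} is not in the value set"
        else:
            # bool is a subclass of int, so reject it explicitly for non-boolean types.
            is_bool_mismatch = isinstance(value, bool) and allowed != "boolean"
            if is_bool_mismatch or not isinstance(value, SIMPLE_TYPES[allowed]):
                errors[prop] = f"expected {allowed}, got {type(value).__name__}"
    return errors


if __name__ == "__main__":
    print(validate_record({"age": "7", "species": "cat"}, PROP_DEFINITIONS))
```

An empty result means the record satisfies every definition it touches; unknown properties are reported rather than silently accepted.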
In the Bento framework:
• Data SMEs directly update MDF (in GitHub) to make model updates
• Backend data loader and frontend user interfaces are configured directly by MDF
MDF is simple and standardized
Philip Musk 12:06
And let me tell you, with data needs driving many of ICDC's requirements as they are, and have been thus far, being able to both write the requirements, and make the required model changes ahead of engineers doing their thing, is really powerful. I don't have to explain what model changes we need to make to someone else - I can get the model changes done myself, and explain what we need the engineers and the UI to do with those changes.
SMEs
Engineering
Metamodel Database – the models as data
• Practical principles towards a practical goal led us to practical tools, enabling
• Rapid prototypes and production tier commons
• Integrated Canine Data Commons
• Clinical Trial Data Commons
• Rapid prototypes for data modeling and model visualization
• Cancer Data Service
• Childhood Cancer Data Initiative
• New practical problem: management of multiple dynamic data models over independent projects
• Creating new models: component reuse?
• Managing acceptable value sets for many Properties in models
• Understanding interrelationships between models for mapping and interoperability
Both data and model as property graphs
[Figure: a data graph and its model ("schema") graph shown side by side; node labels: Person, Person, Group]
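A minimal sketch of the point that data and model can share one representation. Both graphs below are built from the same `Node` and `Edge` classes. The names and values are hypothetical, and the choice of a `node` label with a `handle` property for model entities is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    type: str
    src: Node
    dst: Node


# Data graph: two Person nodes that belong to one Group (hypothetical values).
alice = Node("Person", {"name": "Alice"})
bob = Node("Person", {"name": "Bob"})
team = Node("Group", {"name": "Team A"})
data_edges = [Edge("MEMBER_OF", alice, team), Edge("MEMBER_OF", bob, team)]

# Model graph: the same structures, but each node describes a kind of data node.
person_def = Node("node", {"handle": "Person"})
group_def = Node("node", {"handle": "Group"})
model_edges = [Edge("MEMBER_OF", person_def, group_def)]


def conforms(data_edges, model_edges):
    """True if every data edge matches an edge pattern declared in the model."""
    allowed = {
        (e.src.props["handle"], e.type, e.dst.props["handle"]) for e in model_edges
    }
    return all((e.src.label, e.type, e.dst.label) in allowed for e in data_edges)


if __name__ == "__main__":
    print(conforms(data_edges, model_edges))
```

Since the model is itself a graph, it can be stored, queried, and versioned with the same tools as the data it describes.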
Metamodel Schema
Defines:
• Models
• Nodes, Relationships, Properties
• Origins, Terms, and Value Sets
• Concepts and Predicates
Schema is represented in MDF
https://github.com/CBIIT/bento-meta/blob/master/metamodel.yaml
Two models in a Metamodel DB (MDB)
ICDC, CTDC
MDB as a model repository and reference
• In the simple context of Properties, Nodes, and Relationships, we have a functional repository for multiple graph models
• Python packages move MDF into an MDB, create MDF from models in an MDB
• Docker containers easily run a local MDB, or can provide an instantiated, loaded MDB
• Based directly on Neo4j Community server images
• Simple Terminology Server (STS) with MDB as backend
• Enables both GUI and API access to the models
• Model browsing and fulltext search across all entities
• STS is also intended to be easy to distribute and set up
MDB as a cross-model tool
The MDB schema also defines entities for relating models to one another and to external authorities:
• Concepts & Predicates (“semantics”)
• Origins, Terms, & Value Sets (“terminology”)
Patterns for connecting these to model entities create separable “layers” that can be added or modified without disrupting the repository function.
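One way to picture a separable semantic "layer" is as a set of annotations kept outside the models themselves. The sketch below is illustrative only: the model names ICDC and CTDC come from the slides, but every node name, property name, and concept identifier is hypothetical, and `cross_model_map` is not an MDB function. Properties from different models are considered related when they point at the same concept.

```python
from collections import defaultdict

# Hypothetical annotations: (model, node, property) -> concept identifier.
PROPERTY_CONCEPTS = {
    ("ICDC", "case", "breed"): "concept:breed",
    ("ICDC", "demographic", "sex"): "concept:sex",
    ("CTDC", "case", "gender"): "concept:sex",
    ("CTDC", "case", "disease"): "concept:diagnosis",
}


def cross_model_map(property_concepts):
    """Group properties by concept and keep concepts shared by more than one model."""
    by_concept = defaultdict(list)
    for prop, concept in property_concepts.items():
        by_concept[concept].append(prop)
    return {
        concept: sorted(props)
        for concept, props in by_concept.items()
        if len({model for model, _, _ in props}) > 1
    }


if __name__ == "__main__":
    for concept, props in cross_model_map(PROPERTY_CONCEPTS).items():
        print(concept, "->", props)
```

Adding or removing an annotation changes the mapping without touching either model, which is the property the "layers" pattern is meant to preserve.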
MDB Philosophy: keys to its utility
• Dynamic
• Like data and data models
• Pragmatic
• Not a repository of ultimate truth
• Tool to help us provide value to NCI today
• Friendly
• Communicates to humans and computers
• Simple, but well-defined
• Not necessarily exhaustive or “complete”
• Distributable
• Not necessarily “central”
• A platform for “mutual understanding” of data
https://cbiit.github.io/bento-meta/mdb-principles.html
Acknowledgements
• Mark Benson, PhD
• Phil Musk, PhD
• Ming Ying, MS
• Anjan Purkayastha, PhD
• Ye Wu, PhD
• Pat Dunn, PhD
• Nelson Moore, MS
• John Otridge, PhD
How Accurate are Carbon Emissions Projections?
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 

Government GraphSummit: And Then There Were 15 Standards

  • 1. And Then There Were 15 Standards: Using Neo4j to harmonize data in cancer research. Todd Pihl, Ph.D., and Mark Jensen, Ph.D. Sponsored by the National Cancer Institute.
  • 3. Biological data is naturally a graph
  • 4. Graph management by subject matter experts
      Model Description Files: https://github.com/CBIIT/bento-mdf
      File sections: Nodes, Edges, Property Defs
      • Props referenced here … and defined here
      • Entity names are the keys
      • Nodes at the ends, with direction
      • Other attributes specified
      • Constrain the data values to defined types
  • 6. Installing a Bento Data Sharing Platform on a Cloud Platform
      Repositories: bento-frontend, bento-backend, bento-data-model
      • Local machine (Frontend, Backend, Neo4j): clone files from GitHub
          • Add test metadata to DB
          • Edit UI config files
          • View updates in real time
          • Save updated files in bento-frontend
          • Push to GitHub
      • Cloud platform (AWS environment; Frontend, Backend, Neo4j): the Data Sharing Platform
          • Pull updated files from GitHub
          • Load data from a secure S3 bucket
  • 7. Cancer Research Data Commons (CRDC)
      • Cancer Data Aggregator: aggregate by patient, sample, study, disease, tissue, etc.
      • Data areas: Clinical, Proteomics, Imaging, Genomics, Immuno-oncology, Animal Models, Cancer Biomarkers
      • Data Standards Services
  • 8. Cancer Data Aggregator (CDA)
      • CDA Mission: Provide a single location to query across all CRDC data repositories
      • API, Python library
      • Currently contains data from Genomics, Proteomics and Imaging Data Commons
      • Remaining CRDC data repositories in progress
      • Released for CRDC production use on June 28th
      • Documentation: https://cda.readthedocs.io/en/latest/
      • The Examples page has many Python use cases
      • CDA GitHub: https://github.com/CancerDataAggregator
      • Swagger: https://cda.datacommons.cancer.gov/api/swagger-ui.html
      • For the first time, CDA allows us to easily look across CRDC at how data are presented to users.
  • 9. Houston, we have a problem
  • 11. Are these fields really the same?
  • 12. Models are for data, not vice versa.
  • 13. Models are for data, not vice versa.
  • 14. CRDC is a federation of going concerns
      • Each CRDC node has its own data systems, business processes, stakeholders, and users.
      • Each has its own purpose-built data model that enables data ingestion, query, and distribution.
      • Each has large, ongoing inflows and outflows of data today.
      • So a top-down, prescriptive approach to standardization is not feasible. (Believe us; we know.)
      • Standardization emphasizing carrots instead of sticks:
          • Access to the CDA is a benefit for any node wanting to extend the reach of its data.
          • Approach data standardization as a practical mapping goal: “If you can place your model in the context of the CDA’s data maps, the CDA can query and serve your data.”
          • Approach standardization as an iterative process: “Start with a high priority set of metadata, and expand mapping over time.”
  • 15. Graphs as a common language for expressing data models
      Representing custom data models as graphs can provide:
      • a unified context for managing data and semantics, and
      • a framework for integrating data with minimal impact on repository operations.
      Creating graph versions of many kinds of data models is possible, since many popular modeling approaches find natural expression in the Property Graph:
      Property Graph | Relational Data              | OWL/RDF
      Node           | Table rows                   | Class
      Property       | Table columns/cells          | Datatype Property
      Relationship   | Foreign keys/Linking tables  | Object Property
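The relational column of this correspondence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `patient` and `sample` tables, their columns, and the `of_patient` relationship name are hypothetical and do not come from any CRDC model. Rows become nodes, non-key columns become properties, and foreign keys become directed relationships.

```python
# Hypothetical relational data: table name -> list of rows (dicts).
tables = {
    "patient": [{"id": 1, "species": "dog"}],
    "sample": [{"id": 10, "patient_id": 1, "tissue": "blood"}],
}
# Hypothetical foreign keys: (table, column) -> (referenced table, relationship type).
foreign_keys = {("sample", "patient_id"): ("patient", "of_patient")}


def rows_to_graph(tables, foreign_keys):
    """Turn table rows into nodes and foreign keys into directed relationships."""
    nodes = {}          # (label, id) -> property dict
    relationships = []  # (source key, relationship type, destination key)
    for table, rows in tables.items():
        fk_cols = {col for (tbl, col) in foreign_keys if tbl == table}
        for row in rows:
            key = (table, row["id"])
            # Columns that are not foreign keys become node properties.
            nodes[key] = {c: v for c, v in row.items() if c not in fk_cols}
            # Each foreign-key column becomes a relationship to the referenced row.
            for col in fk_cols:
                target_table, rel_type = foreign_keys[(table, col)]
                relationships.append((key, rel_type, (target_table, row[col])))
    return nodes, relationships


nodes, relationships = rows_to_graph(tables, foreign_keys)
```

The table name serves as the node label here, which is one simple choice; a real mapping would also need to handle linking tables and composite keys.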
  • 16. Model Description Format (MDF): simple, iterative model recording and schematizing
      MDF is a compact, human-readable (and computable) format for defining a property graph:
      • Define Nodes
          • Node Properties
      • Define Relationships
          • Relationship Properties
          • Relationship Attributes
      • Define and Describe Properties
          • Property Attributes, including
              • Allowable value types or sets
      https://github.com/CBIIT/bento-mdf
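To make the three parts of such a model description concrete, here is a minimal sketch that holds a toy model in a plain Python dictionary and checks its internal consistency. The key names (`Nodes`, `Relationships`, `PropDefinitions`, `Props`, `Ends`, `Src`, `Dst`, `Type`, `Enum`) are assumptions loosely modeled on the MDF layout and should be checked against the bento-mdf repository; the `case` and `sample` entities are hypothetical. It does not use the bento-mdf tooling.

```python
# Toy model in an MDF-like layout (key names are assumptions, entities are hypothetical).
model = {
    "Nodes": {
        "case": {"Props": ["case_id", "disease"]},
        "sample": {"Props": ["sample_id", "tissue"]},
    },
    "Relationships": {
        # Nodes at the ends, with direction: sample -> case.
        "of_case": {"Ends": [{"Src": "sample", "Dst": "case"}], "Props": None},
    },
    "PropDefinitions": {
        "case_id": {"Type": "string"},
        "disease": {"Type": "string"},
        "sample_id": {"Type": "string"},
        "tissue": {"Enum": ["blood", "tumor"]},  # allowable value set
    },
}


def check_model(model):
    """Return a list of problems: undefined properties and unknown end nodes."""
    problems = []
    nodes = model.get("Nodes", {})
    rels = model.get("Relationships", {})
    prop_defs = model.get("PropDefinitions", {})
    # Every property referenced by a node or relationship must be defined.
    for kind, entities in (("node", nodes), ("relationship", rels)):
        for name, spec in entities.items():
            for prop in spec.get("Props") or []:
                if prop not in prop_defs:
                    problems.append(f"{kind} {name}: undefined property {prop}")
    # Every relationship end must point at a defined node.
    for name, spec in rels.items():
        for end in spec.get("Ends", []):
            for role in ("Src", "Dst"):
                if end[role] not in nodes:
                    problems.append(
                        f"relationship {name}: unknown {role} node {end[role]}"
                    )
    return problems
```

Calling `check_model(model)` on the toy model returns an empty list; removing a property definition or misspelling an end node yields one message per problem.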
  • 17. MDF is simple and standardized
      In the Bento framework:
      • Data SMEs directly update MDF (in GitHub) to make model updates
      • Backend data loader and frontend user interfaces are configured directly by MDF
      Philip Musk (12:06): “And let me tell you, with data needs driving many of ICDC's requirements as they are, and have been thus far, being able to both write the requirements, and make the required model changes ahead of engineers doing their thing, is really powerful. I don't have to explain what model changes we need to make to someone else - I can get the model changes done myself, and explain what we need the engineers and the UI to do with those changes.”
      Diagram labels: SMEs, Engineering
  • 18. Metamodel Database: the models as data
      • Practical principles towards a practical goal led us to practical tools, enabling
          • Rapid prototypes and production tier commons
              • Integrated Canine Data Commons
              • Clinical Trial Data Commons
          • Rapid prototypes for data modeling and model visualization
              • Cancer Data Service
              • Children’s Cancer Data Initiative
      • New practical problem: management of multiple dynamic data models over independent projects
          • Creating new models: component reuse?
          • Managing acceptable value sets for many Properties in models
          • Understanding interrelationships between models for mapping and interoperability
  • 19. Both data and model as property graphs
      Figure panels: Data; Model ("Schema"). Node labels shown: Person, Person, Group.
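One consequence of holding both the data and the model as graphs is that conformance checking becomes a comparison between two edge sets. The sketch below assumes the Person and Group labels from the slide; the relationship name `member_of` and the node identifiers are hypothetical, since the slide text does not name them.

```python
# Model graph: one entry per allowed (source label, relationship, destination label).
model_edges = {("Person", "member_of", "Group")}  # "member_of" is hypothetical

# Data graph: node id -> label, plus directed edges between node ids.
data_nodes = {"p1": "Person", "p2": "Person", "g1": "Group"}
data_edges = [("p1", "member_of", "g1"), ("p2", "member_of", "g1")]


def nonconforming_edges(data_nodes, data_edges, model_edges):
    """Return the data edges whose label pattern is absent from the model graph."""
    return [
        (src, rel, dst)
        for src, rel, dst in data_edges
        if (data_nodes[src], rel, data_nodes[dst]) not in model_edges
    ]
```

With the data above, every edge matches the model and the function returns an empty list; an edge running from a Group to a Person would be reported because the model only allows the opposite direction.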
  • 20. Metamodel Schema
      Defines:
      • Models
      • Nodes, Relationships, Properties
      • Origins, Terms, and Value Sets
      • Concepts and Predicates
      Schema is represented in MDF: https://github.com/CBIIT/bento-meta/blob/master/metamodel.yaml
  • 21. Two models in a Metamodel DB (MDB): ICDC and CTDC
  • 22. MDB as a model repository and reference
      • In the simple context of Properties, Nodes, and Relationships, we have a functional repository for multiple graph models
      • Python packages move MDF into an MDB, create MDF from models in an MDB
      • Docker containers easily run a local MDB, or can provide an instantiated, loaded MDB
          • Based directly on Neo4j Community server images
      • Simple Terminology Server (STS) with MDB as backend
          • Enables both GUI and API access to the models
          • Model browsing and fulltext search across all entities
          • STS is also intended to be easy to distribute and set up
  • 23. MDB as a cross-model tool
      The MDB schema also defines entities for relating models to one another and to external authorities:
      • Concepts & Predicates (“semantics”)
      • Origins, Terms, & Value Sets (“terminology”)
      Patterns for connecting these to model entities create separable “layers” that can be added or modified without disrupting the repository function.
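The idea of a separable semantic layer can be illustrated with a small sketch: properties from different models point at shared concepts, and that mapping is kept apart from the model definitions so it can change without touching them. ICDC and CTDC are the two models named in this talk, but the node names, property names, and concept identifiers below are hypothetical, and this is not the bento-meta API.

```python
from collections import defaultdict

# Hypothetical semantic layer: (model, node, property) -> concept identifier.
concept_of = {
    ("ICDC", "case", "breed"): "concept:breed",
    ("ICDC", "sample", "sample_site"): "concept:anatomic_site",
    ("CTDC", "specimen", "anatomic_site"): "concept:anatomic_site",
}


def shared_concepts(concept_of):
    """Group properties by concept and keep concepts used by more than one model."""
    by_concept = defaultdict(set)
    for (model, node, prop), concept in concept_of.items():
        by_concept[concept].add((model, node, prop))
    return {
        concept: sorted(members)
        for concept, members in by_concept.items()
        if len({member[0] for member in members}) > 1
    }
```

Here `shared_concepts(concept_of)` reports that the hypothetical `sample_site` and `anatomic_site` properties map to the same concept, which is the kind of cross-model statement a mapping effort needs to record explicitly rather than infer from field names.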
  • 25. MDB Philosophy: keys to its utility
      • Dynamic
          • Like data and data models
      • Pragmatic
          • Not a repository of ultimate truth
          • Tool to help us provide value to NCI today
      • Friendly
          • Communicates to humans and computers
          • Simple, but well-defined
          • Not necessarily exhaustive or “complete”
      • Distributable
          • Not necessarily “central”
          • A platform for “mutual understanding” of data
      https://cbiit.github.io/bento-meta/mdb-principles.html
  • 26. Acknowledgements
      • Mark Benson, PhD
      • Phil Musk, PhD
      • Ming Ying, MS
      • Anjan Purkayastha, PhD
      • Ye Wu, PhD
      • Pat Dunn, PhD
      • Nelson Moore, MS
      • John Otridge, PhD