From Advanced Queries to Algorithms to
Advanced ML: 3 Pharmaceutical Graph Use Cases
Dr. Alexander Jarasch
• 5 partners + associated partners
• 450 researchers
• bundles basic research and clinical trials expertise
• => variety of data
  => unstructured
  => heterogeneous
  => not connected
  => unFAIR
DZD Data and Knowledge Management team
Dr. Alexander Jarasch
Justus Täger
Tim Bleimehl
Angela Dedie
Yaroslav Zdravomyslov
The Challenge


Connecting data (silos) -> get new insights
Easy question -> Difficult to answer
The Challenge


Variety of users / diversity of scientific questions
Scientists
Medical doctors
Data scientists
Graph database
Why graph? Easy scientific question
Biological question:
Do human T2D genes encode enzymes that act on metabolites which, in turn, are regulated in the pig diabetes model?
The actual question (from a data point of view):
Is there a connection between A and R?
=> 3s to look into the Excel sheet
Why graph? Easy scientific question
The actual question (from a data point of view):
Is there a connection between A and R?
=> 3s to look into the graph
[Figure: example graph with nodes A, B, C, E, D, F, G, K, Q, R, S, W, Z, U]
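Once the data is held as a graph, "Is there a connection between A and R?" becomes a plain reachability check. The following is a minimal Python sketch of that idea, not the DZD implementation. The edge list is invented for illustration, because the slide only shows the node letters and not how they are connected.

```python
from collections import deque

# Hypothetical undirected edges between the node letters shown on the slide;
# the real connections are not given in the source.
EDGES = [("A", "B"), ("B", "C"), ("C", "K"), ("K", "R"), ("D", "E"), ("F", "G")]


def build_adjacency(edges):
    """Turn an undirected edge list into an adjacency dict."""
    adjacency = {}
    for left, right in edges:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(adjacency[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


ADJACENCY = build_adjacency(EDGES)
print(shortest_path(ADJACENCY, "A", "R"))
```

With data spread over separate lists, the same question would need one lookup or join per hop. In the graph it is a single traversal.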
Back to the question
Do human T2D genes encode enzymes that act on metabolites which, in turn, are regulated in the pig diabetes model?
[Diagram: Genomics (human diabetic data): Genes, SNPs, Proteins, Enzymes, Pathways, Metabolites; Metabolomics (pre-diabetic pig): Metabolites]
[Diagram: List of SNPs, List of Genes of (species 1), List of Proteins of (species 1), List of loci, List of Enzymes of (species 1), List of Pathways of (species 1), List of Metabolites of (species 1), List of Metabolites of (species 2); graph]
Why graph? -> why not relational
• biomedical data / healthcare data is highly connected
• => variety of data
  => unstructured
  => heterogeneous
  => not connected
  => unFAIR
• easy to model
• extremely flexible / easily adaptable ("re-shaping the graph") vs. static SQL model
• scalable (billions of nodes + relationships on a single machine)
• easy to query (cyclic dependencies)
• Graph Data Science library + graph embeddings
Diseases are connected
Alzheimer's, cancer, cardiovascular diseases, diabetes, lung diseases, infectious diseases
new hypotheses
DZDconnect: Concept
DZD in-house data
Natural Language Processing


Inferring knowledge
Knowledge Graph
DZDconnect: stats
• PROD server: 323m nodes, 1.1bn relationships => 480GB
• DEV server: 1.1bn nodes, 4.8bn relationships
• Single server (60 CPUs, 256GB memory, only SSDs)
• 4 developers
• Neo4j Enterprise (live backup, GDS)
• UI: Flask web server, SemSpect, Neo4j Browser
• Visualization for interactive browsing (SemSpect by derive GmbH)
• Bloom (semi-natural-language queries)
Strata Data Award finalist 2019
bytes4diabetes Award 2020
Graphie Award 2018
We have a DB role model
DZDconnect: data integration + ML
[Diagram: Gene, RNA and Protein nodes linked by CODES, CODES and CODES* relationships]
• Python
• Py2Neo, GraphIO
• Docker Pipeline for orchestration (open-source by DZD)
• Based on integrated data => annotate / enrich
• text matching + Natural Language Processing
• "shortcuts" for queries (reduce #hops)
• inferring knowledge
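The "shortcuts" bullet can be read as precomputing a direct relationship wherever a frequently queried multi-hop path exists, so that later queries need fewer hops. The Python sketch below illustrates that reading on in-memory triples. It is an assumption about the approach, not the DZD pipeline: the node identifiers are made up, and the relationship name `CODES_SHORTCUT` is a hypothetical stand-in for the `CODES*` label on the slide.

```python
# Hypothetical (source, relationship, target) triples; identifiers are made up.
TRIPLES = [
    ("gene:G1", "CODES", "rna:R1"),
    ("gene:G1", "CODES", "rna:R2"),
    ("rna:R1", "CODES", "protein:P1"),
    ("rna:R2", "CODES", "protein:P2"),
    ("gene:G2", "CODES", "rna:R3"),
]


def add_shortcuts(triples, rel_type="CODES", shortcut_type="CODES_SHORTCUT"):
    """Return new triples linking the start and end of every two-hop rel_type path."""
    outgoing = {}
    for source, rel, target in triples:
        if rel == rel_type:
            outgoing.setdefault(source, []).append(target)
    shortcuts = set()
    for source, middles in outgoing.items():
        for middle in middles:
            for target in outgoing.get(middle, []):
                shortcuts.add((source, shortcut_type, target))
    return sorted(shortcuts)


SHORTCUTS = add_shortcuts(TRIPLES)
print(SHORTCUTS)
```

The trade-off is extra storage and a rebuild step after each data import, in exchange for one-hop gene-to-protein lookups.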
DZDconnect: data model <-> human readable = easy to query
DZDconnect: data model
The Challenge


User with a specific input => specific output
Scientist
multi-omics experiment output
Flask app
The Challenge


User: "start somewhere -> freely explore knowledge"
SemSpect interactive browsing
Start from any node
Scientist or medical doctor
The Challenge


User with data analysis skills / computer scientist
Scientist
Start from any node
Cypher query language
Graph Data Science
Use case 1


Handle the mapping of identifiers of molecular entities
Knowledge Graph
Query "friends of a friend" on a gene level
Example: diabetes-relevant gene 'TCF7L2'
MATCH path = (g:Gene {sid: 'TCF7L2'})-[:MAPS|SYNONYM*0..2]-(g1:Gene)
RETURN path
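The `*0..2` quantifier in the query matches the start gene itself (zero hops) plus everything reachable over at most two `MAPS` or `SYNONYM` relationships in either direction. The Python sketch below mimics that bounded expansion on a toy graph. Only 'TCF7L2' comes from the slide; every other identifier is a placeholder. Unlike the Cypher query, the sketch returns nodes with their hop distance rather than full paths, and it does not filter the end nodes by label.

```python
# Hypothetical undirected links; only 'TCF7L2' comes from the slide.
LINKS = [
    ("TCF7L2", "MAPS", "id:0001"),
    ("id:0001", "MAPS", "id:0002"),
    ("TCF7L2", "SYNONYM", "alias:A"),
    ("id:0002", "MAPS", "id:0003"),
    ("TCF7L2", "CODES", "transcript:T1"),
]


def within_hops(links, start, allowed_types, max_hops):
    """Map each node reachable over allowed relationship types to its hop distance."""
    adjacency = {}
    for left, rel, right in links:
        if rel in allowed_types:
            adjacency.setdefault(left, set()).add(right)
            adjacency.setdefault(right, set()).add(left)
    distance = {start: 0}
    frontier = [start]
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for node in frontier:
            for neighbour in adjacency.get(node, ()):
                if neighbour not in distance:
                    distance[neighbour] = hop
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return distance


REACHED = within_hops(LINKS, "TCF7L2", {"MAPS", "SYNONYM"}, max_hops=2)
print(sorted(REACHED.items()))
```

In this toy data the identifier three hops away and the transcript behind a `CODES` relationship are both left out, which is the effect of the hop bound and the relationship-type filter.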
Use case 2


Find information that is NOW connected
Knowledge Graph
Query for SNPs (mutations) associated with diabetes
Output: relevant protein and its function (ontology terms)
MATCH (tr:Trait)
WHERE tr.name CONTAINS 'diabetes mellitus'
WITH tr AS disease
MATCH path = (disease)<-[:ASSOCIATED_WITH_TRAIT]-(asso:Association)<-[:SNP_HAS_ASSOCIATION]-(snp:SNP)
      -[:SNP_HAS_GENE]-(gene:Gene)-[:MAPS]-(g1:Gene)-[x:CODES]->(transcript:Transcript)
      -[:CODES]->(prot:Protein)-[:ASSOCIATION]->(term:Term)--(o:Ontology)
RETURN path
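The query walks a fixed sequence of relationship types from the disease to the ontology terms. As a rough illustration of what such a chained pattern does, the Python sketch below follows a list of relationship types step by step over in-memory triples. The relationship names are taken from the query, but all node identifiers are invented. For brevity the sketch ignores edge direction and leaves out the `MAPS` hop and the final untyped hop to the ontology.

```python
# Hypothetical triples; node identifiers are invented for illustration.
TRIPLES = [
    ("snp:1", "SNP_HAS_ASSOCIATION", "asso:1"),
    ("asso:1", "ASSOCIATED_WITH_TRAIT", "trait:diabetes mellitus"),
    ("snp:1", "SNP_HAS_GENE", "gene:A"),
    ("gene:A", "CODES", "transcript:A1"),
    ("transcript:A1", "CODES", "protein:A"),
    ("protein:A", "ASSOCIATION", "term:X"),
    ("snp:2", "SNP_HAS_GENE", "gene:B"),
]

CHAIN = [
    "ASSOCIATED_WITH_TRAIT",
    "SNP_HAS_ASSOCIATION",
    "SNP_HAS_GENE",
    "CODES",
    "CODES",
    "ASSOCIATION",
]


def follow_chain(triples, start, rel_types):
    """Return every path from start that uses rel_types in order, ignoring direction."""
    neighbours = {}
    for source, rel, target in triples:
        neighbours.setdefault((source, rel), []).append(target)
        neighbours.setdefault((target, rel), []).append(source)
    paths = [[start]]
    for rel in rel_types:
        # Extend each partial path by one hop, skipping nodes already on the path.
        paths = [
            path + [nxt]
            for path in paths
            for nxt in neighbours.get((path[-1], rel), [])
            if nxt not in path
        ]
    return paths


PATHS = follow_chain(TRIPLES, "trait:diabetes mellitus", CHAIN)
print(PATHS)
```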
Use case 3


Using graph algorithms to infer new insights
Natural Language Processing
Ontologies
Knowledge Graph
Google's PageRank algorithm - find the most relevant gene
Finding ACE2 - the receptor the SARS-CoV-2 virus uses to enter the cell
• 140,000 abstracts from COVID-19-related publications
• Named entity recognition of gene names
• PageRank identified 'ACE2' as the most relevant gene
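The slides do not say how the graph for PageRank was built from the recognised gene names. The Python sketch below assumes a simple co-mention graph, in which two genes are linked if they appear in the same abstract, and runs a plain power-iteration PageRank on it. The abstracts are toy data: apart from 'ACE2', the gene names are placeholders and the co-mentions are invented, so the result only illustrates the mechanics.

```python
from itertools import combinations

# Toy stand-in for NER output: the set of gene names found in each abstract.
# Apart from 'ACE2', the names are placeholders and the co-mentions are invented.
ABSTRACT_GENES = [
    {"ACE2", "GENE_B"},
    {"ACE2", "GENE_C"},
    {"ACE2", "GENE_D"},
    {"GENE_B", "GENE_C"},
    {"ACE2", "GENE_E"},
]


def co_mention_graph(abstract_genes):
    """Link two genes whenever they are mentioned in the same abstract."""
    adjacency = {}
    for genes in abstract_genes:
        for gene in genes:
            adjacency.setdefault(gene, set())
        for left, right in combinations(sorted(genes), 2):
            adjacency[left].add(right)
            adjacency[right].add(left)
    return adjacency


def pagerank(adjacency, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph without isolated nodes."""
    nodes = sorted(adjacency)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbour passes on its rank, split evenly over its own links.
            incoming = sum(rank[other] / len(adjacency[other]) for other in adjacency[node])
            new_rank[node] = (1.0 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


RANKS = pagerank(co_mention_graph(ABSTRACT_GENES))
for gene, score in sorted(RANKS.items(), key=lambda item: -item[1]):
    print(gene, round(score, 3))
```

On the real knowledge graph the same ranking would come from the Graph Data Science library rather than hand-written code.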
Who’s this ACE2-guy?
source: https://www.benaroyaresearch.org/blog/post/11-things-know-about-mrna-vaccines-covid-19
Use case 4


Using node embeddings to subphenotype diabetic patients
DZDconnect: connect raw data of diabetic patients with cancer
Clinical data from 404 diabetic patients
DZDconnect: connect lipidomics fingerprint
Lipidomics experiment with 116 specific lipids
DZDconnect: connect transcriptomics fingerprint
Transcriptomics experiment with 58,345 specific transcripts (RNAs)
Transform patients
Fast random projections (fastRP)
CALL gds.fastRP.write(
  'patients',
  {
    embeddingDimension: 50,
    writeProperty: 'fastrp-embedding'
  }
)
YIELD nodePropertiesWritten
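To give an intuition for what a fast-random-projection embedding computes, here is a heavily simplified Python sketch. It is not the Graph Data Science implementation and omits options such as degree normalisation. Each node starts with a sparse random vector, vectors are repeatedly averaged over neighbours and normalised, and the iterations are combined with weights. The function name `simple_fastrp`, the weights `(0.0, 1.0, 1.0)`, the dimension of 16 and the patient-to-lipid toy graph are all illustrative assumptions.

```python
import math
import random


def l2_normalise(vector):
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(value * value for value in vector))
    return vector if norm == 0.0 else [value / norm for value in vector]


def simple_fastrp(adjacency, dimension=16, iteration_weights=(0.0, 1.0, 1.0), seed=7):
    """Simplified random-projection embedding in the spirit of FastRP."""
    rng = random.Random(seed)
    scale = math.sqrt(3.0)
    # Sparse random start vectors: +scale, 0 or -scale with probability 1/6, 2/3, 1/6.
    current = {
        node: [rng.choice((scale, 0.0, 0.0, 0.0, 0.0, -scale)) for _ in range(dimension)]
        for node in adjacency
    }
    embedding = {node: [0.0] * dimension for node in adjacency}
    for weight in iteration_weights:
        propagated = {}
        for node, neighbours in adjacency.items():
            # Average the neighbours' vectors from the previous iteration.
            vector = [0.0] * dimension
            for neighbour in neighbours:
                for index, value in enumerate(current[neighbour]):
                    vector[index] += value / len(neighbours)
            propagated[node] = l2_normalise(vector)
        current = propagated
        for node in adjacency:
            for index, value in enumerate(current[node]):
                embedding[node][index] += weight * value
    return embedding


# Hypothetical toy graph: patients linked to the lipids measured for them.
MEASURED = {
    "patient_1": ["lipid_a", "lipid_b"],
    "patient_2": ["lipid_a", "lipid_b"],
    "patient_3": ["lipid_c", "lipid_d"],
}
ADJACENCY = {}
for patient, lipids in MEASURED.items():
    for lipid in lipids:
        ADJACENCY.setdefault(patient, []).append(lipid)
        ADJACENCY.setdefault(lipid, []).append(patient)

EMBEDDINGS = simple_fastrp(ADJACENCY)
print(EMBEDDINGS["patient_1"] == EMBEDDINGS["patient_2"])
```

Patients with the same neighbourhood end up with the same vector, which is the property the later similarity step relies on.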
Subphenotyping of diabetic patients
k-nearest neighbour clustering with k=5, representing the 5 diabetes subtypes
[Figure: patient 01 to patient 05, labelled "Graph algorithms"]
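A k-nearest-neighbour step on the embeddings links each patient to the patients whose vectors are most similar. The Python sketch below shows this with cosine similarity on invented three-dimensional vectors and k=1, purely to keep the toy example readable. The slide uses k=5 on the patient embeddings. How the resulting neighbourhoods are turned into the five subtypes is not described in the slides and is not covered here.

```python
import math

# Hypothetical low-dimensional embeddings; the values are invented.
EMBEDDINGS = {
    "patient 01": [1.0, 0.1, 0.0],
    "patient 02": [0.9, 0.2, 0.0],
    "patient 03": [0.0, 1.0, 0.1],
    "patient 04": [0.1, 0.9, 0.0],
    "patient 05": [0.0, 0.1, 1.0],
}


def cosine(left, right):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(left, right))
    norm_left = math.sqrt(sum(a * a for a in left))
    norm_right = math.sqrt(sum(b * b for b in right))
    return dot / (norm_left * norm_right)


def k_nearest(embeddings, k):
    """For each item, list the k other items with the highest cosine similarity."""
    result = {}
    for name, vector in embeddings.items():
        scored = [
            (cosine(vector, other), other_name)
            for other_name, other in embeddings.items()
            if other_name != name
        ]
        # Highest similarity first; ties are broken by name for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        result[name] = [other_name for _, other_name in scored[:k]]
    return result


NEIGHBOURS = k_nearest(EMBEDDINGS, k=1)
for patient, nearest in NEIGHBOURS.items():
    print(patient, "->", nearest)
```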
DZDconnect: connect patient data with knowledge graph
[Diagram: Transcript, Gene, Synonyms, Abstract, PubMed Article, Keyword / MeSH-term, Ontology term]
Hello role-model :-)
Take home message
• Knowledge graph
  • as a single point of truth
  • connect in-house data
  • scalability
  • infer new insights
• Use cases:
  • simple and advanced (Cypher) queries
  • Graph Data Science library (PageRank, kNN)
  • Node embeddings for complex data
  • NLP
• Visualization of graph
  • different users
  • Flask app, browser, SemSpect

Thanks to
