Nolan Nichols, PhD
Maze Therapeutics
October 15, 2020
Focus on the evidence: a
knowledge graph approach to
profiling drug targets
D4 GLOBAL
why do some people get sick and
others don’t, even when they have
the same disease-causing gene?
2
3
genetic modifiers are naturally occurring and can be identified
in 2016, the Resilience Project reported identifying
individuals who carry mutations for serious childhood
diseases but never developed them, pointing to
potential genetic modifiers
Chen et al. Nat Biotechnology 2016
4
Dr. Jonathan Weissman and
team observed that some
gene-gene interactions have a
‘buffering’ or protective effect
on disease-causing mutations
Horlbeck et al. Cell 2018
CRISPRi technology developed by the Weissman lab at UCSF enabled mapping of
genetic interactions at scale
based on genetic insights, genetic modifier targets can be developed into
transformative therapies for patients
5
protective modifiers can…
• be discovered from, or validated by, functional genomics data
• be targeted to develop new therapeutics
• be identified from human genetic data that naturally protect some people from disease
disease-causing gene: SMN1 mutations lead to SMA
genetic modifier: SMN2 overproduction can compensate for SMN1 in SMA patients
therapy: treat by increasing SMN2 copy number to mimic genetic modifier
maze has identified many diseases for which its
platform can transform genetic modifier insights into
novel therapies
an example of a known genetic modifier inspiring a novel treatment for
spinal muscular atrophy (SMA)
6
our purpose-built approach: maze is translating genetic modifying insights into
transformative therapies for patients
Our current research areas:
• Mendelian diseases
• Genetic modifiers
Potential future research areas:
• Polygenic diseases
• Haploinsufficiency
advanced data science for analysis of large, integrated data
proprietary cohort data for maze
pay for access
access public data
genome-wide CRISPR screens
single-cell biology
cellular disease modeling
interactomics
mutational scanning
future innovation
access and analyze meaningful
human genetics data
elucidate target biology leveraging
functional genomics
efficiently prosecute drug
discovery with multiple modalities
7
maze is generating proprietary data on genetic modifiers discovered
from integrated human genetic and functional genomic data
access and analyze meaningful
human genetics data
elucidate target biology leveraging
functional genomics
integrated human genetic and functional genomic data lowers barriers to analysis and
answering questions
proprietary cohort data for maze
pay for access
access public data
genome-wide CRISPR screens
single-cell biology
cellular disease modeling
interactomics
mutational scanning
future innovation
8
advanced data science for analysis of large, integrated data
https://www.anaconda.com/state-of-data-science-2020
a 2020 survey of 2,360 data professionals from 100 countries
indicates that “For most respondents, data management tasks
still consume a disproportionate amount of work time.”
n=1099
9
collaboration with AWS healthcare and life sciences supports a cloud-based data
architecture
visualization
computation
graph database
publication
open data
bioinformatician, biologist, chemist
cloud compute layer (aws biotech blueprint)
data persistence layer
data management layer
data access layer
object store
relational database
(meta)data services
governance
AWS Biotech Blueprint: https://aws.amazon.com/quickstart/biotech-blueprint/
FAIR Principles: https://doi.org/10.1038/sdata.2016.18
10
many technologies can be used to construct a
knowledge graph; the Resource Description
Framework (RDF) matches the FAIR principles’
focus on identifiers and controlled terms
knowledge graph technologies support use cases for standardized datasets that are
designed to be connected
[diagram: kg:SMN1 (rdf:type so:Genotype) is linked by ro:causes_condition to kg:SMA (rdf:type efo:Disease)]
kg:SMN1 ro:causes_condition kg:SMA .
kg:SMA rdf:type efo:Disease .
kg:SMN1 rdf:type so:Genotype .
Prefixes
rdf: RDF specification
ro: Relations Ontology
so: Sequence Ontology
efo: Experimental Factor Ontology
kg: example “knowledge graph” namespace
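
The three statements above can be read as a tiny graph of (subject, predicate, object) triples. The following is a minimal sketch, assuming plain Python tuples in place of an RDF library or triple store; the helper names `match` and `causes_of_diseases` are illustrative and not part of the talk.

```python
# Triples from the slide, stored as (subject, predicate, object) tuples.
TRIPLES = {
    ("kg:SMN1", "ro:causes_condition", "kg:SMA"),
    ("kg:SMA", "rdf:type", "efo:Disease"),
    ("kg:SMN1", "rdf:type", "so:Genotype"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def causes_of_diseases(triples):
    """Find (genotype, disease) pairs connected by ro:causes_condition."""
    diseases = {s for s, _, _ in match(triples, p="rdf:type", o="efo:Disease")}
    genotypes = {s for s, _, _ in match(triples, p="rdf:type", o="so:Genotype")}
    return sorted(
        (s, o) for s, _, o in match(triples, p="ro:causes_condition")
        if s in genotypes and o in diseases
    )


if __name__ == "__main__":
    print(causes_of_diseases(TRIPLES))
```

Because every node and relation is a prefixed identifier from a shared vocabulary, the same pattern keeps working as more datasets are added to the graph.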
11
applications of semantic technologies: a bioinformatician, biologist, and chemist walk
into a bar
role user story
bioinformatician “I completed an analysis that includes a report with
my interpretations and tables of statistical model
output, and I want to publish these artifacts to our
data portal where my collaborators can examine
with self-service analytical tools.”
biologist “I am evaluating targets that were identified in a
bioinformatics analysis by reviewing different
sources of evidence, and I need to track the
information I am gathering and present a report to
my team.”
chemist “I received a prioritized list of potential targets for a
given disease from the target discovery team, and I
want to gather information about all compounds
that are known interactors with these targets.”
drug discovery
target discovery
target validation
12
bioinformatics results are used to drive decision making and are managed as key
corporate assets
which genes are
differentially expressed
in this experiment?
collaborators
email data portal
• bioinformatics reports and datasets are treated as
peer-reviewed publications in a centralized data portal
• metadata about results are formal dataset descriptions
with a semantic model and controlled terminology
• analytics applications use microservices to drive data
visualizations and navigate connected datasets
challenge: many “artisanal” analyses are lost
in email, file servers, or messaging services
13
semantic technology components supporting publication of bioinformatics results
• ontology terms define result types and relationships
• provide canonical labels and definitions
• designed using the protégé editor and versioned in git
• analysts initialize a templated project directory and environment
• a dataset description is generated using ontology-driven tooling
• a validated dataset description is published to a central data portal
• metadata is added to a search index
• tabular files accessed via a data service api
[figure labels: target, constraint, violation, dataset description]
• dataset descriptions are modeled as a data graph
• the Shapes Constraint Language (SHACL) is used to validate the graph
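
The validation step can be illustrated with a simplified stand-in for shape constraints. This is a sketch only: it does not use a SHACL engine, and the `ex:` class and property names, the `PropertyConstraint` type, and the `validate` function are all hypothetical. It mirrors the target / constraint / violation vocabulary from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyConstraint:
    path: str                                # predicate expected on each focus node
    min_count: int = 1                       # minimum number of values required
    allowed_values: frozenset = frozenset()  # empty set means any value is accepted


def validate(triples, target_class, constraints):
    """Check every node typed as target_class; return a list of violations."""
    violations = []
    focus_nodes = sorted(
        s for s, p, o in triples if p == "rdf:type" and o == target_class
    )
    for node in focus_nodes:
        for c in constraints:
            values = [o for s, p, o in triples if s == node and p == c.path]
            if len(values) < c.min_count:
                violations.append((node, c.path, "min_count"))
            for v in values:
                if c.allowed_values and v not in c.allowed_values:
                    violations.append((node, c.path, f"value not allowed: {v}"))
    return violations


# Hypothetical dataset descriptions expressed as a data graph.
GRAPH = [
    ("ex:dataset1", "rdf:type", "ex:DatasetDescription"),
    ("ex:dataset1", "ex:title", "knockdown vs control"),
    ("ex:dataset1", "ex:resultType", "ex:DifferentialExpression"),
    ("ex:dataset2", "rdf:type", "ex:DatasetDescription"),
    ("ex:dataset2", "ex:resultType", "ex:Unknown"),
]

# Hypothetical shape: a title is required and the result type is a controlled term.
SHAPE = [
    PropertyConstraint("ex:title"),
    PropertyConstraint(
        "ex:resultType",
        allowed_values=frozenset({"ex:DifferentialExpression"}),
    ),
]

if __name__ == "__main__":
    for violation in validate(GRAPH, "ex:DatasetDescription", SHAPE):
        print(violation)
```

A description with no violations would be the one published to the data portal; the others would go back to the analyst with the violation list.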
15
expert target evaluations are captured as data using structured evidence annotations
challenge: knowledge gained from literature
and database reviews is hidden in slide decks
does the evidence
support a therapeutic
hypothesis for my gene?
collaborators
slide deck web app
• electronic data capture app used to guide users
through a target evaluation protocol
• figures and visualizations embedded in a web app with
provenance information and evidence ontology codes
• structured annotations used to generate slide decks
and connect to related data using gene identifiers
16
semantic technology components support structured annotation of target evaluations
• analytics app enables ranking genes and drill down via detailed views
• organized to guide target evaluation process w/access to evidence sources
• free-text review, image, rankings, and source url for provenance
• semantic evidence codes are used to annotate each review item
• structured target profiles enable multiple representations
• target profile slide decks are auto-populated with evidence reviews
• evaluating knowledge graph models using nanopublications and biolink
• data portal services provide access to results in apps
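
One way to picture a structured review item and a slide generated from it is sketched below. The field names, the `EVIDENCE:` codes, the gene identifiers, and the example.org URLs are placeholders chosen for illustration; the actual schema and evidence ontology codes used in the app are not given in the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReviewItem:
    gene_id: str        # identifier used to connect to related data
    evidence_code: str  # code from an evidence ontology (placeholder values below)
    review_text: str    # free-text review written by the evaluator
    ranking: int        # assumption: 1 is the strongest support
    source_url: str     # provenance of the evidence


def build_profile(items: List[ReviewItem], gene_id: str) -> List[ReviewItem]:
    """Collect the review items for one gene, strongest evidence first."""
    return sorted((i for i in items if i.gene_id == gene_id), key=lambda i: i.ranking)


def render_slide(gene_id: str, profile: List[ReviewItem]) -> str:
    """Render one representation of the profile: a plain-text slide outline."""
    lines = [f"target profile: {gene_id}"]
    for item in profile:
        lines.append(
            f"• [{item.evidence_code}] {item.review_text} (source: {item.source_url})"
        )
    return "\n".join(lines)


# Hypothetical review items.
ITEMS = [
    ReviewItem("GENE:A", "EVIDENCE:0002", "expression is reduced in patient tissue",
               2, "https://example.org/dataset/1"),
    ReviewItem("GENE:A", "EVIDENCE:0001", "knockdown rescues the disease phenotype",
               1, "https://example.org/paper/1"),
    ReviewItem("GENE:B", "EVIDENCE:0001", "no effect observed",
               1, "https://example.org/paper/2"),
]

if __name__ == "__main__":
    print(render_slide("GENE:A", build_profile(ITEMS, "GENE:A")))
```

Keeping the annotation as data and the slide as one rendering of it is what allows the same profile to feed a web view, a slide deck, or a graph export.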
18
proprietary and shared data are integrated by incrementally expanding the knowledge
graph’s scope
challenge: heterogeneously organized datasets
are prohibitively time-consuming to integrate
What compounds interact
with this target and what
are their properties?
relational
database
graph
database
• significant results from internal analysis and target
reviews include cross references to external datasets
• publicly available gene models and chemical
compounds staged on maze data infrastructure
• solution enables integrated queries over proprietary
and shared data for quickly answering questions
collaborators
19
semantic technology components support integrated queries over proprietary and
shared data graphs
ensembl rdf
• ensembl rdf represents genomic features, genomic locations and cross-references, including to chembl
chembl rdf
• chembl rdf explicitly links chemical, bioactivity, and genomic data with cross-references to other databases
differential expression rdf
• differential expression results are transformed to rdf using r2rml and linked using gene identifiers
target review rdf
• target reviews are linked via gene identifiers to enable integrated queries with chembl and ensembl
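
The idea of an integrated query, joining internal results to shared compound data through gene identifiers, can be sketched as follows. This is an illustration under stated assumptions: plain Python lists stand in for the four RDF graphs, and every identifier, value, and threshold is made up; a real query would run against the graph database rather than in-memory records.

```python
# Internal differential expression results (hypothetical values).
DIFF_EXPR = [
    {"gene_id": "GENE:A", "log2_fold_change": 2.1, "adj_p": 0.001},
    {"gene_id": "GENE:B", "log2_fold_change": -0.2, "adj_p": 0.6},
]

# Internal target reviews (hypothetical).
TARGET_REVIEWS = [
    {"gene_id": "GENE:A", "supports_hypothesis": True},
    {"gene_id": "GENE:B", "supports_hypothesis": True},
]

# Stand-in for gene-to-target cross-references from a public gene resource.
GENE_TO_TARGET = {"GENE:A": "TARGET:1", "GENE:B": "TARGET:2"}

# Stand-in for public compound bioactivity records.
COMPOUND_ACTIVITY = [
    {"target_id": "TARGET:1", "compound_id": "CMPD:Y", "activity_nm": 900.0},
    {"target_id": "TARGET:1", "compound_id": "CMPD:X", "activity_nm": 50.0},
    {"target_id": "TARGET:2", "compound_id": "CMPD:Z", "activity_nm": 10.0},
]


def compounds_for_supported_targets(max_adj_p=0.05):
    """Join all four sources on gene and target identifiers.

    Returns (gene_id, compound_id, activity_nm) tuples for genes that are
    significantly differentially expressed and have a supporting review.
    """
    significant = {r["gene_id"] for r in DIFF_EXPR if r["adj_p"] < max_adj_p}
    supported = {r["gene_id"] for r in TARGET_REVIEWS if r["supports_hypothesis"]}
    rows = []
    for gene_id in sorted(significant & supported):
        target_id = GENE_TO_TARGET.get(gene_id)
        for record in COMPOUND_ACTIVITY:
            if record["target_id"] == target_id:
                rows.append((gene_id, record["compound_id"], record["activity_nm"]))
    # Most potent compounds first within each gene.
    return sorted(rows, key=lambda row: (row[0], row[2]))


if __name__ == "__main__":
    for row in compounds_for_supported_targets():
        print(row)
```

The join only works because each source carries the same gene identifiers, which is the point the slide makes about linking proprietary and shared data.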
20
summary and conclusion
launched in 2019 with
$190m+ investment
based in south san francisco
with ~80 employees
founded on concept of
genetic modifiers
investors
21
translating genetic
modifying insights into
new therapeutics

Mais conteúdo relacionado

Mais procurados

eMagzine Spring 2016 (Trusha Hake)
eMagzine  Spring 2016 (Trusha Hake)eMagzine  Spring 2016 (Trusha Hake)
eMagzine Spring 2016 (Trusha Hake)
Trusha Hake
 
Classification Scoring for Cleaning Inconsistent Survey Data
Classification Scoring for Cleaning Inconsistent Survey DataClassification Scoring for Cleaning Inconsistent Survey Data
Classification Scoring for Cleaning Inconsistent Survey Data
CSCJournals
 
Apps for science- Applications on ScienceDirect - protein finder and interact...
Apps for science- Applications on ScienceDirect - protein finder and interact...Apps for science- Applications on ScienceDirect - protein finder and interact...
Apps for science- Applications on ScienceDirect - protein finder and interact...
remko caprio
 
Vojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014c
Vojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014cVojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014c
Vojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014c
Vojtech Huser
 

Mais procurados (20)

Pistoia Alliance datathon for drug repurposing for rare diseases
Pistoia Alliance datathon for drug repurposing for rare diseasesPistoia Alliance datathon for drug repurposing for rare diseases
Pistoia Alliance datathon for drug repurposing for rare diseases
 
decentralization: a trend in biomedical research
decentralization: a trend in biomedical researchdecentralization: a trend in biomedical research
decentralization: a trend in biomedical research
 
Connecting eh rdataquad12
Connecting eh rdataquad12Connecting eh rdataquad12
Connecting eh rdataquad12
 
eMagzine Spring 2016 (Trusha Hake)
eMagzine  Spring 2016 (Trusha Hake)eMagzine  Spring 2016 (Trusha Hake)
eMagzine Spring 2016 (Trusha Hake)
 
research participation as a social contract
research participation as a social contractresearch participation as a social contract
research participation as a social contract
 
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
 
Artificial intelligence in drug discovery
Artificial intelligence in drug discoveryArtificial intelligence in drug discovery
Artificial intelligence in drug discovery
 
Embi cri review-2012-final
Embi cri review-2012-finalEmbi cri review-2012-final
Embi cri review-2012-final
 
20160811 Big Data for Health and Medicine
20160811 Big Data for Health and Medicine20160811 Big Data for Health and Medicine
20160811 Big Data for Health and Medicine
 
Grid And Healthcare For IOM July 2009
Grid And Healthcare For IOM July 2009Grid And Healthcare For IOM July 2009
Grid And Healthcare For IOM July 2009
 
Classification Scoring for Cleaning Inconsistent Survey Data
Classification Scoring for Cleaning Inconsistent Survey DataClassification Scoring for Cleaning Inconsistent Survey Data
Classification Scoring for Cleaning Inconsistent Survey Data
 
archenaa2015-survey-big-data-government.pdf
archenaa2015-survey-big-data-government.pdfarchenaa2015-survey-big-data-government.pdf
archenaa2015-survey-big-data-government.pdf
 
AI applications in life sciences - drug development
AI applications in life sciences - drug developmentAI applications in life sciences - drug development
AI applications in life sciences - drug development
 
Biosurveillance 2.0
Biosurveillance 2.0Biosurveillance 2.0
Biosurveillance 2.0
 
Artificial Intelligence for Discovery
Artificial Intelligence for DiscoveryArtificial Intelligence for Discovery
Artificial Intelligence for Discovery
 
Apps for science- Applications on ScienceDirect - protein finder and interact...
Apps for science- Applications on ScienceDirect - protein finder and interact...Apps for science- Applications on ScienceDirect - protein finder and interact...
Apps for science- Applications on ScienceDirect - protein finder and interact...
 
Why study Data Sharing? (+ why share your data)
Why study Data Sharing?  (+ why share your data)Why study Data Sharing?  (+ why share your data)
Why study Data Sharing? (+ why share your data)
 
Vojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014c
Vojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014cVojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014c
Vojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014c
 
the beginnings of an open ecosystem in mHealth
the beginnings of an open ecosystem in mHealththe beginnings of an open ecosystem in mHealth
the beginnings of an open ecosystem in mHealth
 
ARTIFICIAL INTELLIGENCE IN DRUG DISCOVERY "AN OVERVIEW OF AWARENESS"
ARTIFICIAL INTELLIGENCE IN DRUG DISCOVERY  "AN OVERVIEW OF AWARENESS"ARTIFICIAL INTELLIGENCE IN DRUG DISCOVERY  "AN OVERVIEW OF AWARENESS"
ARTIFICIAL INTELLIGENCE IN DRUG DISCOVERY "AN OVERVIEW OF AWARENESS"
 

Semelhante a Focus on the Evidence: a knowledge graph approach to profiling drug targets

Applications Of Bioinformatics In Drug Discovery And Process
Applications Of Bioinformatics In Drug Discovery And ProcessApplications Of Bioinformatics In Drug Discovery And Process
Applications Of Bioinformatics In Drug Discovery And Process
Prof. Dr. Basavaraj Nanjwade
 
Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...
Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...
Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...
Amit Sheth
 

Semelhante a Focus on the Evidence: a knowledge graph approach to profiling drug targets (20)

Fore FAIR ISMB 2019
Fore FAIR ISMB 2019Fore FAIR ISMB 2019
Fore FAIR ISMB 2019
 
Thesis defense, Heather Piwowar, Sharing biomedical research data
Thesis defense, Heather Piwowar, Sharing biomedical research dataThesis defense, Heather Piwowar, Sharing biomedical research data
Thesis defense, Heather Piwowar, Sharing biomedical research data
 
Executing the Research Paper
Executing the Research PaperExecuting the Research Paper
Executing the Research Paper
 
Bioinformatics and Drug Discovery
Bioinformatics and Drug DiscoveryBioinformatics and Drug Discovery
Bioinformatics and Drug Discovery
 
Role of bioinformatics in drug designing
Role of bioinformatics in drug designingRole of bioinformatics in drug designing
Role of bioinformatics in drug designing
 
Openehr clinical modelling
Openehr clinical modellingOpenehr clinical modelling
Openehr clinical modelling
 
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...
 
Clinical Research Informatics Year-in-Review 2024
Clinical Research Informatics Year-in-Review 2024Clinical Research Informatics Year-in-Review 2024
Clinical Research Informatics Year-in-Review 2024
 
A WEB REPOSITORY SYSTEM FOR DATA MINING IN DRUG DISCOVERY
A WEB REPOSITORY SYSTEM FOR DATA MINING IN DRUG DISCOVERYA WEB REPOSITORY SYSTEM FOR DATA MINING IN DRUG DISCOVERY
A WEB REPOSITORY SYSTEM FOR DATA MINING IN DRUG DISCOVERY
 
A WEB REPOSITORY SYSTEM FOR DATA MINING IN DRUG DISCOVERY
A WEB REPOSITORY SYSTEM FOR DATA MINING IN DRUG DISCOVERYA WEB REPOSITORY SYSTEM FOR DATA MINING IN DRUG DISCOVERY
A WEB REPOSITORY SYSTEM FOR DATA MINING IN DRUG DISCOVERY
 
Applications Of Bioinformatics In Drug Discovery And Process
Applications Of Bioinformatics In Drug Discovery And ProcessApplications Of Bioinformatics In Drug Discovery And Process
Applications Of Bioinformatics In Drug Discovery And Process
 
Pine Biotech
Pine BiotechPine Biotech
Pine Biotech
 
Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...
 
Seattle-Denver VA Center for Innovation
Seattle-Denver VA Center for InnovationSeattle-Denver VA Center for Innovation
Seattle-Denver VA Center for Innovation
 
Data Visuallization for Decision Making - Intel White Paper
Data Visuallization for Decision Making - Intel White PaperData Visuallization for Decision Making - Intel White Paper
Data Visuallization for Decision Making - Intel White Paper
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
 
Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...
Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...
Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...
 
Data supporting precision oncology fda wakibbe
Data supporting precision oncology fda wakibbeData supporting precision oncology fda wakibbe
Data supporting precision oncology fda wakibbe
 
HEALTH PREDICTION ANALYSIS USING DATA MINING
HEALTH PREDICTION ANALYSIS USING DATA  MININGHEALTH PREDICTION ANALYSIS USING DATA  MINING
HEALTH PREDICTION ANALYSIS USING DATA MINING
 
5. BIOINFORMATICS.pptx B.Pharm sem 2 Computer Applications in Pharmacy
5. BIOINFORMATICS.pptx B.Pharm sem 2 Computer Applications in Pharmacy5. BIOINFORMATICS.pptx B.Pharm sem 2 Computer Applications in Pharmacy
5. BIOINFORMATICS.pptx B.Pharm sem 2 Computer Applications in Pharmacy
 

Mais de Nolan Nichols

Reproducibility in human cognitive neuroimaging: a community-­driven data sha...
Reproducibility in human cognitive neuroimaging: a community-­driven data sha...Reproducibility in human cognitive neuroimaging: a community-­driven data sha...
Reproducibility in human cognitive neuroimaging: a community-­driven data sha...
Nolan Nichols
 

Mais de Nolan Nichols (6)

Maze's Compass Platform - A data fabric for drug discovery and development
Maze's Compass Platform - A data fabric for drug discovery and developmentMaze's Compass Platform - A data fabric for drug discovery and development
Maze's Compass Platform - A data fabric for drug discovery and development
 
AWS HCLS Virtual Symposium 2021_Maze-Nichols.pptx
AWS HCLS Virtual Symposium 2021_Maze-Nichols.pptxAWS HCLS Virtual Symposium 2021_Maze-Nichols.pptx
AWS HCLS Virtual Symposium 2021_Maze-Nichols.pptx
 
Meaningful (meta)data at scale: removing barriers to precision medicine research
Meaningful (meta)data at scale: removing barriers to precision medicine researchMeaningful (meta)data at scale: removing barriers to precision medicine research
Meaningful (meta)data at scale: removing barriers to precision medicine research
 
Implementing Semantics-Driven Data Exchange in Brain Science: the NCANDA Case...
Implementing Semantics-Driven Data Exchange in Brain Science: the NCANDA Case...Implementing Semantics-Driven Data Exchange in Brain Science: the NCANDA Case...
Implementing Semantics-Driven Data Exchange in Brain Science: the NCANDA Case...
 
The National Consortium on Alcohol and Neurodevelopment in Adolescence (NCAND...
The National Consortium on Alcohol and Neurodevelopment in Adolescence (NCAND...The National Consortium on Alcohol and Neurodevelopment in Adolescence (NCAND...
The National Consortium on Alcohol and Neurodevelopment in Adolescence (NCAND...
 
Reproducibility in human cognitive neuroimaging: a community-­driven data sha...
Reproducibility in human cognitive neuroimaging: a community-­driven data sha...Reproducibility in human cognitive neuroimaging: a community-­driven data sha...
Reproducibility in human cognitive neuroimaging: a community-­driven data sha...
 

Último

Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
chetankumar9855
 
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Último (20)

The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...
 
Top Rated Bangalore Call Girls Majestic ⟟ 9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Majestic ⟟  9332606886 ⟟ Call Me For Genuine S...Top Rated Bangalore Call Girls Majestic ⟟  9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Majestic ⟟ 9332606886 ⟟ Call Me For Genuine S...
 
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
 
Call Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service AvailableCall Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service Available
 
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
 
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
 
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service Available
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service AvailableTrichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service Available
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service Available
 
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
 
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
 
Call Girls Guntur Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Guntur  Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Guntur  Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Guntur Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Vadodara Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Vadodara Just Call 8617370543 Top Class Call Girl Service AvailableCall Girls Vadodara Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Vadodara Just Call 8617370543 Top Class Call Girl Service Available
 
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...
 
Call Girls Shimla Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Shimla Just Call 8617370543 Top Class Call Girl Service AvailableCall Girls Shimla Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Shimla Just Call 8617370543 Top Class Call Girl Service Available
 
Call Girls Vasai Virar Just Call 9630942363 Top Class Call Girl Service Avail...
Call Girls Vasai Virar Just Call 9630942363 Top Class Call Girl Service Avail...Call Girls Vasai Virar Just Call 9630942363 Top Class Call Girl Service Avail...
Call Girls Vasai Virar Just Call 9630942363 Top Class Call Girl Service Avail...
 
Call Girls Kakinada Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Kakinada Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Kakinada Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Kakinada Just Call 9907093804 Top Class Call Girl Service Available
 
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
 
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any TimeTop Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
 
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...
 
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
 
Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...
Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...
Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...
 

Focus on the Evidence: a knowledge graph approach to profiling drug targets

  • 1. Nolan Nichols, PhD Maze Therapeutics October 15, 2020 Focus on the evidence: a knowledge graph approach to profiling drug targets D4 GLOBAL
  • 2. why do some people get sick and others don’t, even when they have the same disease-causing gene? 2
  • 3. 3 genetic modifiers are naturally occurring and can be identified in 2016, the Resilience Project published that they had identified individuals who should have serious childhood diseases, but didn’t, describing potential genetic modifiers Chen et al. Nat Biotechnology 2016
  • 4. 4 Dr. Jonathan Weissman and team observed that some gene-gene interactions have a ‘buffering’ or protective effect on disease-causing mutations Horlbeck et al. Cell 2018 CRISPRi technology developed by the Weissman lab at UCSF enabled mapping of genetic interactions at scale
  • 5. based on genetic insights, genetic modifier targets can be developed into transformative therapies for patients 5 protective modifiers can… be discovered from, or validated by, functional genomics data be targeted to develop new therapeutics be identified from human genetic data that naturally protect some people from disease
  • 6. disease-causing gene genetic modifier therapy SMN1 mutations leads to SMA treat by increasing SMN2 copy number to mimic genetic modifier maze has identified many diseases for which its platform can transform genetic modifier insights to novel therapies SMN2 overproduction can compensate for SMN1 in SMA patients an example of a known genetic modifier inspiring a novel treatment for spinal muscular atrophy (SMA) 6
  • 7. our purposely built approach: maze is translating genetic modifying insights into transformative therapies for patients Our current research areas: • Mendelian diseases • Genetic modifiers Potential future research areas: • Polygenic diseases • Haploinsufficiency advanced data science for analysis of large, integrated data proprietary cohort data for maze pay for access access public data genome- wide CRISPR screens single-cell biology cellular disease modeling inter- actomics mutational scanning future innovation access and analyze meaningful human genetics data elucidate target biology leveraging functional genomics efficiently prosecute drug discovery with multiple modalities 7 maze is generating proprietary data on genetic modifiers discovered from integrated human genetic and functional genomic data
  • 8. integrated human genetic and functional genomic data lowers barriers to analysis and answering questions. access and analyze meaningful human genetics data: proprietary cohort data for maze, pay for access, access public data. elucidate target biology leveraging functional genomics: genome-wide CRISPR screens, single-cell biology, cellular disease modeling, interactomics, mutational scanning, future innovation. advanced data science for analysis of large, integrated data. a 2020 survey of 2,360 data professionals from 100 countries indicates that “For most respondents, data management tasks still consume a disproportionate amount of work time.” (n=1099) https://www.anaconda.com/state-of-data-science-2020
  • 9. collaboration with AWS healthcare and life sciences supports a cloud-based data architecture. users: bioinformatician, biologist, chemist. layers: cloud compute layer (aws biotech blueprint), data persistence layer, data management layer, data access layer. components: visualization, computation, graph database, publication, open data, object store, relational database, (meta)data services, governance. https://aws.amazon.com/quickstart/biotech-blueprint/ FAIR Principles: https://doi.org/10.1038/sdata.2016.18
  • 10. there are many technologies that can be used to construct a knowledge graph; the Resource Description Framework (RDF) matches the FAIR principles’ focus on identifiers and controlled terms. knowledge graph technologies support use cases for standardized datasets that are designed to be connected. example graph: kg:SMN1 (rdf:type so:Genotype) ro:causes_condition kg:SMA (rdf:type efo:Disease). triples: kg:SMN1 ro:causes_condition kg:SMA . kg:SMA rdf:type efo:Disease . kg:SMN1 rdf:type so:Genotype . Prefixes: rdf: RDF specification; ro: Relations Ontology; so: Sequence Ontology; efo: Experimental Factor Ontology; kg: example “knowledge graph” namespace
  • 11. applications of semantic technologies: a bioinformatician, biologist, and chemist walk into a bar. role and user story: bioinformatician (target discovery): “I completed an analysis that includes a report with my interpretations and tables of statistical model output, and I want to publish these artifacts to our data portal where my collaborators can examine them with self-service analytical tools.” biologist (target validation): “I am evaluating targets that were identified in a bioinformatics analysis by reviewing different sources of evidence, and I need to track the information I am gathering and present a report to my team.” chemist (drug discovery): “I received a prioritized list of potential targets for a given disease from the target discovery team, and I want to gather information about all compounds that are known interactors with these targets.”
  • 12. bioinformatics results are used to drive decision making and are managed as key corporate assets. question: which genes are differentially expressed in this experiment? challenge: many “artisanal” analyses are lost in email, file servers, or messaging services. (diagram: collaborators, email, data portal) • bioinformatics reports and datasets are treated as peer-reviewed publications in a centralized data portal • metadata about results are formal dataset descriptions with a semantic model and controlled terminology • analytics applications use microservices to drive data visualizations and navigate connected datasets
  • 13. semantic technology components supporting publication of bioinformatics results • ontology terms define result types and relationships • provide canonical labels and definitions • designed using the protégé editor and versioned in git • analysts initialize a templated project directory and environment • a dataset description is generated using ontology-driven tooling • a validated dataset description is published to a central data portal • metadata is added to a search index • tabular files accessed via a data service api • dataset descriptions are modeled as a data graph • the Shapes Constraint Language (SHACL) is used to validate the graph (diagram: target, constraint, violation, dataset description)
  • 15. expert target evaluations are captured as data using structured evidence annotations. question: does the evidence support a therapeutic hypothesis for my gene? challenge: knowledge gained from literature and database reviews is hidden in slide decks. (diagram: collaborators, slide deck, web app) • electronic data capture app used to guide users through a target evaluation protocol • figures and visualizations embedded in a web app with provenance information and evidence ontology codes • structured annotations used to generate slide decks and connect to related data using gene identifiers
  • 16. semantic technology components support structured annotation of target evaluations • analytics app enables ranking genes and drill down via detailed views • organized to guide target evaluation process w/access to evidence sources • free-text review, image, rankings, and source url for provenance • semantic evidence codes are used to annotate each review item • structured target profiles enable multiple representations • target profile slide decks are auto-populated with evidence reviews • evaluating knowledge graph models using nanopublications and biolink • data portal services provide access to results in apps
  • 18. proprietary and shared data are integrated by incrementally expanding the knowledge graph’s scope. question: what compounds interact with this target and what are their properties? challenge: heterogeneously organized datasets are prohibitively time consuming to integrate. (diagram: collaborators, relational database, graph database) • significant results from internal analysis and target reviews include cross references to external datasets • publicly available gene models and chemical compounds staged on maze data infrastructure • solution enables integrated queries over proprietary and shared data for quickly answering questions
  • 19. semantic technology components support integrated queries over proprietary and shared data graphs. ensembl rdf: represents genomic features, genomic locations and cross-references, including to chembl. chembl rdf: explicitly links chemical, bioactivity, and genomic data with cross-references to other databases. differential expression rdf: differential expression results are transformed to rdf using r2rml and linked using gene identifiers. target review rdf: target reviews are linked via gene identifiers to enable integrated queries with chembl and ensembl
  • 21. translating genetic modifying insights into new therapeutics. founded on concept of genetic modifiers; launched in 2019 with $190m+ investment; based in south san francisco with ~80 employees
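
The triple example from slide 10 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the platform's implementation: triples are stored as tuples of prefixed names, the helper names `expand` and `objects` are hypothetical, and every namespace URI except the standard `rdf:` one is an `example.org` placeholder rather than the ontology's real identifier.

```python
# Minimal sketch of the slide 10 example using only the standard library.
# Triples are (subject, predicate, object) tuples written as prefixed names.
# Only the rdf: URI is the real namespace; the others are placeholders.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ro": "http://example.org/ro/",
    "so": "http://example.org/so/",
    "efo": "http://example.org/efo/",
    "kg": "http://example.org/kg/",
}

# The three statements shown on the slide.
TRIPLES = [
    ("kg:SMN1", "ro:causes_condition", "kg:SMA"),
    ("kg:SMA", "rdf:type", "efo:Disease"),
    ("kg:SMN1", "rdf:type", "so:Genotype"),
]


def expand(name):
    """Expand a prefixed name such as 'kg:SMA' into a full identifier."""
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local


def objects(subject, predicate):
    """Return every object asserted for a subject/predicate pair."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


if __name__ == "__main__":
    # Print each triple with its prefixes expanded, one statement per line.
    for triple in TRIPLES:
        print(" ".join(expand(term) for term in triple), ".")
    print(objects("kg:SMN1", "ro:causes_condition"))
```

Because every term is an identifier rather than a free-text label, adding a new source amounts to appending more tuples that reuse the same names.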

Editor's Notes

  1. Today I’ll be talking about how we are using knowledge graph technologies to profile drug targets in our discovery platform. But since many of you are hearing about Maze for the first time, I’ll start with a brief overview of our company before diving into the technical part of the talk.
  2. Maze was founded to answer this fundamental question… Why do some people get sick and others don’t, even when they have the same disease-causing gene?
  3. Back in 2016, when our founders were first developing the concept of maze, a paper was published by the Resilience Project in which they identified individuals who should have a serious childhood disease, but did not. Our founders asked the question… why? It’s broadly known that there are genes that can protect people from certain diseases, but they were curious whether this type of insight into so-called genetic modifiers could be used as a general platform for identifying therapeutic targets.
  4. Around this same time, one of the maze founders, Jonathan Weissman, and his lab were developing novel applications of CRISPR to do genome-wide gene/gene interaction studies. Another maze founder, Steve Elledge, who won the US equivalent of the Nobel Prize for his work on DNA damage repair, was also interested in applying advanced functional genomics tools to study these gene/gene interactions. Both Jonathan and Steve were looking at how these tools could be used to kill cancer cells. However, the interesting thing is that while Jonathan was looking for synthetic lethal combinations, he also found protective combinations that his lab described as having “buffering effects”, further building on the idea from the Resilience Project that one could identify protective modifiers. The maze team wanted to see if they could use Jonathan and Steve’s concepts to build a drug company.
  5. So as the early maze team started to build the company, the thought was that we could identify naturally protective variants from human genetic data, then generate proprietary functional genomics data to validate these protective modifiers and develop new therapies for severe, genetically defined diseases. But what gave us the confidence to believe that this was a viable drugging strategy?
  6. Well… It turns out that there was a drug approved in 2016 based on this exact idea for spinal muscular atrophy, which is a horrible neuromuscular disease. The treatment was designed to increase SMN2 copy number, which was found to help patients with the SMN1 mutations that cause this disease. So with this example in mind, our goal was to build a platform that could systematically identify and drug genetic modifiers for severe genetic diseases.
  7. Our approach was to develop a purpose-built platform that integrates high-value human genetic and functional genomic data from public, commercial, and proprietary sources, then conduct genome-wide CRISPR screens that can be used to understand the biology related to genetic modifiers. Once we’ve amassed a critical body of evidence, we can use what we’ve learned to focus our drug discovery efforts in a data-driven way. Now that you have some general context about maze, I’d like to switch gears and start to unpack the importance of having integrated human genetic and functional genomic data.
  8. One of our goals in the data science group is to provide a 360° view of any evidence that can be used to associate a disease with potential targets. Over the past few years it’s been widely cited that a disproportionate amount of a data scientist’s time goes to data preparation rather than to the analysis itself. A survey from the Anaconda 2020 State of Data Science report still indicates that data professionals spend around 45% of their time on data management tasks. These data management tasks are essential and provide a foundation for the overall data science lifecycle, from preparation to analysis, visualization, and reporting. A foundation of integrated data enables teams to focus on work that more fully leverages their unique skillsets.
  9. We've been developing a data platform to provide this foundation of integrated data, which is summarized in this four-layer diagram. The cloud compute layer was designed in collaboration with the AWS Healthcare and Life Sciences group and is based on their Biotech Blueprint best-practices architecture. The data persistence layer is the source of truth for archived data and metadata. This includes a suite of backend services as well as virtual sources hosted on AWS that are registered into our data lake. The data management layer follows a governance model based on the FAIR principles, ensuring that data and metadata are findable, accessible, interoperable, and reusable via web services. All of this is available to users via a data access layer that provides web apps, command line utilities, and programmatic interfaces in R and Python. Today we'll be looking at how this platform is used to support three groups of users, specifically how it can be used to produce integrated data that follows the FAIR principles by using knowledge graph technologies.
  10. First, there are several technologies that can be used to implement knowledge graphs, but for our purposes we are using the Resource Description Framework and the corresponding semantic technology stack. Much of this decision comes from the fact that the semantic technology stack is based on mature standards that provide greater vendor neutrality, meaning that the data model, query language, validation, and inference rules can plug and play in different databases. I'm not going to go in depth regarding the technical details, but to give you an intuitive sense for what I'm talking about, we can look at this simple example where we have a single statement, or triple, that asserts that SMN1 causes the condition SMA, spinal muscular atrophy. RDF then lets you expand on this with a type system where you can state that SMN1 is of type Genotype and SMA is of type Disease. Under the hood, this is written out in a simple text document with three columns, where these prefixes are used as shorthand for references to specific ontology terms or other identifiers. There is much more to discuss here, but I want to move on to discussing how this can be applied in a few scenarios.
  11. The three examples are based on real tasks that are completed by bioinformaticians in my data science group, biologists in our functional genomics group, and a computational chemist on our drug discovery team. I chose a set of examples that walk us through early-stage work from target discovery, to validation, and drug discovery, with an eye toward how these data can be linked together. I’ll briefly highlight each of these user stories before looking at each in more depth. …
  12. In our bioinformatics user story, a typical analyst works primarily with the Bioconductor open source tools in RStudio to identify, for example, differentially expressed genes in a CRISPR screen. As part of this workflow, analysts use a rich data structure called a SummarizedExperiment, where phenotypic data, feature/gene information, count matrices, and experiment metadata are captured in a single data structure. With all content required for analysis in this SummarizedExperiment object, analysts will generate a report with their interpretations that will be communicated to collaborators. Traditionally, one of the core challenges here is that these “artisanal” analyses may not be formally tracked and captured as part of the corporate memory and are instead lost to suboptimal communication channels. At maze, we’ve taken the approach of treating such analyses as key corporate assets that are described in a standardized way, published to a data portal, and accessible to downstream applications.
  13. To implement this framework, we're developing an ontology using the Protégé editor that provides the controlled terms for our dataset descriptions. With this, analysts can create a standardized analysis environment and generate a dataset description of their results using ontology-driven tools that produce a document in an RDF format called JSON-LD. These descriptions can then be validated using another part of the semantic technology stack called SHACL. This can be used to generate a report of any violations, for example if terms are used to annotate results that are not part of the ontology. Once a description is validated, it can be published alongside data files into our central data portal, where metadata is added to a search index and a data API provides access to statistical results.
  14. Now let's take a look at the biologist's user story, where gene lists produced in the previous step are researched in greater detail.
  15. While we were not in the lab due to COVID, I worked closely with bench scientists to follow up on a gene list of hits from a CRISPR screen. Our goal was to survey the literature and online databases to gather evidence around a set of genes, trying to understand their potential role in a given disease. The challenge is that much of the time the information gathered is hidden away in a slide deck. Rather than create a different slide deck for each gene, we developed an app to create a database of everything we were learning by following a target evaluation protocol. This information fed into the design of the experiments that are now being run in the lab, and the evidence gathered is now part of our growing knowledge graph.
  16. In terms of implementation, the data portal provided access to the downstream analytics app that was used for target reviews. The analytics app provides tools for ranking genes based on different attributes and then examining detailed views that guide the user through a target evaluation protocol. The target review app provides form-based data entry that captures images, free text, and annotation with evidence codes from an ontology, for example, tagging a review as protein-level expression. The structured reviews collected can then be used to generate different views of the data. Initially these were templated PowerPoint slides, but we are evaluating other open models, such as nanopublications and biolink, for organizing content like these reviews into a knowledge graph. These in turn can be added to the data portal and fed back into analytics apps, creating a virtuous feedback loop.
  17. Finally, we'll look at the user story of a chemist who is interested in cross-referencing internal results with public compound databases.
  18. One of the most challenging parts of these efforts is the amount of time it takes to align the schemas between datasets that were each designed with a specific application in mind. To lower the barriers to reuse and ad-hoc integration requests, we include cross references to external datasets in our analysis results and target reviews. We've also brought publicly available gene models and chemical compounds into the maze data infrastructure and worked toward a solution that enables integrated queries for quickly answering questions that include our internal data.
  19. First, taking differential expression as an example, we use the RDB to RDF Mapping Language (R2RML) and related technologies to transform internal data into RDF. Similarly, we are leveraging a graph-based representation of the aforementioned target reviews that include the same gene identifiers. As a proof of concept, we use the EBI distribution of both Ensembl and ChEMBL in RDF, which already provide mappings from genes, to transcripts, to proteins, to ChEMBL targets. By leveraging the built-in properties of RDF, we were able to incrementally expand our knowledge graph with new facts and sources to cross-reference our internal data with public sources.
  20. We’ve discussed three examples of where semantic technologies can be used to both capture and use FAIR data, but if we step back we can see a bigger picture that can emerge when following this general pattern for data management. While structuring information this way enables a traditional pipeline for a single drug campaign, it can also enable synergies across different programs in a way that isn't usually available. Many of our experimental insights are about learning generalizable techniques, reagent properties, dosing conditions, etc., that can be valuable to other programs. When you network information across experiments and systems like this, you are also implicitly gathering practical insights that you can immediately use in other contexts. By identifying use cases and incrementally building the maze knowledge graph over time, my hope is that this network of data spanning target discovery, validation, and drug discovery will help us identify the right patients for the therapies we are developing.
  21. This is maze. We launched the company in 2019 and are pursuing a novel approach to drug development. We raised $191M from a group of experienced investors and have a strong team of around 75 employees.
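
Returning to the chemist's user story (slides 18 and 19), the idea of linking internal and public graphs through shared gene identifiers can be sketched as follows. This is a minimal standard-library illustration under loud assumptions: every identifier, value, and predicate name (`gene:G1`, `ex:adjusted_p_value`, `ex:encodes_target`, `ex:active_against`) is invented for the example and does not reflect real Ensembl or ChEMBL records or the vocabulary of their RDF distributions, and the helpers `match` and `compounds_for_significant_genes` are hypothetical.

```python
# Hypothetical data: every identifier and value below is invented.
# Internal result: differential expression, keyed by gene identifier.
internal_de = {
    ("gene:G1", "ex:adjusted_p_value", 0.001),
    ("gene:G2", "ex:adjusted_p_value", 0.40),
}
# Public-style cross-references from genes to drug targets.
public_xrefs = {
    ("gene:G1", "ex:encodes_target", "target:T1"),
    ("gene:G2", "ex:encodes_target", "target:T2"),
}
# Public-style bioactivity statements from compounds to targets.
public_bioactivity = {
    ("compound:C1", "ex:active_against", "target:T1"),
    ("compound:C2", "ex:active_against", "target:T1"),
    ("compound:C3", "ex:active_against", "target:T2"),
}

# Merging graphs is a set union; shared gene identifiers link the sources.
knowledge_graph = internal_de | public_xrefs | public_bioactivity


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in graph
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


def compounds_for_significant_genes(graph, alpha=0.05):
    """Map each gene below the significance cutoff to its known compounds."""
    hits = {}
    for gene, _, padj in match(graph, p="ex:adjusted_p_value"):
        if padj >= alpha:
            continue
        compounds = set()
        for _, _, target in match(graph, s=gene, p="ex:encodes_target"):
            for compound, _, _ in match(graph, p="ex:active_against", o=target):
                compounds.add(compound)
        hits[gene] = sorted(compounds)
    return hits


if __name__ == "__main__":
    print(compounds_for_significant_genes(knowledge_graph))
```

In a real deployment a query language such as SPARQL would typically take the place of the hand-written `match` loop; the point of the sketch is only that no schema alignment is needed once the sources agree on identifiers.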