Presented at D4 2020
The presentation focuses on the application of a knowledge graph approach to profiling drug targets. The speaker, Nolan Nichols, highlights the potential of genetic modifiers in developing transformative therapies for patients suffering from diseases such as spinal muscular atrophy (SMA). The company, Maze Therapeutics, uses advanced data science and a collaboration with AWS Healthcare and Life Sciences to access and analyze meaningful human genetics data and to build a cloud-based data architecture. The presentation also covers the use of semantic technologies in drug discovery, target discovery, and target validation, and the integration of proprietary and shared data through the knowledge graph. It concludes with a summary of the company's launch in 2019 with a $190 million investment and its focus on translating genetic modifier insights into new therapeutics.
Focus on the Evidence: a knowledge graph approach to profiling drug targets
1. Nolan Nichols, PhD
Maze Therapeutics
October 15, 2020
Focus on the evidence: a
knowledge graph approach to
profiling drug targets
D4 GLOBAL
2. why do some people get sick and
others don’t, even when they have
the same disease-causing gene?
2
3. 3
genetic modifiers are naturally occurring and can be identified
in 2016, the Resilience Project published that they had identified individuals who should have had serious childhood diseases, but didn't, describing potential genetic modifiers
Chen et al., Nature Biotechnology, 2016
4. 4
Dr. Jonathan Weissman and
team observed that some
gene-gene interactions have a
‘buffering’ or protective effect
on disease-causing mutations
Horlbeck et al. Cell 2018
CRISPRi technology developed by the Weissman lab at UCSF enabled mapping of
genetic interactions at scale
5. based on genetic insights, genetic modifier targets can be developed into
transformative therapies for patients
5
protective modifiers can…
• be identified from human genetic data that naturally protects some people from disease
• be discovered from, or validated by, functional genomics data
• be targeted to develop new therapeutics
6. an example of a known genetic modifier inspiring a novel treatment for spinal muscular atrophy (SMA)
disease-causing gene → genetic modifier → therapy
• SMN1 mutations lead to SMA
• SMN2 overproduction can compensate for SMN1 in SMA patients
• treat by increasing SMN2 copy number to mimic the genetic modifier
maze has identified many diseases for which its platform can transform genetic modifier insights into novel therapies
6
7. our purpose-built approach: maze is translating genetic modifying insights into transformative therapies for patients
Our current research areas:
• Mendelian diseases
• Genetic modifiers
Potential future research areas:
• Polygenic diseases
• Haploinsufficiency
access and analyze meaningful human genetics data: proprietary cohort data for maze; pay for access; access public data
elucidate target biology leveraging functional genomics: genome-wide CRISPR screens, single-cell biology, cellular disease modeling, interactomics, mutational scanning, future innovation
efficiently prosecute drug discovery with multiple modalities
advanced data science for analysis of large, integrated data
maze is generating proprietary data on genetic modifiers discovered from integrated human genetic and functional genomic data
7
8. integrated human genetic and functional genomic data lowers barriers to analysis and answering questions
access and analyze meaningful human genetics data: proprietary cohort data for maze; pay for access; access public data
elucidate target biology leveraging functional genomics: genome-wide CRISPR screens, single-cell biology, cellular disease modeling, interactomics, mutational scanning, future innovation
advanced data science for analysis of large, integrated data
a 2020 survey of 2,360 data professionals from 100 countries indicates that "For most respondents, data management tasks still consume a disproportionate amount of work time." (n=1,099)
https://www.anaconda.com/state-of-data-science-2020
8
9. 9
collaboration with AWS healthcare and life sciences supports a cloud-based data architecture
data access layer: bioinformatician, biologist, chemist (visualization, computation)
data management layer: (meta)data services, governance
data persistence layer: object store, relational database, graph database, publication, open data
cloud compute layer (aws biotech blueprint)
https://aws.amazon.com/quickstart/biotech-blueprint/ FAIR Principles: https://doi.org/10.1038/sdata.2016.18
10. 10
knowledge graph technologies support use cases for standardized datasets that are designed to be connected
there are many technologies that can be used to construct a knowledge graph; the Resource Description Framework (RDF) matches the FAIR principles' focus on identifiers and controlled terms
kg:SMN1 ro:causes_condition kg:SMA .
kg:SMA rdf:type efo:Disease .
kg:SMN1 rdf:type so:Genotype .
Prefixes
rdf: RDF specification
ro: Relations Ontology
so: Sequence Ontology
efo: Experimental Factor Ontology
kg: example “knowledge graph” namespace
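The three statements above can be sketched in plain Python without any triple-store library, keeping the prefixed names as simple strings. This is only a minimal illustration of the triple pattern, not the stack described in the talk.

```python
# Minimal sketch of the slide's RDF example: each statement is a
# (subject, predicate, object) tuple, with prefixes kept as plain strings.
triples = {
    ("kg:SMN1", "ro:causes_condition", "kg:SMA"),
    ("kg:SMA", "rdf:type", "efo:Disease"),
    ("kg:SMN1", "rdf:type", "so:Genotype"),
}

def objects(subject, predicate):
    """Return all objects asserted for a subject/predicate pair."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# the type system in action: look up what kind of thing SMN1 is,
# and which condition it is asserted to cause
smn1_types = objects("kg:SMN1", "rdf:type")            # {"so:Genotype"}
smn1_causes = objects("kg:SMN1", "ro:causes_condition")  # {"kg:SMA"}
```

A real deployment would use an RDF library and a SPARQL endpoint, but the data model is exactly this: a set of subject–predicate–object statements that can be queried by pattern.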
11. 11
applications of semantic technologies: a bioinformatician, biologist, and chemist walk
into a bar
role user story
bioinformatician “I completed an analysis that includes a report with
my interpretations and tables of statistical model
output, and I want to publish these artifacts to our
data portal where my collaborators can examine
with self-service analytical tools.”
biologist “I am evaluating targets that were identified in a
bioinformatics analysis by reviewing different
sources of evidence, and I need to track the
information I am gathering and present a report to
my team.”
chemist “I received a prioritized list of potential targets for a
given disease from the target discovery team, and I
want to gather information about all compounds
that are known interactors with these targets.”
drug discovery
target discovery
target validation
12. 12
bioinformatics results are used to drive decision making and are managed as key
corporate assets
which genes are differentially expressed in this experiment?
collaborators: email → data portal
• bioinformatics reports and datasets are treated as
peer-reviewed publications in a centralized data portal
• metadata about results are formal dataset descriptions
with a semantic model and controlled terminology
• analytics applications use microservices to drive data
visualizations and navigate connected datasets
challenge: many “artisanal” analyses are lost
in email, file servers, or messaging services
13. 13
semantic technology components supporting publication of bioinformatics results
• ontology terms define result
types and relationships
• provide canonical labels and
definitions
• designed using the protégé
editor and versioned in git
• analysts initialize a
templated project directory
and environment
• a dataset description is
generated using ontology-
driven tooling
• a validated dataset
description is published to
a central data portal
• metadata is added to a
search index
• tabular files accessed via a
data service api
dataset description
• dataset descriptions are modeled as a data graph
• the shapes constraint language (SHACL) is used to validate the graph and report target constraint violations
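The shape-validation idea above can be sketched without a real SHACL engine: check a dataset description against required fields and a controlled vocabulary, and report violations. The field names and ontology terms here are illustrative assumptions, not Maze's actual schema.

```python
# Hand-rolled sketch of shape-style validation (a real system would use
# SHACL, e.g. via pySHACL); required fields and allowed terms are made up.
ALLOWED_RESULT_TYPES = {"obo:differential_expression", "obo:gene_ranking"}
REQUIRED_FIELDS = {"title", "creator", "result_type"}

def validate(description: dict) -> list:
    """Return a list of constraint violations (empty list means valid)."""
    violations = [f"missing required field: {f}"
                  for f in sorted(REQUIRED_FIELDS - description.keys())]
    result_type = description.get("result_type")
    if result_type is not None and result_type not in ALLOWED_RESULT_TYPES:
        violations.append(f"result_type not in ontology: {result_type}")
    return violations

desc = {"title": "CRISPR screen DE analysis", "creator": "analyst-1",
        "result_type": "obo:differential_expression"}
assert validate(desc) == []  # valid descriptions pass with no violations
```

In the pipeline the talk describes, a failing report like this would block publication to the data portal until the description is fixed.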
14. 14
applications of semantic technologies: a bioinformatician, biologist, and chemist walk
into a bar
role/user story table repeated from slide 11
drug discovery · target discovery · target validation
15. 15
expert target evaluations are captured as data using structured evidence annotations
challenge: knowledge gained from literature and database reviews is hidden in slide decks
does the evidence support a therapeutic hypothesis for my gene?
collaborators: slide deck → web app
• electronic data capture app used to guide users
through a target evaluation protocol
• figures and visualizations embedded in a web app with
provenance information and evidence ontology codes
• structured annotations used to generate slide decks
and connect to related data using gene identifiers
16. 16
semantic technology components support structured annotation of target evaluations
• analytics app enables ranking genes
and drill down via detailed views
• organized to guide target evaluation
process w/access to evidence
sources
• free-text review, image,
rankings, and source url
for provenance
• semantic evidence codes are used to annotate each review item
• structured target profiles enable
multiple representations
• target profile slide decks are auto-
populated with evidence reviews
• evaluating knowledge graph
models using nanopublications
and biolink
• data portal services
provide access to
results in apps
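The structured annotation described on this slide can be sketched as a small record type: free text, a semantic evidence code, and a provenance URL travel together. The field names and example values are assumptions for illustration, not the actual data model.

```python
from dataclasses import dataclass

# Illustrative sketch of one structured evidence annotation captured by
# the target review app; schema and values are assumed, not Maze's own.
@dataclass
class ReviewItem:
    gene_id: str        # e.g. an Ensembl gene identifier
    review_text: str    # free-text interpretation by the biologist
    evidence_code: str  # semantic evidence code, e.g. from the ECO ontology
    source_url: str     # provenance for the claim
    rank: int = 0       # optional ranking assigned during evaluation

item = ReviewItem(
    gene_id="ENSG00000172062",  # SMN1 (illustrative)
    review_text="protein-level expression observed in motor neurons",
    evidence_code="ECO:0000269",  # experimental evidence, manual assertion
    source_url="https://example.org/source-paper",
)
```

Because each item carries a gene identifier and a controlled evidence code, reviews can later be joined to other datasets and regenerated as slide decks or graph content.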
17. 17
applications of semantic technologies: a bioinformatician, biologist, and chemist walk
into a bar
role/user story table repeated from slide 11
drug discovery · target discovery · target validation
18. 18
proprietary and shared data are integrated by incrementally expanding the knowledge graph's scope
challenge: heterogeneously organized datasets are prohibitively time-consuming to integrate
What compounds interact
with this target and what
are their properties?
relational
database
graph
database
• significant results from internal analysis and target
reviews include cross references to external datasets
• publicly available gene models and chemical
compounds staged on maze data infrastructure
• solution enables integrated queries over proprietary
and shared data for quickly answering questions
collaborators
19. semantic technology components support integrated queries over proprietary and
shared data graphs
ensembl rdf
• ensembl rdf represents genomic features, genomic locations, and cross-references, including to chembl
chembl rdf
• chembl rdf explicitly links chemical, bioactivity, and genomic data with cross-references to other databases
differential expression rdf
• differential expression results are transformed to rdf using r2rml and linked using gene identifiers
target review rdf
• target reviews are linked via gene identifiers to enable integrated queries with chembl and ensembl
19
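The pattern on this slide, mapping internal relational rows to triples (what R2RML automates) and joining them with public graph data via shared gene identifiers, can be sketched in a few lines. All identifiers and values below are made up for illustration; a real system would use R2RML mappings and SPARQL over the Ensembl/ChEMBL RDF distributions.

```python
# internal differential-expression results, as relational-style rows
de_rows = [
    {"gene_id": "ensembl:ENSG00000172062", "log2fc": 1.8, "padj": 0.001},
]

def rows_to_triples(rows):
    """Map each row to triples, analogous to an R2RML TriplesMap."""
    for row in rows:
        subject = f"kg:de_result/{row['gene_id']}"
        yield (subject, "kg:about_gene", row["gene_id"])
        yield (subject, "kg:log2fc", row["log2fc"])

# public chembl-style triples linking the same gene to a compound
# (the compound identifier here is an assumed placeholder)
public = {("ensembl:ENSG00000172062", "kg:has_known_compound", "chembl:CHEMBL123")}

graph = set(rows_to_triples(de_rows)) | public  # incremental graph expansion

# integrated query: compounds known for genes with an internal DE result
genes = {o for s, p, o in graph if p == "kg:about_gene"}
compounds = {o for s, p, o in graph if p == "kg:has_known_compound" and s in genes}
```

The point is that once both sides use the same gene identifiers, the "join" is just a graph pattern, with no schema alignment step per question.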
21. launched in 2019 with
$190m+ investment
based in south san francisco
with ~80 employees
founded on concept of
genetic modifiers
investors
21
translating genetic
modifying insights into
new therapeutics
Editor's Notes
Today I’ll be talking about how we are using knowledge graph technologies to profile drug targets in our discovery platform.
But since many of you are hearing about Maze for the first time, I’ll start with a brief overview of our company before diving into the technical part of the talk.
Maze was founded to answer this fundamental question… Why do some people get sick and others don’t, even when they have the same disease-causing gene?
Back in 2016 when our founders were first developing the concept of maze, a paper was published by the Resilience Project in which they identified individuals who should have a serious childhood disease, but did not.
Our founders asked the question… why?
So, it’s broadly known that there are genes that can protect people from certain diseases.
But they were curious if this type of insight into so called genetic modifiers could be used as a general platform for identifying therapeutic targets.
Around this same time, one of the maze founders, Jonathan Weissman, and his lab were developing novel applications of CRISPR to do genome-wide gene–gene interaction studies
another maze founder, Steve Elledge, who won the US equivalent of the Nobel Prize for his work on DNA damage repair, was also interested in applying advanced functional genomics tools to study these gene–gene interactions
Both Jonathan and Steve were looking at how these tools could be used to kill cancer cells
HOWEVER, the interesting thing is that while Jonathan was looking for synthetic lethal combinations, he also found protective combinations that his lab described as having “buffering effects”.
Further building on the idea in the Resilience project that one could identify protective modifiers
The maze team wanted to see if they could use Jonathan's and Steve's concepts to build a drug company
So as the early maze team started to build the company, the thought was that we could identify naturally protective variants from human genetic data
then we could generate proprietary functional genomics data to validate these protective modifiers and develop new therapies for severe genetically defined diseases
But what gave us the confidence to believe that this was a viable drugging strategy?
Well… It turns out that there was a drug approved in 2016 based on this exact idea for Spinal Muscular Atrophy, which is a horrible neuromuscular disease
The treatment was designed to increase SMN2 copy number, which was found to help patients with the SMN1 mutations that cause this disease
So with this example in mind, our goal was to build a platform that could systematically identify and drug genetic modifiers for severe genetic diseases
Our approach was to develop a purpose-built platform that integrates high-value human genetic and functional genomic data from public, commercial, and proprietary sources
Then conduct genome-wide crispr screens that can be used to understand the biology related to genetic modifiers
Once we’ve amassed a critical body of evidence, we can use what we’ve learned to focus our drug discovery efforts in a data-driven way
Now that you have some general context about maze, I’d like to switch gears and start unpacking the importance of having integrated human genetic and functional genomic data
One of our goals in the data science group is to provide a 360° view of any evidence that can be used to associate a disease with potential targets
Over the past few years it’s been widely cited that a disproportionate amount of a data scientist’s time goes to data preparation rather than the analysis itself
And the survey from the Anaconda 2020 State of Data Science report still indicates that data professionals spend around 45% of their time on data management tasks
These data management tasks are essential and provide a foundation for the overall data science lifecycle, from preparation to analysis, visualization, and reporting.
With a foundation of integrated data, teams can focus on work that more fully leverages their unique skillsets
We've been developing a data platform to provide this foundation of integrated data, which is summarized in this four-layer diagram
The cloud compute layer was designed in collaboration with the AWS Healthcare and Life Sciences group and is based on their Biotech Blueprint best practices architecture
The data persistence layer is the source of truth for archived data and metadata. This includes a suite of backend services as well as virtual sources hosted on AWS that are registered into our data lake.
The data management layer follows a governance model based on the FAIR principles, ensuring that data and metadata are findable, accessible, interoperable, and reusable via web services
All of which are available to users via a data access layer that provides web apps, command line utilities, and programmatic interfaces in R and Python.
Today we'll be looking at how this platform is used to support three groups of users, specifically how it can be used to produce integrated data that follows the FAIR principles by using knowledge graph technologies.
First, there are several technologies that can be used to implement knowledge graphs, but for our purposes we are using the Resource Description Framework and the corresponding semantic technology stack.
Much of this decision comes from the fact that the semantic technology stack is based on mature standards that provide greater vendor neutrality, meaning that the data model, query language, validation, and inference rules can plug and play across different databases
I'm not going to go in depth on the technical details, but to give you an intuitive sense for what I'm talking about, we can look at this simple example where we have a single statement, or triple, that asserts that SMN1 causes the condition SMA, spinal muscular atrophy
RDF then lets you expand on this with a type system where you can state that SMN1 is of type Genotype and SMA is of type Disease.
Under the hood, this is written out in a simple text document with three columns, where these prefixes are used as shorthand for references to specific ontology terms or other identifiers.
There is much more to discuss here, but I want to move on to discussing how this can be applied in a few scenarios.
The three examples are based on real tasks that are completed by bioinformaticians in my data science group, biologists in our functional genomics group, and a computational chemist on our drug discovery team.
I chose a set of examples that walk us through early stage work from target discovery, to validation, and drug discovery, with an eye toward how these data can be linked together
I’ll briefly highlight each of these user stories before looking at each in more depth.
…
In our bioinformatics user story, a typical analyst works primarily with the Bioconductor open source tools in RStudio to identify, for example, differentially expressed genes in a CRISPR screen
As part of this workflow, analysts use a rich data structure called a SummarizedExperiment, where phenotypic data, feature/gene information, count matrices, and experiment metadata are captured in a single data structure
With all content required for analysis in this SummarizedExperiment object, analysts generate a report with their interpretations that is communicated to collaborators.
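The SummarizedExperiment described here is an R/Bioconductor object; purely for intuition, its shape can be sketched as a plain Python structure bundling the same four parts. The toy data (2 genes × 3 samples) is made up.

```python
# Rough Python analogue of a Bioconductor SummarizedExperiment: assays,
# feature annotations, sample annotations, and experiment metadata
# travel together in one object. Values are illustrative only.
experiment = {
    "assays": {"counts": [[10, 0, 5],    # gene x sample count matrix
                          [3, 8, 2]]},
    "rowData": [{"gene_id": "GENE_A"},   # one entry per assay row (gene)
                {"gene_id": "GENE_B"}],
    "colData": [{"sample": "s1", "condition": "control"},  # per sample
                {"sample": "s2", "condition": "treated"},
                {"sample": "s3", "condition": "treated"}],
    "metadata": {"screen": "genome-wide CRISPR (illustrative)"},
}

# the invariant that makes the structure useful: annotations line up
# with the matrix dimensions
assert len(experiment["rowData"]) == len(experiment["assays"]["counts"])
assert len(experiment["colData"]) == len(experiment["assays"]["counts"][0])
```

Keeping everything in one object is what lets downstream tooling generate a complete, self-describing dataset description from the analysis environment.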
Traditionally, one of the core challenges here is that these “artisanal” analyses may not be formally tracked and captured as part of the corporate memory, and are instead lost to suboptimal communication channels
At maze, we’ve taken the approach of treating such analyses as key corporate assets that are described in a standardized way, published to a data portal, and accessible to downstream applications.
To implement this framework, we're developing an ontology using the protege editor that provides the controlled terms for our dataset descriptions
With this, analysts can create a standardized analysis environment and generate a dataset description of their results using ontology-driven tools, producing a document in an RDF format called JSON-LD
These descriptions can then be validated using another part of the semantic technology stack called SHACL, which can generate a report of any violations, for example if results are annotated with terms that are not part of the ontology.
Once a description is validated, it can be published alongside data files to our central data portal, where metadata is added to a search index and a data API provides access to statistical results
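A JSON-LD dataset description of the kind just mentioned can be sketched as an ordinary JSON document whose `@context` maps short terms to vocabulary IRIs. The context entries, identifiers, and field values below are illustrative assumptions, not the actual portal schema.

```python
import json

# Hedged sketch of a JSON-LD dataset description; `dcterms` is the real
# Dublin Core namespace, while the `kg` namespace and values are made up.
description = {
    "@context": {
        "dcterms": "http://purl.org/dc/terms/",
        "kg": "https://example.org/kg/",
    },
    "@id": "kg:dataset/de-analysis-001",
    "@type": "kg:DifferentialExpressionResult",
    "dcterms:title": "CRISPR screen differential expression",
    "dcterms:creator": "analyst-1",
}

# serialized form of the document that would be published to the portal
doc = json.dumps(description, indent=2)
```

Because JSON-LD is also RDF, the same document can be indexed for search as JSON and loaded into the graph database as triples without a separate conversion step.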
Now let’s take a look at the biologist’s user story, where gene lists produced in the previous step are researched in greater detail.
While not in the lab due to COVID, I worked closely with bench scientists to follow up on a list of hits from a CRISPR screen
Our goal was to survey the literature and online databases to gather evidence around a set of genes, trying to understand their potential role in a given disease. The challenge is that much of the time the information gathered is hidden away in a slide deck.
Rather than create a different slide deck for each gene, we developed an app to create a database of everything we were learning by following a target evaluation protocol
This information fed into the design of the experiments that are now being run in the lab and the evidence gathered is now part of our growing knowledge graph
In terms of implementation, the data portal provided access to the downstream analytics app that was used for target reviews
The analytics app provides tools for ranking genes based on different attributes and then examining detailed views that guide users through a target evaluation protocol
The target review app provides form-based data entry that captures images, free text, and annotation with evidence codes from an ontology, for example, tagging a review as protein-level expression.
The structured reviews collected can then be used to generate different views of the data. Initially these were templated PowerPoint slides, but we are evaluating other open models, such as nanopublications and Biolink, for organizing content like these reviews into a knowledge graph.
These in turn can be added to the data portal and fed back into analytics apps, creating a virtuous feedback loop.
Finally, we’ll look at the user story of a chemist who is interested in cross-referencing internal results with public compound databases.
One of the most challenging parts of these efforts is the amount of time it takes to align the schemas between datasets that were designed with a specific application in mind.
To lower the barriers to reuse and ad-hoc integration requests, we include cross-references to external datasets in our analysis results and target reviews
We've also brought publicly available gene models and chemical compounds into the maze data infrastructure and worked toward a solution that enables integrated queries for quickly answering questions that include our internal data
First, taking differential expression as an example, we use the relational-to-RDF mapping language (R2RML) and related technologies to transform internal data into RDF
Similarly, we are leveraging a graph-based representation of the aforementioned target reviews that includes the same gene identifiers
As a proof of concept, we use the EBI distributions of both Ensembl and ChEMBL in RDF, which already provide mappings from genes, to transcripts, to proteins, to ChEMBL targets.
By leveraging the built in properties of RDF, we were able to incrementally expand our knowledge graph with new facts and sources to cross reference our internal data with public sources
We’ve discussed three examples of where semantic technologies can be used to both capture and use FAIR data, but if we step back we can see a bigger picture that can emerge when following this general pattern to data management.
While structuring information this way enables a traditional pipeline for a single drug campaign, it can also enable synergies across different programs in a way that isn't usually available.
Many of our experimental insights are about learning generalizable techniques, reagent properties, dosing conditions, etc, that can be valuable to other programs.
When you network information across experiments and systems like this, you are also implicitly gathering practical insights that you can immediately use in other contexts.
By identifying use cases and incrementally building the maze knowledge graph over time, my hope is that this network of data spanning from target discovery, validation, and drug discovery will help us identify the right patients for the therapies we are developing.
This is maze. We launched the company in 2019 and are pursuing a novel approach to drug development. We raised $191M from a group of experienced investors, and have a strong team of around 75 employees.