Slides from my talk at the ACS CINF Symposium on Collaborations & Data Sharing in Rare & Orphan Disease Drug Discovery on 31 March 2019 in Orlando.
Abstract:
For the pharmaceutical industry as a whole, addressing the challenge of rare or orphan diseases is high on the agenda. But for the patients and their families, rare diseases can be very isolating and it can often feel like the potential for new treatments is low. One avenue for potential treatments is to identify drug repurposing candidates for the rare disease in question. This talk will give an overview of various collaborative projects undertaken in the last few years, which involved the combination, normalisation and analysis of data from various disparate sources, including some valuable lessons learnt along the way.
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF 20, ACS National Meeting 2019-03-31)
1. CINF20 - 31 March 2019
Dr Frederik van den Broek, Elsevier Professional Services
Data-driven drug discovery for rare diseases
Tales from the trenches
2. This is what we are all after in drug discovery…
Image: Elsevier
3. If only drug discovery and development were that simple…
Disease
Drug
compound
4. If only drug discovery and development were that simple…
Disease
Protein
Target
Drug
compound
5. If only drug discovery and development were that simple…
Disease
Protein
Target
Drug
compound
• Cell processes
• Regulators
• Pathways
• …
• Bioactivity
• Toxicity
• Specificity
• …
6. If only drug discovery and development were that simple…
Disease
Protein
Target
Drug
compound
• Cell processes
• Regulators
• Pathways
• …
• Bioactivity
• Toxicity
• Specificity
• …
• Availability
• Synthesis
• PK/PD
• …
• Genotype
• Phenotype
• Individual
7. If only drug discovery and development were that simple…
Disease
Protein
Target
Drug
compound
• Cell processes
• Regulators
• Pathways
• …
• Bioactivity
• Toxicity
• Specificity
• …
• Availability
• Synthesis
• PK/PD
• …
• Genotype
• Phenotype
• Individual
8. This makes it all a lengthy and costly process
Image: https://www.phrma.org/graphic/the-biopharmaceutical-research-and-development-process
9. With rare diseases it is even harder
Small(er) patient populations leading to:
• Less (integrated) medical and scientific knowledge
• Small population for clinical trials
• Lack of awareness among doctors, researchers and policymakers
• Smaller potential market size for a drug
Image: http://www.campingtourist.com/camping-activities/climbing/difficult-mountains-climb/
10. Drug repurposing: a new hope for rare diseases
• Less costly and of interest for pharma
• Quicker to Phase II/III tests, so hopefully quicker to market
• Need reliable information from various sources to find suitable repurposing
candidates
Image: https://www.starwars.com/news/poll-what-is-the-best-scene-in-star-wars-a-new-hope
11. Accelerate with new knowledge and data
Disease
Protein
Target
Drug
compound
• Cell processes
• Regulators
• Pathways
• …
• Bioactivity
• Toxicity
• Specificity
• …
• Availability
• Synthesis
• PK/PD
• …
• Genotype
• Phenotype
• Individual
12. Various initiatives we were recently involved in
• Project with Findacure to find drug repurposing candidates for Congenital
Hyperinsulinism
• Pistoia Hackathon: Elsevier-Findacure challenge on Friedreich’s ataxia
• Sub-network enrichment analysis for neuromuscular disorder pathways
• Disease pathway analysis for Huntington's disease
• Pistoia Datathon for drug repurposing for rare diseases
13. Congenital hyperinsulinism (CHI)
• A rare genetic disease
• Permanently excessive level of insulin in the blood
• Develops within the first few days of life
• Can lead to brain injury or even death
• In the most severe cases the only viable treatment is the removal of the pancreas, consigning the patient to a lifetime of diabetes
Image: https://res.cloudinary.com/indiegogo-media-prod-cld/image/upload/c_limit,w_620/v1440424745/uzvnqzhvbpsrtthzxqpu.jpg
14. Creating a comprehensive view of CHI
• CHI Literature Library
• Disease, Target, Pathway, and
Compound Analysis
• Research Landscape Analysis
Information Assets Applied
• Content: Elsevier’s vast set of literature and patent data
• Data normalization: taxonomies and dictionaries to normalize author names, institutions, drugs, targets, and other important terms
• Information extraction: finding semantic relationships, targets, pathways, drugs, and bioactivities
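To make the data normalization step concrete, here is a minimal sketch of dictionary-based term normalization. It is an illustration only, not the taxonomies or tooling used in the project: the `SYNONYMS` table and the `normalize_term` helper are hypothetical, and the entries are ordinary textbook synonyms chosen for the example.

```python
import re

# Hypothetical synonym dictionary: each surface form maps to one preferred term.
# A real taxonomy would be far larger and curated; these entries are illustrative.
SYNONYMS = {
    "acetylsalicylic acid": "aspirin",
    "asa": "aspirin",
    "aspirin": "aspirin",
    "acetaminophen": "paracetamol",
    "paracetamol": "paracetamol",
}


def normalize_term(raw, dictionary=SYNONYMS):
    """Return the preferred term for a raw mention, or None if it is unknown."""
    # Lower-case, trim, and collapse internal whitespace before the lookup.
    key = re.sub(r"\s+", " ", raw.strip().lower())
    return dictionary.get(key)


mentions = ["Aspirin", "  acetylsalicylic   acid ", "ASA", "Acetaminophen", "unknown-drug"]
normalized = [normalize_term(m) for m in mentions]
```

Returning `None` for unknown mentions, rather than passing them through, keeps unmapped terms visible so they can be reviewed and added to the dictionary.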
15. Building and refining the CHI disease model
Picked relevant pathways (from a collection of 1800 models)
Explored functions of proteins using 6.2M pre-text-mined relations and embedded Gene Ontology
Summarized what is known about CHI mechanism in an overview model
16. From pathways to CHI treatments:
Automated analysis combines bioassay data with pathway data
Step 1: Find all targets that could be used to affect the disease state
Step 2: Query for each target to find compounds that have high affinity for them (>6 log units)
Step 3: Collate data by compound to summarize the targets/activities related to disease that the compound hits
• Compute geometric mean of activities for ranking
• Rank by number of targets and geometric mean of activities against targets
Callouts on the results:
• Targets and activities for each compound
• Mean of activities among these targets
• Drug-likeness metrics for sorting/classification
• All compounds that were observed to bind to targets in pathway, sorted by number of active targets. Too many targets may suggest lack of specificity.
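The three steps on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual pipeline. Activities are assumed to be given in log units (for example pIC50), the geometric mean is taken over those values, and ranking is assumed to be descending on both keys. `rank_compounds`, the tuple layout and all compound/target names are hypothetical.

```python
import math
from collections import defaultdict

AFFINITY_THRESHOLD = 6.0  # "high affinity (>6 log units)" from the slide


def rank_compounds(pathway_targets, bioactivities, threshold=AFFINITY_THRESHOLD):
    """Rank compounds by number of pathway targets hit, then by geometric mean activity.

    pathway_targets: iterable of target identifiers (Step 1 output).
    bioactivities: iterable of (compound, target, activity_in_log_units) tuples.
    """
    wanted = set(pathway_targets)
    hits = defaultdict(dict)

    # Step 2: keep only high-affinity measurements against pathway targets.
    for compound, target, p_activity in bioactivities:
        if target in wanted and p_activity > threshold:
            # Assumption: keep the best measurement per compound/target pair.
            best = hits[compound].get(target, 0.0)
            hits[compound][target] = max(best, p_activity)

    # Step 3: collate by compound and compute the geometric mean of activities.
    rows = []
    for compound, targets in hits.items():
        values = list(targets.values())
        geo_mean = math.exp(sum(math.log(v) for v in values) / len(values))
        rows.append((compound, len(values), geo_mean))

    # Assumption: more targets first, then higher geometric mean.
    rows.sort(key=lambda row: (-row[1], -row[2]))
    return rows


# Hypothetical example data.
targets = ["T1", "T2", "T3"]
measurements = [
    ("cmpd_A", "T1", 7.0),
    ("cmpd_A", "T2", 8.0),
    ("cmpd_A", "T1", 6.5),  # weaker duplicate, superseded by 7.0
    ("cmpd_B", "T1", 9.0),
    ("cmpd_B", "T3", 5.5),  # below threshold, dropped
    ("cmpd_C", "T9", 9.5),  # target not in the pathway, dropped
]
ranking = rank_compounds(targets, measurements)
```

Because a very high target count may indicate lack of specificity, the count is kept as its own column in each row, so a downstream filter can cap it instead of simply sorting on it.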
17. Pistoia Hackathon Challenge (2017)
Elsevier would like you to demonstrate the ability of deep learning to help
Findacure, a UK-based charity, accelerate treatment and clinical research for
Friedreich’s ataxia (FRDA). You’ll have access to a heterogeneous set of
data related to the disease: biological pathway analysis, associated chemical
compounds and bioactivities, potential candidates for drug re-purposing, full-
text scientific literature and clinical trial data.
Basically, giving others a go with the data sets we worked with on CHI….
18. Promising results, but still hard work
“We spent most of our time the first day just trying to get our heads around
the data, so we could start to find some solutions. Even opening the files was
tricky.” The students used various tools to try to extract data from the
provided XML files, but it was slow going. Daniel [one of the participants]
commented that, “we wound up having to do a lot of things manually, so we
could at least read the files in plain text.”
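For the "read the files in plain text" problem, something like the sketch below is one way to flatten an XML file into readable lines using only the Python standard library. The schema of the hackathon files is not described here, so the sample document, its tag names and the `flatten_xml` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_text):
    """Return one 'path: text' line for every element that carries text."""
    root = ET.fromstring(xml_text)
    lines = []

    def walk(element, path):
        current = f"{path}/{element.tag}"
        text = (element.text or "").strip()
        if text:
            lines.append(f"{current}: {text}")
        for child in element:
            walk(child, current)

    walk(root, "")
    return lines


# Hypothetical sample; the real files will have a different structure.
SAMPLE = (
    "<record><compound><name>example compound</name>"
    '<activity unit="pIC50">6.8</activity></compound></record>'
)
flat = flatten_xml(SAMPLE)
```

This sketch ignores attributes and any text that follows a child element; both would need handling for files that store data there.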
19. Sharing disease pathways
• Shared curated pathways (with supporting literature
references) with rare disease organisations to help their
discussions with researchers and fill in potential “blanks”
• Comparing gene expression algorithms for the identification
of expression regulators
• Well-defined datasets, with supporting
literature references which resonate
with researchers
21. “Machine learning won’t work if your data is rigidly siloed.”
“One major challenge is collecting enough reliable information to properly train AI systems. AI is as good as the data.”
Nick Patience, Founder, 451 Research
“Organizations need to make sure that the data being accessed is treated and defined consistently across the sources. Otherwise, virtualization won't work.”
“All the major AI advances have been fueled by advances in data sets. The algorithms are easy… Collecting, classifying and labeling datasets used to train the algorithms is the grunt work that’s difficult.”
Aspuru-Guzik, Professor of Chemistry & Machine Learning, Harvard University
JJ Guy, CTO, Jask (AI co.)
• ‘Siloed’
• Lack of standards
• Requires labeling and context
• Poor quality
Using the Entellect Platform and Data Curation
22. Access, curation of authoritative life science data
Integration of disparate data, structured and unstructured
Normalized and standardized data with industry standard taxonomies
Build custom and off-the-shelf analytics tools
• ‘Un-siloed’
• Harmonized
• Enriched and linked
• Quality
Nick Patience, Founder, 451 Research
Aspuru-Guzik, Professor of Chemistry & Machine Learning, Harvard University
Using the Entellect Platform and Data Curation
24. Various teams using various approaches
• Semantic data: Target Identification
• Semantic data: Small Molecule Binding
• Machine Learning
− Ensemble Learning
− Mol2Vec, Prot2Vec
− Network diffusion
• Expert collaboration
− Virtual docking
− Adverse Event profiling
“I could work on the important stuff straight away, using all the data”
26. Aiming to make data-driven drug discovery for rare diseases
a little easier…
Disease
Protein
Target
Drug
compound
• Cell processes
• Regulators
• Pathways
• …
• Bioactivity
• Toxicity
• Specificity
• …
• Availability
• Synthesis
• PK/PD
• …
• Genotype
• Phenotype
• Individual
27. Conclusions
• Data, data, data…
• Data has to be FAIR and of good and trusted provenance as the
researchers and clinicians will want to see the “chain of evidence” (beware
of black box models)
• Data sets also have to be FAIR for each other: the integrated approaches that repurposing needs require data sets that are linked across silos and domains, to go from disease to target to compound (and back)
Image: Sangya Pundir, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=53414062
28. Acknowledgements
• Maria Shkrob
• Jabe Wilson
• Anton Yuryev
• Matthew Clark
• Christy Wilson
• Finlay Maclean
• Elsevier’s Entellect team
• Pistoia hackathon and datathon teams