Presenter: Marina Sirota, UCSF
Recent advances in genome typing and sequencing technologies have enabled quick generation of a vast amount of molecular data at very low cost. The mining and computational analysis of this type of data can help shape new diagnostic and therapeutic strategies in biomedicine. In this talk, I will discuss how such technological advances in combination with data science and integrative analysis can be applied to drug discovery in the context of drug target identification, computational drug repurposing, and population stratification approaches.
4. Moore’s Law – Biology and
Computation
Cost Per Genome
Cost of Computational Resources
5. Can we use data integration
to…
Biomarker
Discovery
… to find
better
diagnostic
markers?
Disease
Mechanism
… understand
disease
better?
Therapeutics
… find new
uses for
existing
drugs?
6. Motivation
• Problem:
– Takes roughly 15 years and over
$800 million to develop and bring
a novel drug to market
– 90% of drugs fail in early
development
• Solution: Drug Repurposing
– Lower cost
– Reduce risk of failure
7. Problem Statement
Can we use public data to
systematically predict relationships
between drugs and diseases?
Sirota M, Dudley JT, Kim J, Chiang AP, Morgan AA, Sweet-Cordero A, Sage J, Butte AJ. Discovery and Validation of
Drug Indications Using Compendia of Public Gene Expression Data. Science Translational Medicine. Aug 2011.
8. Problem Statement
Can we use public data to
systematically predict relationships
between drugs and diseases?
Diseases
Drugs
Sirota M, Dudley JT, Kim J, Chiang AP, Morgan AA, Sweet-Cordero A, Sage J, Butte AJ. Discovery and Validation of
Drug Indications Using Compendia of Public Gene Expression Data. Science Translational Medicine. Aug 2011.
9. What is Gene Expression
Profiling?
• Global snapshot of cellular function and
activity
– Genome sequence – what might be going on
– Expression – what is actually going on
• 25,000 genes 1,000,000 proteins
• We can measure a few thousand proteins,
but gene expression is a global proxy
How Can We Measure Expression?
10. Microarrays
• Thousands of probes are hybridized to a solid
surface
• Takes advantage of complementary DNA
sequences
• Process:
– RNA is extracted from the sample
– Fluorescent labeling
– Hybridization and wash
– Scanning and signal processing
– Normalization and analysis!
11. Data Sources
• Collection of expression
data from cultured human
cells
• 453 experiments of 164
drugs
• Covers broad range of
effects
– FDA approved drugs
– Non drug bioactive small
molecules
• Publicly available
gene expression
repository
– Platforms – 11,745
– Samples – 961,202
– Series -39,679
• There are numerous
experiments dealing
with over 200
diseasesBarrett et al. NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res. 2009.
Lamb et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease.
Science. 2006.
12. Disease Gene Expression Data
(GEO)
Butte AJ, Chen R. AMIA,
2006.
Download all GDS
Experiments GEO
Identify Disease
Associated Experiments
Identify Normal vs.
Disease Experiments
176 datasets, 3113
arrays, 100 diseases
Dudley J, Butte AJ. PSB,
2008.
Dudley JT, Tibshirani R, Deshpande T, Butte AJ. Disease signatures are robust across tissues and experiments. Mol
14. Drug Gene Expression Profile
Treated
Sample
Untreated
Sample
Drug Gene Expression Profile
15. Up-regulated Down-regulated
Hypothesis
Gene Expression Profiles
Disease Drug BDisease Drug A Disease Drug C
Genes
Genes
Genes
Treatment Adverse Reaction
?
????
Lamb et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease.
22. Crohn’s Disease
• An inflammatory disease of the intestines
that has an autoimmune component
• Affects 500,000 people in North America
• No known pharmaceutical cure
• Current solutions:
– Reduce inflammation with anti-
inflammatory drugs and
corticosteroids (prednisone)
– Bad side effects
– Surgical solutions
25. Topiramate – An Anti-Seizure
Drug
• Suppresses the rapid and excessive
firing of neurons that start a seizure
• Enhances GABA-activation
• Used to treat epilepsy, bipolar disorder
• Antidepressant
• Investigated as potential
treatment for obesity and
type II diabetes
26. Topiramate and Crohn’s
Genes that are
up-regulated by the drug are
down-regulated in the disease
Genes that are
down-regulated by the drug ar
up-regulated in the disease
27. Animal Model for Crohn’s
• TNBS (trinitrobenzene sulfonic acid) +
ethanol induced rats:
– Excellent and reproducible experimental model
for Inflammatory Bowel Disease (Crohn’s and
Ulcerative Colitis)
– Toxin-based model
Normal TNBS Induced
28. Pilot Validation Study Design
• Pilot Study – 18 rats
– Healthy (control)
– TNBS-Induced Untreated
– TNBS-Induced Treated
• 80 mg/kg topiramate, injected daily
• Colon tissue macroscopic damage score
Reetesh Pai, Mohan Shenoy and Pankaj Jay
36. Ongoing work
• Extending the drug datasets to use structural
data
• Incorporating meta-analysis methods
• Application to cancer (lung cancer, liver
cancer, medulloblastoma)
• More focused cell line selection
• Looking at dosage response and combination
therapy prediction
• Leveraging EMR and clinical trial dataChen B, Sirota M, Fan-Minogue H, Hadley D, Butte AJ. Relating Hepatocellular Carcinoma Tumor Samples and Cell
Lines Using Gene Expression Data in Translational Research. BMC Medical Genomics, 2015.
Wu M, Sirota M, Butte AJ, Chen B. Characteristics of drug combination therapy in oncology by analyzing clinical trial
data on clinicaltrials.gov. Pac Symp Biocomput. 2015.
37. Can we use data integration
to…
Biomarker
Discovery
… to find
better
diagnostic
markers?
Disease
Mechanism
… understand
disease
better?
Therapeutics
… find new
uses for
existing
drugs?
39. Acknowledgements
Atul Butte
Joel Dudley Annie P. Chiang
Alex Morgan Pankaj Jay Pasricha
Mohan Shenoy Minnie Sarwal
Reetesh Pai Julien Sage
Silke Roedder Alejandro Sweet-
Cordero
Bin Chen Hanna Paik
Dexter Hadley
Good morning my name is Marina Sirota. I’m currently a lead research scientist in the division of systems medicine at Stanford university. Previously I worked at Pfizer under David Cox in a genetics group working on applying next gen sequencing technologies to discover novel drug targets and develop population stratification techniques for clinical trials. Today I will tell you a bit about translational bioinformatics, systems medicine and how it might impact transplantation practice in the near future
Since then we have come a looong way. Thousands of people have been sequenced and millions of individuals have been genotyped. These are all resources that have been created and most importantly they are open to the public.
People are also starting to use sequencing in creative ways – immunome, metagenome, epigenetics cell-free DNA.
The observation made in 1965 by Gordon Moore, co-founder of Intel, that the number of transistors per square inch on integrated circuits had doubled every year since the integrated circuit was invented. Moore predicted that this trend would continue for the foreseeable future.
Drug repurposing is finding a novel indication for a known FDA approved drugs
Computational work can be instrumental in making this feasible
Examples: Viagra - was initially studied for use in hypertension
So far I have focused on the genetics piece or the DNA, I would like to talk a little bit about RNA or the middle piece of the central dogma and ways we can measure it using gene expression profiling
Expression profiles can, for example, distinguish between cells that are actively dividing, or show how the cells react to a particular treatment Used to generate hypothesis, mechanism of action
Post translational modification, alternative splicing
Microarrays are one technology to measure gene expression. They were first developed in 1995, nearly 20 years ago.
Have GEO data, but the format doesn’t make it easy to ask relevant questions
We have built an infrastructure to enable this sort of analysis
This approach is especially likely to yield good results for diseases with a strong gene dis-regulation component such as autoimmune disease or cancer
If the up-regulated disease genes appear near the top (up-regulated) of the rank-ordered drug gene expression list and the down-regulated disease genes fall near the bottom (down-regulated) of the rank-ordered drug gene expression list we can conclude that the drug and the disease expression profiles are similar
if the up-regulated disease genes fall near the bottom of the rank-ordered drug gene expression list and the down-regulated disease genes are near the top of the rank-ordered drug gene expression - therapeutic
Randomization by picking a signature at random and recomputing drug disease scores 100 times FDR
100 diseases 164 drugs
16000 drug-disease pairs
53 diseases significant predictions
Not everything is treatable
Hieararchical clustering
Brain cancers
Other Cancers
Lung Injury
UC and crohn’s
histone deacetylase (HDAC) inhibitors (in red)
Drugs known to affect different parts of the same pathway also cluster together:
phosphatidylinositol-3-kinase (PI3K) inhibitors LY−294002 and wortmannin (in green)
heat shock protein 90 (HSP90) inhibitors (in orange)
Chose Crohn’s but have others
Known drug
Two that are better
One is FDA approved so go for this one
Looked for an animal model of Crohn’s and found one
Define macroscopic damage score
Scale 0-6 what they mean
Define axes
Earlier this year President Obama launched a $215 million investment in Precision Medicine Initiative will pioneer a new model of patient-powered research that promises to accelerate biomedical discoveries and provide clinicians with new tools, knowledge, and therapies to select which treatments will work best for which patients.