(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...
Stephen Friend SciLife 2011-09-20
1. Integrating layers of omics data models and compute
spaces needed to build a “Convivial Knowledge Expert”
Use of Bionetworks to Build Maps of Diseases
Moving beyond the linear
Stephen Friend MD PhD
Sage Bionetworks (Non-Profit Organization)
Seattle/ Beijing/ Amsterdam
SciLife
September 21st, 2011
2. why consider the fourth paradigm- data intensive science
thinking beyond the narrative, beyond pathways
advantages of an open innovation compute space
it is more about how than what
3. Alzheimer’s Diabetes
Treating Symptoms v.s. Modifying Diseases
Cancer Obesity
Will it work for me?
Biomarkers?
8. WHY NOT USE
“DATA INTENSIVE” SCIENCE
TO BUILD BETTER DISEASE MAPS?
9. “Data Intensive Science”- “Fourth Scientific Paradigm”
For building: “Better Maps of Human Disease”
Equipment capable of generating
massive amounts of data
IT Interoperability
Open Information System
Evolving Models hosted in a
Compute Space- Knowledge Expert
10. It is now possible to carry out comprehensive
monitoring of many traits at the population level
Monitor disease and molecular traits in
populations
Putative causal gene
Disease trait
11. what will it take to understand disease?
DNA RNA PROTEIN (dark matter)
MOVING BEYOND ALTERED COMPONENT LISTS
13. How is genomic data used to understand biology?
RNA amplification
Tumors
Microarray hybirdization
Tumors
Gene Index
Standard GWAS Approaches Profiling Approaches
Identifies Causative DNA Variation but Genome scale profiling provide correlates of disease
provides NO mechanism Many examples BUT what is cause and effect?
Provide unbiased view of
molecular physiology as it
relates to disease phenotypes
trait
Insights on mechanism
Provide causal relationships
and allows predictions
Integrated Genetics Approaches
14. Integration of Genotypic, Gene Expression & Trait Data
Schadt et al. Nature Genetics 37: 710 (2005)
Millstein et al. BMC Genetics 10: 23 (2009)
Causal Inference
“Global Coherent Datasets”
• population based
• 100s-1000s individuals
Chen et al. Nature 452:429 (2008) Zhu et al. Cytogenet Genome Res. 105:363 (2004)
Zhang & Horvath. Stat.Appl.Genet.Mol.Biol. 4: article 17 (2005) Zhu et al. PLoS Comput. Biol. 3: e69 (2007)
15. Constructing Co-expression Networks
Start with expression measures for genes most variant genes across 100s ++ samples
1 2 3 4 Note: NOT a gene
expression heatmap
1
1 0.8 0.2 -0.8
Establish a 2D correlation matrix 2
for all gene pairs
expression
0.8 1 0.1 -0.6
3
0.2 0.1 1 -0.1
4
-0.8 -0.6 -0.1 1
Brain sample
Correlation Matrix
Define Threshold
eg >0.6 for edge
1 2 4 3 1 2 3 4
1 1
1 4 1 1 1 0 1 1 0 1
2 2
1 1 1 0 1 1 0 1
1 1 1 0 Hierarchically 3
Identify modules 4 0 0 1 0
2 3 cluster
4
3 0 0 0 1 1 1 0 1
Network Module Clustered Connection Matrix Connection Matrix
sets of genes for which many
pairs interact (relative to the
total number of pairs in that
set)
16. Data integration via Bayesian Network
Yeast segregants Public
databases
******BYRM
Synthetic complete Protein-
medium protein
Logorithm growth interations
Transcription
factor binding
Gene expression sites
genotypes
Yeast segregants
Protein
Metabolite
interations
Bayesian network
Courtesy of Dr. Jun Zhu
17. Preliminary Probabalistic Models- Rosetta /Schadt
Networks facilitate direct
identification of genes that are
causal for disease
Evolutionarily tolerated weak spots
Gene symbol Gene name Variance of OFPM Mouse Source
explained by gene model
expression*
Zfp90 Zinc finger protein 90 68% tg Constructed using BAC transgenics
Gas7 Growth arrest specific 7 68% tg Constructed using BAC transgenics
Gpx3 Glutathione peroxidase 3 61% tg Provided by Prof. Oleg
Mirochnitchenko (University of
Medicine and Dentistry at New
Jersey, NJ) [12]
Lactb Lactamase beta 52% tg Constructed using BAC transgenics
Me1 Malic enzyme 1 52% ko Naturally occurring KO
Gyk Glycerol kinase 46% ko Provided by Dr. Katrina Dipple
(UCLA) [13]
Lpl Lipoprotein lipase 46% ko Provided by Dr. Ira Goldberg
(Columbia University, NY) [11]
C3ar1 Complement component 46% ko Purchased from Deltagen, CA
3a receptor 1
Tgfbr2 Transforming growth 39% ko Purchased from Deltagen, CA
Nat Genet (2005) 205:370 factor beta receptor 2
18. List of Influential Papers in Network Modeling
50 network papers
http://sagebase.org/research/resources.php
20. Recognition that the benefits of bionetwork based molecular
models of diseases are powerful but that they require
significant resources
Appreciation that it will require decades of evolving
representations as real complexity emerges and needs to be
integrated with therapeutic interventions
21. Sage Mission
Sage Bionetworks is a non-profit organization with a vision to
create a commons where integrative bionetworks are evolved by
contributor scientists with a shared vision to accelerate the
elimination of human disease
Building Disease Maps Data Repository
Commons Pilots Discovery Platform
Sagebase.org
23. Engaging Communities of Interest
NEW MAPS
Disease Map and Tool Users-
( Scientists, Industry, Foundations, Regulators...)
PLATFORM
Sage Platform and Infrastructure Builders-
( Academic Biotech and Industry IT Partners...)
RULES AND GOVERNANCE
Data Sharing Barrier Breakers-
(Patients Advocates, Governance
ORM
and Policy Makers, Funders...)
M APS
F
NEW TOOLS
PLAT
NEW
Data Tool and Disease Map Generators-
(Global coherent data sets, Cytoscape,
RULES GOVERN Clinical Trialists, Industrial Trialists, CROs…)
PILOTS= PROJECTS FOR COMMONS
Data Sharing Commons Pilots-
(Federation, CCSB, Inspire2Live....)
24. Example 1: Breast Cancer
Coexpression Networks
Module combination
Partition BN
Bayesian Network
Survival Analysis
24
Zhang B et al., manuscript
25. Generation of Co-expression & Bayesian Networks from
published Breast Cancer Studies
4 Public Breast Cancer Datasets
NKI: van de Vijver et al. A gene-expression
signature as a predictor of survival in breast
cancer. N Engl J Med. 2002 Dec 19;347
295 samples
(25):1999-2009.
Wang Y et al. Gene-expression profiles to
predict distant metastasis of lymph-node-
negative primary breast cancer. Lancet. 286 samples
2005 Feb 19-25;365(9460):671-9.
Miller: Pawitan Y et al. Gene expression
profiling spares early breast cancer patients
from adjuvant therapy: derived and 159 samples
validated in two population-based cohorts.
Breast Cancer Res. 2005;7(6):R953-64.
Christos: Sotiriou C et al.. Gene
expression profiling in breast cancer:
understanding the molecular basis of 189 samples
histologic grade to improve prognosis. J
Natl Cancer Inst. 2006 Feb 15;98(4):
262-72.
25
26. Recovery
of
EGFR
and
Her2
oncoproteins
downstream
pathways
by
super
modules
28. Key
Driver
Analysis
• IdenDfy
key
regulators
for
a
list
of
genes
h
and
a
network
N
• Check
the
enrichment
of
h in
the
downstream
of
each
node
in
N
• The
nodes
significantly
enriched
for
h
are
the
candidate
drivers
28
29. A) Cell Cycle (blue) B) Chromatin modification (black)
C) Pre-mRNA Processing (brown) D) mRNA Processing (red)
Global driver
Global driver & RNAi
validation
29
31. Example: The Sage Non-Responder Project in Cancer
• To identify Non-Responders to approved drug regimens so
Purpose: we can improve outcomes, spare patients unnecessary
toxicities from treatments that have no benefit to them, and
reduce healthcare costs
Leadership: • Co-Chairs Stephen Friend, Todd Golub, Charles Sawyers
Gary Nolan & Rich Schilsky
Initial • AML (at first relapse)- Jerry Radich
Studies: • Non-Small Cell Lung Cancer- Roy Herbst
• Ovarian Cancer (at first relapse)- Beth Karlan
• Breast Cancer- Dan Hayes
• Renal Cell- Rob Moetzer
Sage Bionetworks • Non-Responder Project
32. Anders
New Type II Diabetes Disease Models Rosengren
Global expression data
340 genes in islet-specific
from 64 human islet donors
open chromatin regions
Blue module: 3000 genes
Associated with
Type 2 diabetes
Elevated HbA1c
Reduced insulin secretion
168 overlapping genes, which have
• Higher connectivity
• Markedly stronger association with
• Type 2 diabetes
• Elevated HbA1c
• Reduced insulin secretion
• Enrichment for beta-cell transcription
factors and exocytotic proteins
33. New Type II Diabetes Disease Models Anders
Rosengren
• Search across 1300 datasets in MetaGEO at Sage for similar expression profiles
Top hit: Islet dedifferentiation study where the 168 genes were upregulated in
mature islets and downregulated in dedifferentiated islets (Kutlu et al., Phys Gen 2009)
• Analyses of expression-SNPs and clinical SNPs as well as Causal Inference Test
• Identification of candidate key genes affecting beta-cell differentiation and chromatin
Working hypothesis:
Normal beta-cell: open chromatin in islet-specific regions,
high expression of beta-cell transcription factors,
differentiated beta-cells and normal insulin secretion
Diabetic beta-cell: lower expression of beta-cell transcription
factors affecting the identified module, dedifferentiation,
reduced insulin secretion and hyperglycemia
Next steps: Validation of hypothesis and suggested key genes in human islets
34. Probing
complex
biology
• The
more
we
learn
about
it,
the
more
complicated
it
becomes!
• Cancers'
geneDc
fingerprints
are
highly
diverse;
most
mutaDons
were
unique
to
individual
• How
to
piece
everything
together?
35. perturba)ons
observa)ons
to
models
perturba)ons
diseases
36. A
framework
for
data
integraDon
knowledge
Medline Biocarta/Biopathway Biologists
High
throughput
data
Microarray data
probabilistic Database
Proteomic data
graphic models
Metabolomic data
Genomics Hypothesis, test GUI
Genetics
37. Yeast
segregants
Public
databases
******BYRM
Synthe)c
complete
Protein-‐
medium
protein
intera)ons
Logorithm
growth
Transcrip)on
Gene
expression
metabolites
genotypes
factor
binding
sites
Protein
Yeast
segregants
Metabolite
intera)ons
Bayesian
network
38. Bayesian
network:
Incorpora)ng
TFBS
and
PPI
data
as
a
scale-‐free
network
prior
• DNA-‐protein
binding
data
– Knowledge
of
what
proteins
regulate
transcripDon
of
a
given
gene
39. PPI:
Can
we
find
informaDon
overlapped
with
gene
expressions?
4-‐clique
4-‐clique
3-‐clique
3-‐clique
Clique
community
(par)al
clique)
Zhu
J
et
al,
Nature
GeneDcs,
2008
40. IntegraDng
transcripDon
factor
(TF)
binding
data
and
PPI
• Introducing
scale-‐free
priors
for
TF
and
large
PPI
complex
• Fixed
prior
for
small
PPI
complex
Zhu
J
et
al,
Nature
GeneDcs,
2008
41. Integra)on
improves
network
quali)es
BN KO data GO terms TF data • The
pair
is
independent
w/o any priors 125 55 26
w/ genetics
priors 139 59 34
w/ genetics, TF
and PPI
priors 152 66 52 • The
pair
is
causa/reacDve
Zhu
et
al.,
Cytogene)cs,
2004
Zhu
et
al.
PLoS
Comp
Biol.
2007
Zhu
J
et
al.,
Nature
Gene)cs,
2008
42. ProspecDve
validaDon
is
the
gold
standard
ILV6
gives
rise
to
large
expression
signature
GCN4
•
ILV6
KO
sig
enriched
(p~10E-‐52)
•
GCN4
upregulated
in
ILV6
KO
large
signature
ILV6
LEU2
KO
gives
rise
to
small
expression
signature
•
LEU2
KO
sig
enriched
(p~10E-‐18)
•
GCN4
downregulated
in
LEU2
KO
small
signature
LEU2
GCN4
Zhu
J
et
al.,
Nature
Gene)cs,
2008
43. Lung
Cancer
Bayesian
network
• Built
from
240
lung
cancer
samples
• 7785
genes
• 10642
links
45. Direct
overlaps
of
EMT
proteomic
signatures
extract surface media
extract 267 75 57
surface 31% 240 52
media 27% 25% 208
46. Analysis
of
EMT
signatures
through
lung
cancer
network
Cell
medium
signature
Cell extract
signature
overlaps
Cell surface
signature
De novo constructed lung
signatures cancer regulatory network subnetworks
47. Hallmark
features
of
EMT
• Decrease of E-cadherin and increase of
Vimentin (criteria used in defining
signatures)
– CTNNA1 is in all three proteomic signatures;
• There are 6 nodes connected to CTNNA1 in the
lung cancer network including ARF1
– VIM is in all three proteomic signatures
• There are 7 nodes connected to VIM in the lung
cancer network including NOTCH2, EMP3
48. Lung
Network
Conclusions
• All
proteomic
signatures
are
coherently
co-‐
regulated
at
transcripDon
level;
• GOBP
annotaDons
for
signatures
in
cell
surface,
condiDoned
media
fracDons
are
expected;
• Lipid
metabolism
for
signatures
in
the
total
cell
extract
is
also
expected.
• Subnetworks
for
EMT
proteomic
signatures
contain
all
known
hallmark
features
of
EMT;
• These
subnetworks
can
provide
beger
context
to
understand
EMT
and
to
idenDfy
key
regulators
of
EMT.
49. Can we accelerate the pace of scientific discovery?
2008
2009
2010
2011
50. How is the Federation different
from a “traditional” collaboration?
collaboration 2.0
51. Rules of the game:
transparency & trust
Shared data tools models and prepublications
Conflict of interests
Intellectual property
Authorship
52. sage bionetworks synapse project
Watch What I Do, Not What I Say Reduce, Reuse, Recycle
My Other Computer is Amazon
Most of the People You Need to Work
with Don’t Work with You
53. sage federation:
scientific pilot projects
Type-II diabetes
Warburg effect in cancer
Human aging
Cellular
mortality
(aging)
Cellular
immortality
(cancer)
55. warburg effect project
interconnected discovery
bioinformatic
bioinformatic confirmation that metabolic
coherent taylor,
genes implicated in the warburg effect are
prostate sawyers, et
differentially expressed across a wide range
dataset al., mskcc!
of cancer types!
butte lab
layer in!
additional !
tissue types!
network dynamics prostate network modeling
generate an aerobic glycolysis signature and
ʻreverse engineerʼ master transcription factor genes associated with poor prognosis in
breast
regulators of the warburg effect transcriptional prostate cancer are disproportionately found
program ! amongst networks regulating glycolysis genes!
colon
lung
renal
brain
b cell
califano lab sage bionetworks
56. warburg effect project
so what? discovery integration to translation, a validation path
peter nelson!
pre-clinical target validation!
from thompson, science, 2009!
jim olson!
from sawyers, pcctc presentation!
target identification! clinical validation!
lists of nodes and key drivers!
57. sage federation:
human aging project
JusDn
Guinney
Stephen
Friend*
Mariano
Alvarez
Celine
Lefebrev
Greg
Hannum
Andrea
Califano*
Januz
Dutkowski
Trey
Ideker*
Kang
Zhang*
58. sage federation:
what is the impact of disease/environment on “biological age” ?
Biological
Age
2009
2001
Chronological
Age
59. sage federation:
model of biological age
Faster Aging
Predicted
Age
(liver
expression)
Slower Aging
Clinical Association
- Gender
- BMI
- Disease
Age Differential Genotype Association
Gene Pathway Expression
Chronological
Age
(years)
60. human aging:
clinical associations with differential aging –
gene expression in human liver
Faster
Aging
!
20
!
Bioage
Difference
10
(years)
0
Slower
Aging
−10
!
!
Female
FALSE TRUE
Underweight
Fit
Overweight
Male
64. human aging:
mechanism of biological aging
stochasDc
vs
mechanisDc
deterioraDon
by
random
“hits”
systemaDc
process
65. human aging:
mechanism of aging – higher entropy?
Primary Cohort Secondary Cohort
14 p=6.616e!16 p=3.528e!05
!48
12
10 !50
Entropy
Entropy
8 !52
6
!54
4
!56
2
40 50 60 70 80 90 40 45 50 55 60 65
Mean age per window Mean age per window
66. human aging:
research summary
• Created predictive model of biological age
• Type-II diabetes strongly correlates with accelerated aging
• Identified genetic variant that may induce methyl-driven
acceleration of aging
• General mechanism of aging may involve loss of signal and
increased disorder within the methylome
68. Federated
Aging
Project
:
Combining
analysis
+
narraDve
=Sweave Vignette
Sage Lab
R code + PDF(plots + text + code snippets)
narrative
HTML
Data objects
Califano Lab Ideker Lab Submitted
Paper
Shared
Data
JIRA:
Source
code
repository
&
wiki
Repository
69. Why not share clinical /genomic data and model building in the
ways currently used by the software industry
(power of tracking workflows and versioning
70. Synapse
as
a
Github
for
building
models
of
disease
82. why consider the fourth paradigm- data intensive science
thinking beyond the narrative, beyond pathways
advantages of an open innovation compute space
it is more about how than what
83. Moving beyond the linear
linear pathways
linear ways of building models
linear ways of working together
85. OPPORTUNITIES FOR THE SCILIFE COMMUNITY
Data sets, Tools and Models
Joining Synapse Communities
Joining Federation Projects
Joining Arch2POCM
Change reward structures for sharing data
(patients and academics)