Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
1. If the physicists do it, the software engineers do it,
Why can’t we do it?:
Moving beyond linear investigations
Both of the science and of how we work
Integrating layers of omics data and building models
using compute spaces capable of enabling models
to be evolved by teams of teams
Stephen Friend MD PhD
Sage Bionetworks (Non-Profit Organization)
Seattle/ Beijing/ Amsterdam
February 23, 2012
2.
3. So what is the problem?
Most approved therapies were assumed to be monotherapies for diseases representing homogenous populations
Our existing disease models often assume pathway knowledge sufficient to infer correct therapies
8. “Data Intensive” Science- Fourth Scientific Paradigm
Equipment capable of generating
massive amounts of data
IT Interoperability
Open Information System
Host evolving computational models
in a “Compute Space”
9.
10.
11. WHY NOT USE “DATA INTENSIVE” SCIENCE TO BUILD BETTER DISEASE MAPS?
12. What will it take to understand disease?
DNA, RNA, PROTEIN (dark matter)
MOVING BEYOND ALTERED COMPONENT LISTS
14. Preliminary Probabilistic Models - Rosetta / Schadt
Networks facilitate direct
identification of genes that are
causal for disease
Evolutionarily tolerated weak spots
Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA
Nat Genet (2005) 205:370
18. “Data Intensive” Science- Fourth Scientific Paradigm
Score Card for Medical Sciences
Equipment capable of generating massive amounts of data: A-
IT Interoperability: D
Open Information System: D-
Host evolving computational models in a “Compute Space”: F
19. We still conduct much clinical research as if we were
hunter-gatherers, not sharing.
26. Sage Mission
Sage Bionetworks is a non-profit organization with a vision to
create a commons where integrative bionetworks are evolved by
contributor scientists with a shared vision to accelerate the
elimination of human disease
Building Disease Maps Data Repository
Commons Pilots Discovery Platform
Sagebase.org
28. JUN ZHU
Model of Breast Cancer: Co-expression
[Figure: co-expression network panels]
A) Miller, 159 samples
B) Christos, 189 samples
C) NKI, 295 samples
D) Wang, 286 samples
E) Super modules: Cell cycle, Pre-mRNA, ECM, Blood vessel, Immune response
NKI: N Engl J Med. 2002 Dec 19;347(25):1999.
Wang: Lancet. 2005 Feb 19-25;365(9460):671.
Miller: Breast Cancer Res. 2005;7(6):R953.
Christos: J Natl Cancer Inst. 2006 15;98(4):262.
Zhang B et al., Towards a global picture of breast cancer (manuscript).
29. CHRIS GAITERI - ALZHEIMER’S
What is this? Bayesian networks enriched in inflammation genes correlated with disease severity in pre-frontal cortex of 250 Alzheimer’s patients.
What does it mean? Inflammation in AD is an interactive multi-pathway system. More broadly, network structure organizes complex disease effects into coherent sub-systems and can prioritize key genes.
Are you joking? Gene validation shows novel key drivers increase Abeta uptake and decrease neurite length through an ROS burst. (highly relevant to AD pathology)
30. ELIAS NETO Causal Model Selection Hypothesis Tests in Systems Genetics
Elias Chaibub Neto1, Aimee T. Broman2, Mark P. Keller2, Alan D. Attie2, Bin Zhang1, Jun Zhu1, Brian S. Yandell2
1 Sage Bionetworks, Seattle, WA USA; 2 University of Wisconsin-Madison, Madison, WI USA

Abstract
Current efforts in systems genetics have focused on the development of statistical approaches aiming to disentangle causal relationships among molecular phenotypes in segregating populations. Model selection criteria, such as the AIC and BIC, have been widely used for this purpose, in spite of being unable to quantify the uncertainty associated with the model selection call. Here we propose three novel hypothesis tests to perform model selection among models representing distinct causal relationships. We focus on models composed of pairs of phenotypes and use their common QTL to determine which phenotype has a causal effect on the other, or whether the phenotypes are not causally related, and are only statistically associated. Our hypothesis tests are fully analytical and avoid the use of computationally expensive permutation or re-sampling strategies. They adapt and extend Vuong's (and Clarke’s) model selection test to the comparison of four possibly misspecified models, handling the full range of possible causal relationships among a pair of phenotypes. We evaluate the performance of our tests against the AIC, BIC and a published causality inference test in simulation studies. Furthermore, we compare the precision of the causal predictions made by the methods using biologically validated causal relationships extracted from a database of 247 knockout experiments in yeast. Overall, our model selection hypothesis tests achieve higher precision than the alternative methods at the expense of reduced statistical power.

Pairwise Causal Models
Given a pair of phenotypes, Y1 and Y2, that co-map to the same quantitative trait loci, Q, we consider the following models:
[Figure: diagrams of the pairwise causal models]

Vuong’s Model Selection Test
Vuong's test derives from the Kullback-Leibler Information Criterion (KLIC).
Let $h_0(y \mid x)$ represent the true model.
Consider the parametric family of conditional models: $\{f(y \mid x; \varphi) : \varphi \in \Phi\}$.
Then
$\mathrm{KLIC}(h_0, f) = E_0[\log h_0(y \mid x)] - E_0[\log f(y \mid x; \varphi)]$,
where the expectation $E_0$ is computed w.r.t. $h_0(y, x)$, and $\varphi^*$ is the parameter value that minimizes $\mathrm{KLIC}(h_0, f)$.
Consider two models: $f_1 \equiv f_1(y \mid x; \varphi_1^*)$ and $f_2 \equiv f_2(y \mid x; \varphi_2^*)$.
Model $f_1$ is a better approximation of $h_0$ than $f_2$ if and only if
$\mathrm{KLIC}(h_0, f_1) < \mathrm{KLIC}(h_0, f_2) \iff E_0[\log f_1] > E_0[\log f_2]$.
Let $LR_{12} = \log f_1 - \log f_2$. Then we test
$H_0: E_0[LR_{12}] = 0$, $H_1: E_0[LR_{12}] > 0$, $H_2: E_0[LR_{12}] < 0$.
The quantity $E_0[LR_{12}]$ is unknown, but the sample mean and variance of
$\widehat{LR}_{12,i} = \log \hat{f}_{1,i} - \log \hat{f}_{2,i}$, with $\hat{f}_1 \equiv f_1(y \mid x; \hat{\varphi}_1)$ and $\hat{\varphi}_1 \equiv$ ML est. of $\varphi_1$,
converge a.s. to $E_0[LR_{12}]$ and $\mathrm{Var}_0[LR_{12}] = \sigma_{12.12}$.
Let $\widehat{LR}_{12} = \sum_i \widehat{LR}_{12,i}$, then under $H_0$
$(n\,\hat{\sigma}_{12.12})^{-1/2}\, \widehat{LR}_{12} \to_d N(0, 1)$.
If different models have different dimensions we consider
$\widehat{LR}^*_{12} = \widehat{LR}_{12} - D_{12}$,
where $D_{12}$ represents a difference of AIC or BIC penalties, and adopt the test statistic
$Z_{12} = (n\,\hat{\sigma}_{12.12})^{-1/2}\, \widehat{LR}^*_{12}$.

Clarke’s Model Selection Test
Represents a non-parametric version of Vuong’s test.
Vuong’s null: the mean log-likelihood ratio is 0.
Clarke’s null: the median log-likelihood ratio is 0.
Paired sign test on log-likelihood scores:
Scores: $(\widehat{LR}_{12,1}, \widehat{LR}_{12,2}, \widehat{LR}_{12,3}, \widehat{LR}_{12,4}, \widehat{LR}_{12,5}, \ldots, \widehat{LR}_{12,n})$
Signs: $(+, -, +, +, -, \ldots, +)$
Let $T_{12}$ = {# of positive signs}. Then under Clarke’s null
$T_{12} \sim \mathrm{Binomial}(n, 1/2)$.

Causal Model Selection Tests (CMST)
In our applications we consider four models: M1, M2, M3 and M4.
We derive intersection-union tests based on six separate Vuong (Clarke) tests:
f1 vs f2, f1 vs f3, f1 vs f4, f2 vs f3, f2 vs f4, f3 vs f4
We propose three distinct CMST tests: (1) parametric, (2) non-parametric, and (3) joint-parametric CMST tests.

Parametric CMST:
H0: model M1 is not closer to the true model than M2, M3 or M4.
H1: model M1 is closer to the true model than M2, M3 and M4.
$H_0: \{E_0[LR_{12}] = 0\} \cup \{E_0[LR_{13}] = 0\} \cup \{E_0[LR_{14}] = 0\}$
$H_1: \{E_0[LR_{12}] > 0\} \cap \{E_0[LR_{13}] > 0\} \cap \{E_0[LR_{14}] > 0\}$
The rejection region and p-value for this IU-test are given by:
$\min\{z_{12}, z_{13}, z_{14}\} > c_\alpha$, $p_1 = \max\{p_{12}, p_{13}, p_{14}\}$.

Non-parametric CMST:
Analogous to the parametric CMST. Just replace Vuong’s by Clarke’s tests.

Joint parametric CMST:
Simple application of Vuong tests overlooks the dependency among the test statistics.
Let $S_1$ represent the sample covariance matrix of $\widehat{LR}_{12,i}$, $\widehat{LR}_{13,i}$ and $\widehat{LR}_{14,i}$.
Under regularity conditions we have that $S_1$ converges a.s. to $\Sigma_1$.
It follows from the MCT and Slutsky’s theorem that when
$(E_0[LR_{12}], E_0[LR_{13}], E_0[LR_{14}])^T = (0, 0, 0)^T$
we have that
$Z_1 = n^{-1/2}\, \mathrm{diag}(S_1)^{-1/2}\, \widehat{LR}_1 \to_d N_3(0, \rho_1)$,
where $\widehat{LR}_1 = (\widehat{LR}_{12}, \widehat{LR}_{13}, \widehat{LR}_{14})^T$ and $\rho_1 = \mathrm{diag}(S_1)^{-1/2}\, \Sigma_1\, \mathrm{diag}(S_1)^{-1/2}$.
We consider the hypotheses
$H_0: \min\{E_0[LR_{12}], E_0[LR_{13}], E_0[LR_{14}]\} \le 0$
$H_1: \min\{E_0[LR_{12}], E_0[LR_{13}], E_0[LR_{14}]\} > 0$
and adopt the test statistic $W_1 = \min\{Z_1\}$. The p-value is computed as
$P(W_1 \ge w_1) = P(Z_{12} \ge w_1, Z_{13} \ge w_1, Z_{14} \ge w_1)$.

Simulation Study
We conducted a simulation study generating data from the models in the figure.
[Figure: simulated models and simulation results]

Yeast Data Analysis
We analyzed the yeast genetical genomics data set from Brem and Kruglyak (2005).
We evaluated the precision of the causal predictions made by the methods using validated causal relationships extracted from a database of 247 knock-out experiments (Hughes 2000, Zhu 2008).
In total, 46 of the ko-genes showed significant eQTLs, and we tested a total of 4,928 ko-gene/putative target gene relations.

Conclusions
Advantages of the Causal Model Selection Tests:
1- Fully analytical hypothesis tests that avoid the use of computationally expensive permutation or re-sampling techniques.
2- Achieve better controlled type I error rates.
3- Achieve higher precision rates.
Main disadvantage: lower statistical power.
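The pairwise tests on this poster can be illustrated with a minimal Python sketch. It assumes the per-observation log-likelihoods of two fitted models are already available; the function names (`vuong_test`, `clarke_test`, `normal_sf`) and the toy numbers are hypothetical and are not taken from the authors' software. Only the one-sided alternative H1: E0[LR12] > 0 is computed.

```python
import math

def normal_sf(z):
    """Upper-tail probability P(Z >= z) of a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def vuong_test(loglik1, loglik2, penalty_diff=0.0):
    """One-sided Vuong test of H0: E0[LR12] = 0 against H1: E0[LR12] > 0.

    loglik1, loglik2: per-observation log-likelihoods under models 1 and 2.
    penalty_diff: D12, the difference of AIC or BIC penalties (assumed given).
    Returns (z12, p12).
    """
    n = len(loglik1)
    lr = [a - b for a, b in zip(loglik1, loglik2)]
    mean = sum(lr) / n
    var = sum((v - mean) ** 2 for v in lr) / n  # sample variance of LR12,i
    z = (sum(lr) - penalty_diff) / math.sqrt(n * var)
    return z, normal_sf(z)

def clarke_test(loglik1, loglik2):
    """One-sided Clarke sign test: T12 ~ Binomial(n, 1/2) under the null.

    Returns (t12, p) with p = P(T >= t12).
    """
    n = len(loglik1)
    t = sum(1 for a, b in zip(loglik1, loglik2) if a - b > 0)
    p = sum(math.comb(n, k) for k in range(t, n + 1)) / 2 ** n
    return t, p

# Toy log-likelihood scores, made up purely for illustration.
ll1 = [-1.0, -1.2, -0.8, -1.1, -0.9, -1.0]
ll2 = [-1.5, -1.1, -1.4, -1.6, -1.2, -1.3]
z12, p_vuong = vuong_test(ll1, ll2)
t12, p_clarke = clarke_test(ll1, ll2)
print(z12, p_vuong, t12, p_clarke)
```

Both functions work on the same vector of log-likelihood differences; Vuong's version tests its mean and Clarke's version tests its median through the sign count.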
31. ELIAS NETO
Causal Model Selection Hypothesis Tests in Systems Genetics
The Schadt et al. (2005) approach was based on
penalized likelihood model selection,
where we simply select the model with the best
score.
The proposed hypothesis test allows us to attach
a p-value to the selected model and, in this way,
allows the quantification of the uncertainty
associated with the model selection call.
The proposed tests are fully analytical and avoid
computationally expensive permutation and
re-sampling techniques.
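As a rough illustration of attaching a p-value to a model selection call, the sketch below applies the intersection-union rule of the parametric CMST (the p-value for a model is the largest of its pairwise one-sided p-values) to a table of pairwise Vuong statistics. The statistics are made-up numbers, `select_model` is a hypothetical helper, and the sketch assumes the statistic for model j versus model i is the negative of the statistic for i versus j.

```python
import math

def normal_sf(z):
    """Upper-tail probability P(Z >= z) of a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def iu_pvalue(z_stats):
    """Intersection-union p-value: the largest pairwise one-sided p-value."""
    return max(normal_sf(z) for z in z_stats)

def select_model(z, alpha=0.05):
    """z[(i, j)] holds the statistic for model i versus model j, with i < j.

    Assumption: the statistic for j versus i equals -z[(i, j)].
    Returns (p-value per model, list of models significant at level alpha).
    """
    models = sorted({m for pair in z for m in pair})
    pvals = {}
    for i in models:
        zs = [z[(i, j)] if (i, j) in z else -z[(j, i)]
              for j in models if j != i]
        pvals[i] = iu_pvalue(zs)
    winners = [m for m in models if pvals[m] < alpha]
    return pvals, winners

# Illustrative pairwise statistics for four models (not real results).
z = {(1, 2): 2.5, (1, 3): 3.0, (1, 4): 2.0,
     (2, 3): 0.4, (2, 4): -0.3, (3, 4): -0.6}
pvals, winners = select_model(z)
print(pvals, winners)
```

If no model reaches significance, the list of winners is empty and no call is made, which is one way to read the trade-off between precision and power that the slides describe.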
32. ZHI WANG
A multi-tissue immune-driven theory of weight loss
[Diagram labels: Hypothalamus; Leptin signaling; Fatty acids; Macrophage/inflammation; Liver; Adipose; M1 macrophage; Phagocytosis-induced lipolysis]
33. PLATFORM
Sage Platform and Infrastructure Builders-
( Academic Biotech and Industry IT Partners...)
PILOTS= PROJECTS FOR COMMONS
Data Sharing Commons Pilots-
(Federation, CCSB, Inspire2Live....)
[Diagram labels: PLATFORM, MAPS, NEW RULES GOVERN]
34.
35. Why not share clinical/genomic data and model building in the
ways currently used by the software industry
(power of tracking workflows and versioning)?
41. Sage Metagenomics Project
Processed Data
(S3)
• > 10k genomic and expression standardized datasets indexed in SCR
• Error detection, normalization in mG
• Access raw or processed data via download or API in downstream analysis
• Building towards open, continuous community curation
42. Sage Metagenomics using Amazon Simple Workflow
Full case study at http://aws.amazon.com/swf/testimonials/swfsagebio/
43. Amazon SWF and Synapse
Amazon SWF:
• Maintains state of analysis
• Tracks step execution
• Logs workflow history
• Dispatches work to Amazon or remote worker nodes
• Efficiently matches job size to hardware
• Provides error handling and recovery
Synapse:
• Hosts raw and processed data for further reuse in public or private projects
• Provides visibility into intermediate results and algorithmic details
• Allows programmatic access to data; integration with R
• Provides standard terminologies for annotations
• Search across data sets
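The workflow responsibilities listed on this slide (keeping state, tracking step execution, logging history, error handling and recovery) can be pictured with a toy Python sketch. This is not the Amazon SWF or Synapse API; the `Workflow` class, its fields, and the step functions are hypothetical and only show the bookkeeping idea in-process.

```python
from dataclasses import dataclass, field

@dataclass
class Workflow:
    steps: list                                   # ordered (name, function) pairs
    max_retries: int = 2                          # extra attempts after a failure
    history: list = field(default_factory=list)   # log of every attempt
    state: dict = field(default_factory=dict)     # completed step -> its output

    def run(self, data):
        for name, func in self.steps:
            if name in self.state:                # already done: reuse the output
                data = self.state[name]
                continue
            for attempt in range(1, self.max_retries + 2):
                try:
                    data = func(data)
                    self.history.append((name, attempt, "ok"))
                    self.state[name] = data
                    break
                except Exception as exc:          # record the failure and retry
                    self.history.append((name, attempt, f"error: {exc}"))
            else:
                raise RuntimeError(f"step {name!r} failed after retries")
        return data

calls = {"n": 0}

def detect_errors(values):
    """Drop missing measurements."""
    return [v for v in values if v is not None]

def normalize(values):
    """Center the values; fails once to exercise the retry path."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise ValueError("transient failure")
    mean = sum(values) / len(values)
    return [v - mean for v in values]

wf = Workflow(steps=[("detect_errors", detect_errors), ("normalize", normalize)])
result = wf.run([1.0, None, 3.0])
print(result)
print(wf.history)
```

Because completed outputs are kept in `state`, calling `run` again skips finished steps, which mirrors the idea of reusing intermediate results rather than recomputing them.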
44. Synapse Roadmap
Timeline: Internal Alpha, Public Beta Testing, Synapse 1.0, Synapse 1.5, Future
Q1-2012 Q2-2012 Q3-2012 Q4-2012 Q1-2013 Q2-2013 Q3-2013 Q4-2013
Synapse Platform Functionality
• Data Repository
• Projects and security
• R integration
• Analysis provenance
• Publishing figures
• Search
• Controlled Vocabularies
• Governance of restricted data
• Workflow templates
• Wiki & collaboration tools
• Integrated management of cloud resources
• Social networking
• User-customized dashboards
• R Studio integration
• Curation tool integration
Data / Analysis Capabilities
• TCGA
• METABRIC breast cancer challenge
• 40+ manually curated clinical studies
• 8000+ GEO / Array Express datasets
• Clinical, genomic, compound sensitivity
• Bioconductor and custom R analysis
• Predictive modeling workflows
• Automated processing of common genomics platforms
• TBD: Integrations with other visualization and analysis packages
46. Now accepting submissions
Editor-in-Chief: Eric Schadt (USA)
Open Network Biology is an open access journal that publishes articles relating to predictive, network-based models of living systems linked to the corresponding coherent data sets upon which the models are based. In addition to articles describing these large data sets, the journal also welcomes submissions of original research, software and methods, along with reviews and commentary, relevant to the emerging field of network biology.
Submit your manuscript and benefit from:
• High visibility for articles through unrestricted online access
• Free article redistribution under a Creative Commons attribution license
• No limits on article length, additional files, colour figures or movies
• Rapid, immediate open access publication on acceptance
• An integrated repository for network model data and code
www.opennetworkbiology.com
47. Five Pilots involving Sage Bionetworks
• CTCAP
• Arch2POCM
• The Federation
• Portable Legal Consent
• Sage Congress Project
[Diagram labels: PLATFORM, MAPS, NEW RULES GOVERN]
48. Clinical Trial Comparator Arm
Partnership (CTCAP)
Description: Collate, Annotate, Curate and Host Clinical Trial Data
with Genomic Information from the Comparator Arms of Industry and
Foundation Sponsored Clinical Trials: Building a Site for Sharing
Data and Models to evolve better Disease Maps.
Public-Private Partnership of leading pharmaceutical companies,
clinical trial groups and researchers.
Neutral Conveners: Sage Bionetworks and Genetic Alliance
[nonprofits].
Initiative to share existing trial data (molecular and clinical) from
non-proprietary comparator and placebo arms to create a powerful
new tool for drug development.
Started Sept 2010
49. Shared clinical/genomic data sharing and analysis will
maximize clinical impact and enable discovery
• Graphic of curated to QC'd to models
50. Arch2POCM
Restructuring the Precompetitive Space for Drug Discovery
How to potentially De-Risk High-Risk Therapeutic Areas
51.
52. Arch2POCM: scale and scope
• Proposed Goal: Initiate 2 programs. One for Oncology/Epigenetics/
Immunology. One for Neuroscience/Schizophrenia/Autism. Both
programs will have 8 drug discovery projects (targets) - ramped up
over a period of 2 years
– It is envisioned that Arch2POCM’s funding partners will select targets
that are judged as slightly too risky to be pursued at the top of pharma’s
portfolio, but that have significant scientific potential that could benefit
from Arch2POCM’s crowdsourcing effort
• These will be executed over a period of 5 years making a total of 16
drug discovery projects
– Projected pipeline attrition by Year 5 (assuming 12 targets loaded in
early discovery)
• 30% will enter Phase 1
• 20% will deliver Ph 2 POCM data
54. How can we accelerate the pace of scientific discovery?
2008
2009
2010
2011
Ways to move beyond
“traditional” collaborations?
Intra-lab vs Inter-lab
Communication
Colrain/ Industrial PPPs Academic
Unions
56. Sage Federation:
model of biological age
[Figure: Predicted Age (liver expression) vs. Chronological Age (years), with regions labeled Faster Aging and Slower Aging]
Age Differential
Clinical Association
- Gender
- BMI
- Disease
Genotype Association
Gene Pathway Expression
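One way to read the figure is sketched below in Python, under a stated assumption: the age differential is taken to be predicted age minus chronological age, so positive values correspond to "faster aging". That definition, the helper names, and every number here are illustrative assumptions rather than values from the Federation analysis.

```python
import math

def age_differential(predicted, chronological):
    """Assumed definition: predicted age minus chronological age."""
    return [p - c for p, c in zip(predicted, chronological)]

def pearson(x, y):
    """Plain Pearson correlation, used here as a simple association measure."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up values for four hypothetical donors.
predicted_age = [42.0, 55.0, 61.0, 38.0]       # e.g. from a liver expression model
chronological_age = [40.0, 50.0, 65.0, 41.0]
bmi = [27.0, 31.0, 22.0, 23.0]

diff = age_differential(predicted_age, chronological_age)
labels = ["faster" if d > 0 else "slower" for d in diff]
print(diff, labels, pearson(diff, bmi))
```

The same differential could be tested against gender, disease status, or genotype in place of BMI, matching the association types listed on the slide.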
57. Reproducible science == shareable science
Sweave: combines programmatic analysis with narrative
Dynamic generation of statistical reports
using literate data analysis
Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz, editors, Compstat 2002 – Proceedings in Computational Statistics, pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9
58. Federated Aging Project: Combining analysis + narrative = Sweave Vignette
R code + narrative
Outputs: PDF (plots + text + code snippets), HTML, Data objects
Labs: Sage Lab, Califano Lab, Ideker Lab
Submitted Paper
Shared Data Repository
JIRA: Source code repository & wiki
60. Presentation outline
1) Predicting drug response from cancer cell lines
2) Future approaches: network-based predictors and multi-task learning
3) Standardized workflows for data management, versioning and method comparison
Cancer cell line encyclopedia
Molecular characterization (1,000 cell lines)
Currently: mRNA, copy number, somatic mutations (36 cancer-related genes)
In progress: targeted exon sequencing, epigenetics, microRNA, lncRNA, phospho-tyrosine kinase, metabolites
Viability screens (500 cell lines, 24 compounds): Small molecule screen
Predictive model
Network / pathway prior information (Vaske, et al.)
TCGA / ICGC
Molecular characterization (50 tumor types): genomics, transcriptomics, epigenetics
Clinical data
Transfer learning (Vaske, et al.)
61.
1) Data management APIs to load standardized objects, e.g. R ExpressionSets (Matt Furia):
    ccleFeatureData <- getEntity(ccleFeatureDataId)
    ccleResponseData <- getEntity(ccleResponseDataId)
    tcgaFeatureData <- getEntity(tcgaFeatureDataId)
    tcgaResponseData <- getEntity(tcgaResponseDataId)
2) Automated, standardized workflows for curation and QC of large-scale datasets (Brig Mecham).
A. TCGA: Automated cloud-based processing.
B. GEO / Array Express: Normalization workflows, curation of phenotype using standard ontologies.
C. Additional studies with genetic and phenotypic data in Sage repository (e.g. CCLE and Sanger cell line datasets)
Observed Data = Systematic Variation + Random Variation
Normalization: Remove the influence of adjustment variables on data...
3) Pluggable API to implement predictive modeling algorithms.
A) Support for all commonly used machine learning methods (for automated benchmarking against new methods)
B) Pluggable custom methods as R classes implementing customTrain() and customPredict() methods.
A) Can be arbitrarily complex (e.g. pathway and other priors)
B) Support for parallelization in foreach loops.
4) Statistical performance assessment across models.
[Diagram labels: custom model 1, custom model 2, ..., custom model N]
5) Output of candidate biomarkers and feature evaluation (e.g. GSEA, pathway analysis)
6) Experimental follow-up on top predictions (TBD)
E.g. for cell lines: medium throughput suppressor / enhancer screens of drug sensitivity for knockdown / overexpression of predicted biomarkers.
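The pluggable-model idea on this slide (custom methods exposing customTrain() and customPredict(), then performance assessment across models) is described for R classes. The sketch below is a Python analogue under stated assumptions: the class names, the `custom_train`/`custom_predict` method names, the cross-validation helper, and the toy data are all hypothetical and are not the Sage API.

```python
class MeanModel:
    """Baseline: always predicts the mean training response."""
    def custom_train(self, X, y):
        self.mean = sum(y) / len(y)

    def custom_predict(self, X):
        return [self.mean for _ in X]

class OneFeatureLinearModel:
    """Least-squares line fitted on the first feature only."""
    def custom_train(self, X, y):
        xs = [row[0] for row in X]
        mx, my = sum(xs) / len(xs), sum(y) / len(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (v - my) for x, v in zip(xs, y))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = my - self.slope * mx

    def custom_predict(self, X):
        return [self.intercept + self.slope * row[0] for row in X]

def cross_validated_mse(model_factory, X, y, k=3):
    """k-fold cross-validated mean squared error for any pluggable model."""
    n = len(y)
    errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        model = model_factory()
        model.custom_train([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = model.custom_predict([X[i] for i in test_idx])
        errors.extend((p - y[i]) ** 2 for p, i in zip(preds, test_idx))
    return sum(errors) / len(errors)

# Toy data: the response is exactly linear in the single feature.
X = [[float(i)] for i in range(9)]
y = [2.0 * i + 1.0 for i in range(9)]
scores = {name: cross_validated_mse(factory, X, y)
          for name, factory in [("mean", MeanModel), ("linear", OneFeatureLinearModel)]}
print(scores)
```

Any object with the same two methods can be dropped into the `scores` comparison, which is the point of a pluggable interface: the benchmarking loop does not need to know how a model works internally.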
67. Sage Congress Project
April 20 2012
• RealNames Parkinson’s Project
• Revisiting Breast Cancer Prognosis
• Fanconi’s Anemia
(Responders Competitions - IBM-DREAM)