The document discusses exploring disease networks and how science is performed through networked approaches. It describes how understanding disease requires integrating DNA, RNA, protein and molecular networks. It highlights the EGFR pathway example to show biomarkers can predict treatment response complexity. CETP inhibition example shows causal relationships are not always correlative. Networked approaches are needed to generate, analyze and support new models through data sharing to help understand disease mechanisms and save costs. Synapse is proposed as a platform to enable open sharing of clinical and genomic data as well as model building, similar to how software development occurs through tools like GitHub.
3. Oncogenes only make good targets in particular molecular contexts: EGFR story
• EGFR pathway commonly mutated/activated in cancer
• 30% of all epithelial cancers
• Blocking Abs approved for treatment of metastatic colon cancer
• Subsequently found that RAS-mutant tumors don’t respond – “Negative Predictive Biomarker”
• However, still EGFR+ / RAS-WT patients who don’t respond? – need “Positive Predictive Biomarker”
• And in lung cancer, not clear that RAS-mutant status is useful biomarker
Predicting treatment response to known oncogenes is complex and requires detailed understanding of how different genetic backgrounds function
[Pathway diagram labels: ERBB2, EGFR, EGFRi, BCR/ABL, KRAS, NRAS, BRAF, MEK1/2, Proliferation, Survival]
4. Causal Relationships ≠ Correlative Relationships? : CETPi story
• Epidemiological Data provides strong
support for independent association of low
LDL and high HDL with reduced incidence
of heart disease
• Statins reduce LDL and reduce incidence
of CVD deaths establishing causal
relationship
• CETP inhibition raises HDL – Does this
have positive clinical benefit?
• Torcetrapib (Pfizer) - $800M drug failed Ph3 (2006): a) Lack of efficacy; b) Increased mortality (off target?)
• Dalcetrapib (Roche) – development halted in Ph3 (May 2012) for lack of efficacy (no increase in mortality)
• Anacetrapib (Merck) / Evacetrapib (Lilly) – development ongoing. Hoped that they are better inhibitors and
this will lead to clinical benefit. Will cost $1Billion+ to find out
Can we save billions of dollars by generating and sharing datasets that let us better understand causal relationships?
Is there a common framework for testing clinical hypotheses (ARCH2POCM)?
5. What will it take to understand disease?
DNA, RNA, PROTEIN
MOVING BEYOND ALTERED COMPONENT LISTS
8. Preliminary Probabilistic Models - Rosetta
Networks facilitate direct
identification of genes that are
causal for disease
Evolutionarily tolerated weak spots
Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA
Nat Genet (2005) 205:370
9. Extensive Publications now Substantiating Scientific Approach
Probabilistic Causal Bionetwork Models
• >80 Publications from Rosetta Genetics
Metabolic Disease:
"Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)
"Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)
"Genetics of gene expression and its effect on disease." Nature. (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009)
….. Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp.Biology, etc
CVD:
"Identification of pathways for atherosclerosis." Circ Res. (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)
…… Plus 5 additional papers in Genome Res., Genomics, Mamm.Genome
Bone:
"Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005)
"..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)
Methods:
"An integrative genomics approach to infer causal associations ..." Nat Genet. (2005)
"Increasing the power to detect causal associations…" PLoS Comput Biol. (2007)
"Integrating large-scale functional genomic data ..." Nat Genet. (2008)
…… Plus 3 additional papers in PLoS Genet., BMC Genet.
10. List of Influential Papers in Network Modeling
50 network papers
http://sagebase.org/research/resources.php
11. Fundamentally Biological Science hasn’t changed because of the ‘Omics Revolution……
…..it is about the process of linking a system to a hypothesis to some data to some analyses
[Diagram: Biological System, Data, Analysis]
But the way we do it has changed…
12. Driven by molecular technologies we have become more data intensive, leading to more specialization: data generators (centralized cores), data analyzers (bioinformaticians), validators (experimentalists: lab & clinical)
This is reflected in the tendency for more multi-lab consortium-style grants in which the data generators, analyzers, validators may be different labs.
Single Lab Model
• R01 Funding
• Hypothesis->data->analysis->paper
• Small-scale data / analysis
• Reproducible?
Multiple Lab Model
• P01 Funding
• Hypothesis->data->analysis->paper
• Medium-scale data / analysis
• Data Generators/Analysts/Validators may be different groups
• Reproducible?
[Diagram beside each model: Biological System, Data, Analysis]
13. Iterative Networked Approaches
To Generating Analyzing and Supporting New Models
[Diagram: Biological System, Data, Analysis]
Uncouple the automatic linkage between the
data generators, analyzers, and validators
14. Networked Approaches
BioMedicine Information Commons
[Diagram: Patients/Citizens, Data Generators, Data Analysts, Clinicians, and Experimentalists connected through SYNAPSE, exchanging RAW DATA, CURATED DATA, TOOLS/METHODS, and ANALYSES/MODELS]
15. Networked Approaches
BioMedical Information Commons
[Diagram: Patients/Citizens, Data Generators, Data Analysts, Clinicians, and Experimentalists connected through SYNAPSE, exchanging RAW DATA, CURATED DATA, TOOLS/METHODS, and ANALYSES/MODELS]
Numbered barriers marked on the diagram:
1 USABLE DATA
2 REWARDS RECOGNITION
3 GOVERNANCE
4 HOW TO DISTRIBUTE TASKS
5 PRIVACY BARRIERS
16. Barriers to Engaging Networked Approaches to a BioMedicine Information Commons
1 USABLE DATA: SYNAPSE
2 REWARDS RECOGNITION: SYNAPSE
3 RULES GOVERNANCE: THE FEDERATION
4 HOW TO DISTRIBUTE TASKS: COLLABORATIVE CHALLENGES
5 PRIVACY BARRIERS: PORTABLE LEGAL CONSENT
17. Open and Networked Approaches: Democratization of Science
1 USABLE DATA: SYNAPSE
2 REWARDS RECOGNITION: SYNAPSE
18. Two approaches to building common scientific and technical knowledge
First approach:
• Text summary of the completed project
• Assembled after the fact
Second approach (Social Coding):
• Every code change versioned
• Every issue tracked
• Every project the starting point for new work
• All evolving and accessible in real time
19. Synapse is GitHub for Biomedical Data
Social Coding:
• Every code change versioned
• Every issue tracked
• Every project the starting point for new work
• All evolving and accessible in real time
Social Science:
• Data and code versioned
• Analysis history captured in real time
• Work anywhere, and share the results with anyone
20. Why not share clinical/genomic data and model building in the
ways currently used by the software industry
(power of tracking workflows and versioning)?
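The idea of versioning data and tracking workflows the way software is versioned can be sketched in a few lines. This is an illustration only and not the Synapse API: the `VersionedStore` class, its `commit` and `lineage` methods, and the file names are all hypothetical. Each version is identified by a hash of its content and records which earlier versions it was derived from, so the analysis history can be walked backwards.

```python
import hashlib
import json
import time


def content_id(payload: bytes) -> str:
    """Return a stable identifier derived from the bytes themselves."""
    return hashlib.sha256(payload).hexdigest()[:12]


class VersionedStore:
    """Toy append-only store: every change creates a new version record."""

    def __init__(self):
        self.versions = {}  # content id -> record
        self.history = {}   # entity name -> list of content ids, oldest first

    def commit(self, name, payload, used=(), note=""):
        cid = content_id(payload)
        self.versions[cid] = {
            "name": name,
            "payload": payload,
            "used": list(used),  # ids of the inputs this version was derived from
            "note": note,
            "timestamp": time.time(),
        }
        self.history.setdefault(name, []).append(cid)
        return cid

    def lineage(self, cid):
        """Walk the 'used' links back to the raw inputs (depth-first)."""
        seen, stack = [], [cid]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.append(current)
            stack.extend(self.versions[current]["used"])
        return seen


# Hypothetical example: a raw file, a script, and a derived result.
store = VersionedStore()
raw = store.commit("expression.tsv", b"gene\tvalue\nEGFR\t2.1\n", note="raw upload")
script = store.commit("normalize.py", b"print('normalize')\n", note="analysis code")
result = store.commit("normalized.tsv", b"gene\tvalue\nEGFR\t0.7\n",
                      used=[raw, script], note="output of normalize.py")
print(json.dumps(store.lineage(result), indent=2))
```

Identifying versions by content hash is one design choice among several; it makes identical uploads resolve to the same identifier, which is convenient for reproducibility checks.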
26. Data Analysis with Synapse
Run Any Tool
On Any Platform
Record in Synapse
Share with Anyone
27. Public or Private Projects
Find Public Data
Use Existing Tools
Publish Your Work
28. my other computer is the cloud… let me hand it to you…
pilot advisors!
clearScience links the components of a ‘big science’ project to a cloud computing environment...
so with a click from your browser you can push code into a virtual machine conveniently pre-populated with data, code, and the library and version dependencies
or figures... or data... or models... or entire compute environments...
30. • Automated workflows for curation, QC, and sharing of large-scale datasets.
• All of TCGA, GEO, and user-submitted data processed with standard normalization methods.
• Searchable TCGA data:
  • 23 cancers
  • 11 data platforms
  • Standardized meta-data ontologies
31. • Data accessible at multiple levels of aggregation.
• Links to upstream and downstream processing of data.
• Displayed is TCGA Glioblastoma data normalized for each platform across batches.
32. • Data accessible through programmatic environments such as R.
• Standardized formats allow reuse of analysis pipelines on all processed datasets.
• TCGA, GEO, user-submitted data.
33. • Comparison of many modeling approaches applied to the same data.
• Models transparently shared and reusable through Synapse.
• Displayed is comparison of 6 modeling approaches to predict sensitivity to 130 drugs.
• Extending pipeline to evaluate prediction of TCGA phenotypes.
• Hosting of collaborative competitions to compare models from many groups.
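Comparing several modeling approaches on the same data is only fair if every approach is trained and scored on identical splits. The sketch below illustrates that idea with two deliberately simple models and synthetic numbers; the data, the model choices, and the `leaderboard` name are assumptions made for illustration and are not the six approaches or the drug-sensitivity data referred to on the slide.

```python
import random
import statistics


def fit_mean(xs, ys):
    """Baseline: ignore the feature and always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares with a single feature."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cross_validated_mse(fit, xs, ys, folds):
    """Average squared error over held-out folds; every model sees the same folds."""
    errors = []
    for held_out in folds:
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.extend((model(xs[i]) - ys[i]) ** 2 for i in held_out)
    return statistics.fmean(errors)


# Synthetic stand-in for "feature vs. drug sensitivity" (not real data).
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

indices = list(range(len(xs)))
rng.shuffle(indices)
folds = [set(indices[k::5]) for k in range(5)]  # 5 disjoint folds

leaderboard = {
    name: cross_validated_mse(fit, xs, ys, folds)
    for name, fit in {"mean baseline": fit_mean, "least squares": fit_line}.items()
}
for name, mse in sorted(leaderboard.items(), key=lambda item: item[1]):
    print(f"{name}: held-out MSE = {mse:.3f}")
```

Adding another approach means adding one more `fit_*` function to the dictionary; the folds and the scoring stay fixed, which is what makes the resulting ranking comparable across contributors.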
35. Pipeline Strategy: A, B, C
Divide and Conquer Strategy: D, A, B, C
Parallel/Iterative Strategy: A, B, C
36. sage federation: model of biological age
[Figure: Predicted Age (liver expression) plotted against Chronological Age (years); the two sides of the trend are labeled Faster Aging and Slower Aging]
Age Differential tested for:
• Clinical Association (Gender, BMI, Disease)
• Genotype Association
• Gene Pathway Expression
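One way to read the "age differential" on this slide is as the gap between the age predicted from expression and the chronological age. That definition is an assumption here, since the slide does not spell it out, and every number below is invented purely to show the arithmetic and the grouping by a clinical covariate.

```python
import statistics

# Hypothetical toy records: (chronological age, age predicted from expression, BMI group)
samples = [
    (30, 34.0, "high"),
    (45, 43.5, "normal"),
    (52, 58.0, "high"),
    (60, 57.0, "normal"),
    (38, 41.0, "high"),
    (70, 66.5, "normal"),
]


def age_differential(chronological, predicted):
    """Positive values mean the sample looks older than its chronological age."""
    return predicted - chronological


# Group the differentials by the clinical covariate of interest.
by_group = {}
for chronological, predicted, group in samples:
    by_group.setdefault(group, []).append(age_differential(chronological, predicted))

for group, diffs in sorted(by_group.items()):
    mean_diff = statistics.fmean(diffs)
    label = "faster aging" if mean_diff > 0 else "slower aging"
    print(f"BMI {group}: mean differential {mean_diff:+.2f} years ({label})")
```

A real analysis would test such group differences statistically and adjust for confounders; the sketch only shows how the differential could be computed and summarized.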
37. REDEFINING HOW WE WORK TOGETHER:
Sage/DREAM Breast Cancer Prognosis Challenge
4 HOW TO DISTRIBUTE TASKS: COLLABORATIVE CHALLENGES
38. What is the problem?
Our current models of disease biology are primitive and limit
doctors’ understanding and ability to treat patients
Current incentives reward those who silo information and work in closed systems
39. The Solution: Competitions to crowd-source research
in biology and other fields
Why competitions?
• Objective assessments
• Acceleration of progress
• Transparency
• Reproducibility
• Extensible, reusable models
Competitions in biomedical research
• CASP (protein structure)
• Foldit / EteRNA (protein / RNA structure)
• CAGI (genome annotation)
• Assemblathon / Alignathon (genome assembly / alignment)
• SBV Improver (industrial methodology benchmarking)
• DREAM (co-organizer of Sage/DREAM competition)
Generic competition platforms
• Kaggle, InnoCentive, MLComp
40. The Sage/DREAM breast cancer prognosis
challenge
Goal: Challenge to assess the accuracy of computational models designed to
predict breast cancer survival using patient clinical and genomic data
Why this is unique:
This Sage/DREAM Challenge uses a pre-collated cohort: 2,000 breast cancer samples
from the METABRIC cohort
Accessible to all: A cloud-based common compute architecture is being made
available by Google to support the computational models needed to develop and test
challenge models
New Rigor:
• Contestants will evaluate their models on a validation data set composed of newly generated
data (provided by Dr. Anne-Lise Børresen-Dale)
• Contestants must demonstrate their models can be reproduced by others
New incentives: leaderboard to energize participants, Science Translational Medicine
publication for winning team
Breast cancer patients, funders and researchers can track this Challenge on BRIDGE,
an open source online community being built by Sage and Ashoka Changemakers and
affiliated with this Challenge
41. Sage/DREAM Challenge: Details and Timing
Phase 1: Apr thru end-Sep 2012
Training data: 2,000 breast cancer samples from METABRIC cohort
• Gene expression
• Copy number
• Clinical covariates
• 10 year survival
Supporting data: Other Sage-curated breast cancer datasets
• >1,000 samples from GEO
• ~800 samples from TCGA
• ~500 additional samples from Norway group
• Curated and available on Synapse, Sage’s compute platform
Data released in phases on Synapse from now through end-September
Will evaluate accuracy of models built on METABRIC data to predict survival in:
• Held out samples from METABRIC
• Other datasets
Phase 2: Oct 1 thru Nov 12, 2012
Evaluation of models in novel dataset.
Validation data: ~500 fresh frozen tumors from Norway group with:
• Clinical covariates
• 10 year survival
Gene expression and copy number data to be generated for model evaluation
• Sent to Cancer Research UK to generate data at same facility as METABRIC
• Models built on training data evaluated on newly generated data
Winners announced at November 12 DREAM conference
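Scoring survival predictions on held-out samples needs a metric that copes with censored follow-up. The slides do not name the metric used, so the sketch below shows one common choice, a simplified concordance index, purely as an assumption. The function name, its arguments, and the toy cohort are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the model ranks correctly.

    times       -- follow-up time for each patient
    events      -- True if death was observed, False if the patient was censored
    risk_scores -- model output; higher means shorter predicted survival
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `first` has the shorter follow-up time.
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if times[first] == times[second] or not events[first]:
            continue  # tie in time, or the earlier patient was censored: skip the pair
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Hypothetical toy cohort (not challenge data).
times = [2.0, 5.0, 7.5, 10.0, 10.0]
events = [True, True, False, True, False]
perfect = [5, 4, 3, 2, 1]
reversed_scores = [1, 2, 3, 4, 5]

print(concordance_index(times, events, perfect))
print(concordance_index(times, events, reversed_scores))
```

This version skips all pairs with tied follow-up times, which is a simplification; fuller definitions handle some tied pairs differently.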
42. Summary
Transparency, reproducibility
Validation in novel dataset
Publication in Science Translational Medicine
Donation of Google-scale compute space.
For the goal of promoting democratization of medicine…
Registration starting NOW… sign up at synapse.sagebase.org
43. Presentation outline
1) Predicting drug response from cancer cell lines
2) Predicting clinical cancer phenotypes
3) Workflows for data management, versioning and method comparison
4) Network-based predictors and multi-task learning
Cancer cell line encyclopedia
• Molecular characterization: 1,000 cell lines; mRNA, copy number, Sequencing (1,600 genes)
• Viability screens: 500 cell lines, 24 compounds
Primary tumor datasets (TCGA, METABRIC)
• Molecular characterization: genomics, transcriptomics, epigenetics
• Clinical data (e.g. survival time)
Both feed a Predictive model
44. Developing predictive models of genotype-specific sensitivity to compound treatment
Genetic Feature Matrix: Expression, copy number, somatic mutations, etc.
Predictive Features (biomarkers)
Cancer samples with varying degrees of response to therapy: Sensitive to Refractory (e.g. EC50)
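The step from a genetic feature matrix to a ranked list of predictive features can be illustrated with a very simple univariate ranking. This is a stand-in and not the modeling method used in the talk, which the slides do not specify; the gene names, mutation calls, and response values are invented.

```python
import statistics

# Hypothetical toy screen: each list holds binary mutation calls across 8 cell lines.
features = {
    "GENE_A_mut": [1, 1, 1, 0, 0, 0, 0, 1],
    "GENE_B_mut": [0, 1, 0, 1, 0, 1, 0, 1],
    "GENE_C_mut": [1, 0, 0, 0, 1, 0, 0, 0],
}
# Lower value = more sensitive (an EC50-like readout on a log scale); values are invented.
response = [-2.1, -1.8, -2.4, 0.3, 0.1, 0.4, -0.2, -1.9]


def effect_size(calls, values):
    """Mean response in mutant lines minus mean response in wild-type lines."""
    mutant = [v for c, v in zip(calls, values) if c == 1]
    wild_type = [v for c, v in zip(calls, values) if c == 0]
    if not mutant or not wild_type:
        return 0.0
    return statistics.fmean(mutant) - statistics.fmean(wild_type)


# Rank features by the magnitude of their association with response.
ranking = sorted(features, key=lambda name: abs(effect_size(features[name], response)),
                 reverse=True)
for rank, name in enumerate(ranking, start=1):
    print(f"#{rank} {name}: effect {effect_size(features[name], response):+.2f}")
```

Multivariate models with regularization are the usual next step, because univariate rankings ignore correlation between features.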
45. Our approach identifies mutations in genes upstream of
MEK as top predictors of sensitivity to MEK inhibition
[Figure: ranked predictors of sensitivity to the MEK inhibitor PD-0325901. Labels recoverable from the slide: #1 Mut NRAS, #3 Mut BRAF, #9 Mut KRAS, #9 Mut BRAF, #312 Mut NRAS]
47. Predicted biomarkers supported by literature evidence
Columns: Prediction; Literature evidence; Model / Significance

Prediction: HDAC inhibitors are effective in haematopoietic tumors
• Literature evidence: Supported in current clinical trials. ”Responses with single agent HDACi have been predominantly observed in advanced hematologic malignancies including T-cell lymphoma, Hodgkin lymphoma, and myeloid malignancies."
• Model / Significance: Typical pharma: >10 phase 2 clinical trials in solid tumors @ $millions per trial. [Figure: LBH589 (HDACi), solid versus haematopoietic]

Prediction: NQO1 over-expression predicts 17-AAG sensitivity
• Literature evidence: NQO1 metabolizes 17-AAG to stable intermediary with 32-fold increase in activity.

Prediction: MYC amplification predicts sensitivity to HSP70 inhibition.
• Literature evidence: HSP70 inhibits MYC-mediated apoptosis.
48. Novel predictions are functionally validated
Columns: Prediction; Validation

Prediction: AHR expression predicts sensitivity to MEK inhibitors in NRAS mutant cell lines
• Validation: Functionally validated by AHR knockdown [Figure legend: AHR shRNA, Control shRNA]
• Wei G.*, Margolin A.A.*, et al, Cancer Cell

Prediction: BCL-xL expression predicts sensitivity to several chemotherapeutics
• Validation: Functionally validated by: BCL-xL knockdown; BCL-xL inhibitor drug synergy; Mouse models; Clinical trials
49. Open and Networked Approaches
5 PRIVACY BARRIERS: PORTABLE LEGAL CONSENT: weconsent.us
John Wilbanks
51. The Current R&D Ecosystem Is In Need of a New
Approach to Drug Development
• $200B per year in biomedical and drug discovery R&D
• Only a handful of new medicines are approved each year
• Productivity in steady decline since 1950
• >90% of novel drugs entering clinical trials fail, and negative POC
information is not shared
• Significant pharma revenues going off patent in next 5 years
• >30,000 pharma employees laid off from downsizing in each of last four
years
• 90% of 2013 prescriptions will be for generic drugs
52. Issues With Drug Discovery
1. The greatest attrition is at clinical proof-of-concept – once
a “target” is linked to a disease in the clinic, the risk of
failure is far lower
2. Most novel targets are pursued by multiple companies in
parallel (and most fail at clinical POC)
3. The complete data from failed trials are rarely, if ever,
released to the public
54. SGC: Open Access Chemical Biology
a great success
• PPP:
- GSK, Pfizer, Novartis, Lilly, Abbott, Takeda
- Genome Canada, Ontario, CIHR, Wellcome Trust
• Based in Universities of Toronto and Oxford
• 200 scientists
• Academic network of more than 250 labs
• Generate freely available reagents (proteins, assays, structures, inhibitors, antibodies) for novel, human, therapeutically relevant proteins
• Give these to academic collaborators to dissect pathways and disease networks, and thereby discover new targets for drug discovery
55. Some SGC Achievements
• Structural impact
– SGC contributed ~25% of global output of human structures annually
– SGC contributes >40% of global output of human parasite structures annually
• High quality science (some publications from 2011)
Vedadi et al, Nature Chem Biol, in press (2011);
Evans et al, Nature Genetics in press (2011);
Norman et al Science Transl Med. 3(88):88mr1 (2011);
Kochan G et al PNAS 108:7745 (2011);
Clasquin MF et al Cell 145:969 (2011);
Colwill et al, Nature Methods 8:551 (2011);
Ceccarelli et al, Cell 145:1075 (2011);
Strushkevich et al, PNAS 108:10139 (2011);
Bian et al EMBO J in press (2011);
Norman et al Science Trans. Med. 3:76cm10 (2011);
Xu et al Nature Comm. 2: art. no. 227 (2011);
Edwards et al Nature 470:163 (2011);
Fairman et al Nature Struct. and Mol. Biol. 18:316 (2011);
Adams-Cioaba et al, Nature Comm. 2 (1) (2011);
Carr et al EMBO J 30:317 (2011);
Deutsch et al Cell 144:566 (2011);
Filippakopoulos et al Cell, in press; Nature Chem. Biol. in press, Nature in press
56. Impact Of SGC’s Open Access JQ1 BET Probe
Paper published Dec 23 has already been cited >60 times
Harvard spin off (15 M$ seed funding raised)
> 5 pharma have launched bromodomain programs
JQ1/SGCB01 has been distributed to >250 labs/companies
Already used by some to link Brd4 to new areas of science
Zuber et al : BRD4 as target in acute leukaemia Nature, 2011
Delmore et al: JQ1 suppresses myc in multiple myeloma Cell, 2011
Dawson et al: BRD4 in MLL (isoxazole inhibitor) Nature, 2011
Blobel et al: Novel Targets in AML Cancer Cell, 2011
Mertz et al : Myc dependent cancer PNAS, 2011
Zhao et al: Post mitotic transcriptional re-activation Nature Cell Biol., 2011
58. Drug Discovery Is a Lottery Because:
Knowledge about clinical disease is limiting
- patients are heterogeneous
- do not know how some drugs work, e.g. paracetamol
- different doses effective in different patients
- efficacy is short lived
- poor biomarkers…..
Too many targets/preclinical assays do not prioritize
59. Other Problems With How We Do Drug Discovery
• Same targets, in parallel, in secret
• No one organisation has all capabilities
• Early IP is making it even harder (makes process slower, harder and more expensive)
60. Most Novel Targets Fail at Clinical POC
[Pipeline diagram stages: Target ID/Discovery; HTS; Hit/Probe/Lead ID; LO; Clinical candidate ID; Tox./Pharmacy; Phase I; Phase IIa/b]
Percentages shown along the pipeline: 50%, 10%, 30%, 30%, 90+%
this is killing our industry
…we can generate “safe” molecules, but they are not developable in chosen patient group
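The slide's five percentages can be combined into a rough overall success rate. The calculation below rests on two assumptions that the slide itself does not state: that each percentage is a per-stage failure rate, and that the five values map onto the five transitions after target identification in the order printed. "90+%" is taken as exactly 0.90, which makes the result an upper bound under those assumptions.

```python
from math import prod

# Percentages as printed on the slide, read here as per-stage FAILURE rates.
# The stage names attached to each value are an assumed mapping.
stage_failure = {
    "hit/probe/lead ID": 0.50,
    "clinical candidate ID": 0.10,
    "toxicology/pharmacy": 0.30,
    "Phase I": 0.30,
    "Phase IIa/b (clinical POC)": 0.90,
}

# A target succeeds overall only if it survives every stage in turn.
overall_success = prod(1.0 - rate for rate in stage_failure.values())
targets_per_success = 1.0 / overall_success

print(f"Overall success probability: {overall_success:.4f}")
print(f"Targets started per clinical POC success: {targets_per_success:.0f}")
```

Under this reading the final stage dominates: removing the Phase IIa/b term alone would raise the overall figure tenfold, which is the quantitative version of the slide's point about clinical proof-of-concept.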
61. This Failure Is Repeated, Many Times
[Figure: the same pipeline (Target ID/Discovery, Hit/Probe/Lead ID, Clinical candidate ID, Toxicology/Pharmacy, Phase I, Phase IIa/b, with 50%, 10%, 30%, 30%, 90+% along the stages) repeated for many parallel programs]
…and outcomes are not shared
62. A Possible Solution: Arch2POCM
An Open Access Clinical Validation PPP
• PPP to clinically validate (Ph IIa) pioneer targets
• Pharma, public, academia, regulators and patient groups are active participants
• Cultivate a common stream of knowledge
– Avoid patents
– Place all data into the public domain
– Crowdsource the PPP’s druglike compounds
• Invalidated targets are identified before pharma makes a substantial proprietary investment
– Reduces the number of redundant trials on bad targets
– Reduces safety concerns
• Validated targets are de-risked for pharma investment
– Pharma can initiate proprietary effort when risks are balanced with returns
– PPP pharma members can acquire Arch2POCM IND for validated targets and benefit from shorter development timeline and data exclusivity for sales
63. Arch2POCM: Scale and Scope
• Proposed Vertical Goal:
– Initiate 2 programs. One for Oncology/Epigenetics/Immunology. One for
Neuroscience/Schizophrenia/Autism.
– Both programs will have 8 drug discovery projects (targets)
– By Year 5, 30% of projects will have started Ph I and 20% will have completed
Ph IIa
– $200-250M over five years is projected as necessary to advance up to 8 drug
discovery projects within each of the two therapeutic programs
– By investing $1.6 M annually into one or both of Arch2POCM’s selected disease
areas, partnered pharmaceutical companies:
1. obtain a vote on Arch2POCM target selection
2. gain real time data access to Arch2POCM’s 16 drug discovery projects
3. have the strategic opportunity to expand their overall portfolio
• Proposed Horizontal Goal:
– Initiate 1-2 projects, (1-2 novel target mechanisms), as pilots to assess
Arch2POCM principles
– In either Oncology or Neuroscience
– Specific target mechanisms to be determined by funders’ interest
– Interested funders include pharma, public research foundations and venture
philanthropists
64. Epigenetics: Exciting Science and Also A New Area
For Drug Discovery
[Diagram labels: Lysine, DNA, Histone]
Modification | Write | Read | Erase
Acetyl | HAT | Bromo | HDAC
Methyl | HMT | MBT | DeMethyl
65. The Case For Epigenetics/Chromatin Biology
1. There are epigenetic oncology drugs on the market (HDACs)
2. A growing number of links to oncology, notably many genetic links (i.e.
fusion proteins, somatic mutations)
3. A pioneer area: More than 400 targets amenable to small molecule
intervention - most of which only recently shown to be “druggable”, and
only a few of which are under active investigation
4. Open access, early-stage science is developing quickly – significant
collaborative efforts (e.g. SGC, NIH) to generate proteins, structures,
assays and chemical starting points
66. The Current Epigenetics Universe
Domain Family | Typical substrate class* | Total Targets
Histone Lysine demethylase | Histone/Protein K/R(me)n/ (meCpG) | 30
Bromodomain | Histone/Protein K(ac) | 57
Tudor domain | Histone Kme2/3 - Rme2s | 59
Chromodomain | Histone/Protein K(me)3 | 34
MBT repeat | Histone K(me)3 | 9
PHD finger | Histone K(me)n | 97
Acetyltransferase | Histone/Protein K | 17
Methyltransferase | Histone/Protein K&R | 60
PARP/ADPRT | Histone/Protein R&E | 17
MACRO | Histone/Protein (p)-ADPribose | 15
Histone deacetylases | Histone/Protein KAc | 11
395
[Side label ROYAL runs alongside the Tudor domain, Chromodomain, and MBT repeat rows]
Now known to be amenable to small molecule inhibition
68. What Are Bromodomains and How Do They
Function?
What Are Bromodomains:
• Small highly conserved protein recognition
domains (~110 residues)
• Bundle of four α-helices and two loops that form
a pocket with a conserved Asn residue
• 56 unique human bromodomains identified:
spread across 42 proteins
How Do They Function:
• Selectively bind to acetylated lysine residues
located on histones
• Histone/BRD complex leads to transcription and
gene expression
• Inhibition of BRD binding to acetylated histones
leads to gene silencing
71. Robust Assays Available
Peptide library screen using SPR Peptide array screens using dot blots
Histone peptide
Targets
We now have a suite of assays for bromodomains
• Filippakopoulos et al Cell. 2012 149(1):214-31.
72. A Series of Chemical Starting Points
CBP/PCAF
BET
73. Proof-of-concept
JQ1: A Selective Inhibitor for BETs
Panagis Filippakopoulos, Jun Qi, Stefan Knapp, Jay Bradner
74. NUT midline carcinoma (NMC) is a rare,
highly lethal cancer that occurs in children
and young adults.
NMCs uniformly present in the midline,
most commonly in the head, neck, or
mediastinum, as poorly differentiated
carcinomas
Rearrangement of the Nuclear protein in
testis (NUT) that creates a BRD4-NUT
fusion gene
Variant rearrangements, some involving
the BRD3 gene
It is unclear how common NUT rearrangements are in squamous cell carcinomas due to lack of routine diagnostic
NMC is diagnosed by fluorescence in situ hybridization and NUT antibodies.
75. JQ1 Inhibits NMC Tumour Growth
FDG-PET
4 days 50mg/kg IP
Jay Bradner/Andrew Kung, Harvard
76. Potential Year 1 Aims of an Arch2POCM Bromodomain
Program
1. Select two pre-clinical candidates: Leverage SGC’s existing open
access network of labs, compounds, assays and information to identify
two chemotypes for medicinal chemistry optimization
2. Develop a biomarker strategy for clinical development: opportunities for
surrogate endpoints and patient stratification
3. Implement crowdsourced research: manufacture and distribute
optimized pre-clinical candidates to academic and clinical researchers
77. Process For Arch2POCM Target Selection
Arch2POCM creates a disease area spreadsheet of relevant
information for pioneer targets such as:
1. Novelty: Target selection should focus on addressing fundamental
questions on biology and disease association
• No clinical precedent
• Exception: advance an existing asset into a new disease area
2. Targets should be tractable
• In vitro assay availability
• Cell-based assay availability
• Characterized protein (e.g. 3D structure; antibody, cell lines, mouse model)
• Availability of starting chemical matter
3. Evidence of genetic linkages
• Translocations, mutations, splicing alterations specifically linked to disease
• “Peripheral” genetic linkages:
• Gene expression profiles or GWAS data indicate correlation
– Implicated in pathway with clear genetic link (SLS, Networks)
4. Key research contacts (academic or industry)
78. Potential Targets - Bromodomain Family
Columns: Target; Evidence that this target plays an important role in tumors (in vitro, in vivo, animal model data); Maturity of the program; Positive evidence of the compound playing a role in the given disease; Data showing a failed result of the compound for the given disease; Mouse knockout model (MGI)

SMARCA4
• Evidence: Expression correlates with development of prostate cancer BUT SMARCA4 in general acts as tumor suppressor and is necessary for genome stability; targeted knockdown of SMARCA4 potentiates lung cancer development;
• Maturity of the program: potent, selective, cell active compound identified
• Positive evidence of the compound: NA
• Failed result of the compound: NA
• Mouse knockout model (MGI): Homozygotes for a null allele die in utero before implantation. Embryos heterozygous for this null allele and an ENU-induced allele show impaired definitive erythropoiesis, anemia and lethality during organogenesis. Heterozygotes show cyanosis and cardiovascular defects and are pre-disposed to breast tumors

SMARCA2A
• Evidence: Gastric cancer; mutated in CLL; depletion of BRM causes accelerated progression to the differentiation phenotype BUT targeted deletion is causative for the development of prostatic hyperplasia in mice
• Maturity of the program: potent, selective, cell active compound identified
• Positive evidence of the compound: NA
• Failed result of the compound: NA
• Mouse knockout model (MGI): Mice homozygous for a targeted mutation in this gene may exhibit infertility and a slightly increased body weight in some genetic backgrounds.

CBP
• Evidence: Translocation of CBP with MOZ, monocytic leukemia zinc finger protein cause acute myeloid leukemia; other translocations involve MLL (HRX); Mutated in ALL BUT CBP has also been proposed as a classical tumor suppressor
• Maturity of the program: potent, selective, cell active compound identified
• Positive evidence of the compound: NA
• Failed result of the compound: NA
• Mouse knockout model (MGI): Homozygotes for null or altered alleles die around midgestation with defects in hemopoiesis, blood vessel formation, and neural tube closure. Heterozygotes may exhibit skeletal, cardiac, and hematopoietic defects, retarded growth, and hematologic tumors.

ATAD2
• Evidence: Correlated with survival of high-grade osteosarcoma patients after chemo-therapy; required for breast cancer cell proliferation; differentially expressed in NSCLC
• Maturity of the program: Weak hits
• Positive evidence of the compound: NA
• Failed result of the compound: NA
• Mouse knockout model (MGI): NA

BRD4
• Evidence: Translocations produce BRD4-NUT fusion oncogene causing midline carcinoma
• Maturity of the program: JQ1
• Positive evidence of the compound: JQ1 in BRD-NUT fusion and MLL
• Failed result of the compound: NA
• Mouse knockout model (MGI): Homozygotes for a gene-trap null mutation die soon after implantation. Heterozygotes exhibit impaired pre- and postnatal growth, head malformations, lack of subcutaneous fat, cataracts, and abnormal liver cells.

BRD2
• Evidence: In transgenic mice, constitutive lymphoid expression of Brd2 causes a malignancy most similar to human diffuse large B cell lymphoma
• Maturity of the program: JQ1
• Positive evidence of the compound: JQ1 in BRD-NUT fusion and MLL
• Failed result of the compound: NA
• Mouse knockout model (MGI): Mice homozygous for a null mutation display embryonic lethality during organogenesis with decreased embryo size, decreased cell proliferation, a delay in the cell cycle, and increased cell death. Heterozygous mice also display decreased cell proliferation.
79. Potential Targets - Demethylases
Columns: Target; Evidence that this target plays an important role in tumors (in vitro, in vivo, animal model data); Maturity of the program; Positive evidence of the compound playing a role in the given disease; Data showing a failed result of the compound for the given disease; Mouse model (MGI)

JMJD3
• Evidence: Upregulated in prostate cancer; expression is higher in metastatic prostate cancer BUT JMJD3 contributes to the activation of the INK4A-ARF tumor suppressor locus in response to oncogene- and stress-induced senescence.
• Maturity of the program: potent, selective, cell active compound identified
• Positive evidence of the compound: NA; inhibits TNF-alpha production in macrophages of RA patients
• Failed result of the compound: NA
• Mouse model (MGI): Mice homozygous for a knock-out allele exhibit perinatal lethality associated with thick alveolar septum and absences of air space in the lungs. Bone marrow chimera mice derived from fetal liver cells exhibit impaired eosinophil recruitment and abnormal response to helminth infection.

JARID1B
• Evidence: High levels in breast cancer cell lines, strong expression in the invasive but not in the benign components of primary breast carcinomas. BUT tumor suppressor in melanoma cells
• Maturity of the program: No progress
• Positive evidence of the compound: NA
• Failed result of the compound: NA
• Mouse model (MGI): NA
80. Potential Targets - Histone Methyltransferases
Columns: Target; Evidence that this target plays an important role in tumors (in vitro, in vivo, animal model data); Maturity of the program; Positive evidence of the compound playing a role in the given disease; Data showing a failed result of the compound for the given disease

SETD8
• Evidence: Recent data indicates that SETD8 deregulates PCNA expression by degradation accelerated by methylation at K248. Expression levels of SETD8 and PCNA upregulated in cancer cells. Cancer Research May 2012 Takawa et al.
• Maturity of the program: Weak inhibitors identified (8 microM) in chemistry optimization.
• Positive evidence of the compound: NA
• Failed result of the compound: NA

EZH2
• Evidence: EZH2 upregulated in cancer cells. Studies on mutants indicates an interesting profile where both wild-type and mutant (Y641F) are required for malignant phenotype. Sneeringer et al. PNAS 2012.
• Maturity of the program: potent, selective, cell active compound identified. Compounds identified in GSK patents WO 2011/140324 and 140315 and WO 2012/005805 and 075080.
• Positive evidence of the compound: NA
• Failed result of the compound: NA

MMSET
• Evidence: MMSET, WHSC1, NSD2 is overexpressed in cancer cells. Hudlebusch et al. Clinical Cancer Res 2011
• Maturity of the program: No hits; currently screening
• Positive evidence of the compound: NA
• Failed result of the compound: NA

DOT1L
• Evidence: Daigle et al. Cancer Cell 2011 elegantly show that potent DOT1L inhibitors kill cells containing MLL translocations and do not kill cell not containing the translocations
• Maturity of the program: potent, selective, cell active compound identified.
• Positive evidence of the compound: Transgenic mouse model tumors shrunk by SC dosing of inhibitor
81. Proposed Metrics For Measuring Arch2POCM Success
Use a therapeutic product profile (TPP) with stage-gates and defined milestones
to monitor project progression:
• Small molecule screening hit rate achieved
• SAR/In vitro testing
– Target EC50 achieved by at least XX compounds
– Selectivity target achieved by at least YY compounds
– Biological activity demonstrated for at least XX compounds in human tissue models (disease tissue, stem cells)
• Manufacturing and Quality
– Steady and cost-effective supply of lead compound achieved
– Stability of lead compound demonstrated (sufficient to support POCM testing)
– Lead compound formulation identified to support pre-clinical and clinical studies
– Lead compound demonstrates selected quality attributes (sufficient to support pre-clinical studies and distribution to the
crowd)
• Pre-clinical testing
– Lead compounds achieve pre-clinical safety
– Lead compounds surpass target TI
– Lead compounds demonstrate cross-reactivity sufficient to support pre-clinical tox testing
• Clinical
– Lead compounds demonstrate Ph I safety
– Lead compounds demonstrate Ph II POCM
• Data management
– IT database infrastructure populated with XX epigenetics investigators/grant application/publications
– Database QC and compliance defined and implemented (internal and external)
82. Program Activities Grid For Arch2POCM
Columns: Activity; Arch2POCM Location/Investigator (TBD)
Activities:
• Target Structure
• Compound libraries
• Assay development for epigenetic screens and biomarkers
• HTP screens for epigenetic hits
• Med Chem SAR To ID Two Suitable Binding Arch2POCM Test Compounds
• Non-GLP scaleup of Arch2POCM Test Compounds and associated analytics
• Distribution of Arch2POCM Test Compounds
• PK, PD, ADME, Tox Testing
• GMP Manufacturing of Arch2POCM Test Compounds
• GMP Formulation
• GMP Drug Storage and Distribution
• IND Preparation Support
• Clinical Assay Development and Qualification
• Ph I-II Clinical Trials
• Ph I-II Database Management and CSR Production
83. DISCUSSION
• Opportunities to Review Targets
• Opportunities to Discuss Approach
• Opportunities to Consider Potential Lead Groups for funding using this Open Approach
84. Networked Approaches
BioMedical Information Commons
[Diagram: Patients/Citizens, Data Generators, Data Analysts, Clinicians, and Experimentalists connected through SYNAPSE, exchanging RAW DATA, CURATED DATA, TOOLS/METHODS, and ANALYSES/MODELS]
Numbered barriers marked on the diagram:
1 USABLE DATA
2 REWARDS RECOGNITION
3 GOVERNANCE
4 HOW TO DISTRIBUTE TASKS
5 PRIVACY BARRIERS