There are messages hidden within our genome, regulating when and how long a gene is switched on. The presentation describes a method, STREAM, targeted at deciphering this regulatory code.
1. Deciphering the regulatory code in the genome
PhD completion seminar
Denis C. Bauer
Institute for Molecular Bioscience
The University of Queensland, Australia
2. Research Aim
Develop a method that translates the regulatory message encoded in the DNA, which specifies when and how strongly a gene is expressed.
Figure: a thermodynamic model translates a DNA sequence (AAGAAGGTTTTAGTTTAGCC CACCGTAGGTACCTGAAGAA GAAGGTTTTAGTTTAGCCCA CCGTAGGTACCTGAAG) into the message "Express gene with 70% capacity when it is hot, thanks!"
3. Why is understanding transcriptional regulation important?
• Insight into the biology of gene pathways.
• Search for regulatory regions with a specific function.
• "Re-programming" of genes has therapeutic potential.
Figure: gene, promoter and transcription on the DNA; panels "Broken regulatory element" and "Design and insert a new regulatory element".
5. Background: Enhancer
• Genes can have independent "switches" (enhancers) beyond the core promoter, which can start transcription of the target gene under different conditions.
Figure: gene, promoter and enhancer regions driving transcription.
6. Background: Enhancer
• Transcription is regulated by the binding of activator and repressor TFs to an enhancer region.
Figure: TF concentrations determine which sites of the enhancer's binding site map are bound (here 8 activators and 2 repressors), resulting in active transcription.
7. Background: Repression
• Transcriptional regulation also depends on the interplay between activators and repressors, i.e. on where they bind relative to each other.
Figure: the repressor range on the enhancer's binding site map.
9. Background: Even-skipped gene (eve)
Figure panels: Drosophila melanogaster (1); embryo stained for eve (2); function representation (3).
1 http://insects.eugenes.org/
2 Small et al.
3 http://bioinform.genetika.ru
10. Background: Regulation of eve
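Figure: the eve locus with its promoter (P), the late elements late1 and late2, and the minimal stripe elements (MSEs) 3+7, 2, 4+6, 1 and 5, in the order late1, 3+7, 2, P, late2, 4+6, 1, 5; lacZ reporter.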
Janssens, H. et al. Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165
11. Hypothesis
TF concentration, binding site map, genome architecture, RNA, methylation, … predict gene activation.
12. Research Goals
• Optimize thermodynamic models efficiently.
• Analyze the robustness of these models.
• Explore the regulation of a particular gene.
• Examine how the regulatory program evolves.
• Extend the current thermodynamic model.
13. Model definition
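Site occupancy (Hill function):
$$p(s,t) = \frac{K_t \, K(s,t) \, [t]}{1 + K_t \, K(s,t) \, [t]}$$
Total activation (activator contribution, quenched by nearby repressors):
$$W(S,T) = \sum_{s \in S_A} E_{t_s} \, p(s,t_s) \prod_{s' \in S_R} \bigl(1 - E_{t_{s'}} \, p(s',t_{s'}) \, d(s,s')\bigr)$$
Transcription rate (Arrhenius function):
$$R(S,T) = \begin{cases} R_0 \exp\bigl(W(S,T) - G_0\bigr) & \text{if } W < G_0 \\ R_0 & \text{otherwise} \end{cases}$$
where S_A and S_R are the activator and repressor binding sites, t_s is the TF bound at site s, [t] is the concentration of TF t, K(s, t) is the affinity of site s for TF t, and d(s, s') weights the quenching of activator site s by repressor site s' according to their distance.
Free parameters
TF parameters: K_t (binding affinity), E_t (effectiveness)
General parameters: R_0 (maximum transcription rate), G_0 (energy barrier)
Janssens, H. et al. Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165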
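The model defined on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the STREAM implementation: the `Site` record, the toy parameter values and the linear quenching function `quench_distance` (full quenching at distance 0, none beyond a hypothetical 100 bp range) are assumptions, since the slide does not define d(s, s'). With two parameters per TF (K_t, E_t) plus R_0 and G_0, seven TFs give 2·7 + 2 = 16 free parameters.

```python
import math
from dataclasses import dataclass

@dataclass
class Site:
    tf: str          # name of the TF bound at this site
    position: int    # site position within the enhancer (bp)
    affinity: float  # site-specific affinity K(s, t), e.g. derived from a PWM score (assumption)

def occupancy(site, K_t, conc):
    """Hill-type occupancy p(s, t) = K_t K(s,t)[t] / (1 + K_t K(s,t)[t])."""
    x = K_t[site.tf] * site.affinity * conc[site.tf]
    return x / (1.0 + x)

def quench_distance(s, s_prime, max_range=100):
    """Hypothetical d(s, s'): 1 when the sites coincide, falling linearly to 0 at max_range bp."""
    dist = abs(s.position - s_prime.position)
    return max(0.0, 1.0 - dist / max_range)

def total_activation(activators, repressors, K_t, E_t, conc):
    """W(S, T): sum of activator contributions, each multiplied by repressor quenching factors."""
    W = 0.0
    for s in activators:
        contribution = E_t[s.tf] * occupancy(s, K_t, conc)
        for r in repressors:
            contribution *= 1.0 - E_t[r.tf] * occupancy(r, K_t, conc) * quench_distance(s, r)
        W += contribution
    return W

def transcription_rate(W, R0, G0):
    """Arrhenius-type rate: R0 * exp(W - G0) while W < G0, saturating at R0 otherwise."""
    return R0 * math.exp(W - G0) if W < G0 else R0

# Toy example at a single A-P position (all values illustrative)
activators = [Site("bcd", 10, 1.0), Site("hb", 60, 0.5)]
repressors = [Site("kr", 40, 0.8)]
K_t = {"bcd": 2.0, "hb": 1.0, "kr": 1.5}
E_t = {"bcd": 1.0, "hb": 0.8, "kr": 0.9}
conc = {"bcd": 1.0, "hb": 0.5, "kr": 1.0}
W = total_activation(activators, repressors, K_t, E_t, conc)
print(W, transcription_rate(W, R0=255.0, G0=2.0))
```

Evaluating the rate at every A-P position (each with its own TF concentrations) would yield a predicted expression profile.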
14. Training the model
Figure: TF binding and TF concentration profiles < [TF1], [TF2], [TF3], [TF4] > (over A-P position 0-100) are fed into the thermodynamic model; its predicted expression is compared to the target expression (A-P position 40-90), and the model parameters are adjusted to improve the fit.
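Each pass of the training loop compares predicted and target expression; the following slides report that comparison as the RMS error and the correlation coefficient (CC). A minimal pure-Python sketch of both measures, assuming the two profiles are sampled at the same A-P positions (the toy profiles are made up):

```python
import math

def rms_error(predicted, target):
    """Root-mean-square difference between predicted and target expression."""
    if len(predicted) != len(target):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target))

def correlation(predicted, target):
    """Pearson correlation coefficient (CC) between the two profiles."""
    n = len(target)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in target))
    return cov / (sd_p * sd_t)

target = [0, 20, 80, 150, 80, 20, 0]      # toy target expression
predicted = [5, 25, 70, 140, 90, 15, 0]   # toy model prediction
print(rms_error(predicted, target), correlation(predicted, target))
```

The RMS error is sensitive to the absolute expression level, while the CC only captures the shape of the pattern, which is why reporting both is informative.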
15. Optimization methods
• Two optimization paradigms
– Simulated Annealing
• LAM schedule (Reinitz et al. 2003)
• Geometric cooling
– Gradient descent
• Three GD variants, each approximating the objective function, which is not continuously differentiable.
• Judged on the accuracy achieved in a given time
– Drosophila MSE2 data with 400 data points and 7 TFs (16 free parameters).
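Of the two SA schedules, geometric cooling is the simpler one: the temperature is multiplied by a constant factor after every step. A minimal, generic simulated annealing sketch under that schedule follows; the start temperature, cooling factor, Gaussian step size and toy objective are illustrative assumptions, and this is neither the LAM schedule nor the optimizer used in the study.

```python
import math
import random

def simulated_annealing(objective, x0, t0=1.0, alpha=0.995, steps=5000,
                        step_size=0.1, seed=0):
    """Minimize `objective` from x0 with geometric cooling T_k = t0 * alpha**k."""
    rng = random.Random(seed)
    x = list(x0)
    fx = objective(x)
    best_x, best_f = list(x), fx
    temperature = t0
    for _ in range(steps):
        # Propose a move by perturbing one randomly chosen parameter
        candidate = list(x)
        i = rng.randrange(len(x))
        candidate[i] += rng.gauss(0.0, step_size)
        fc = objective(candidate)
        # Metropolis criterion: accept improvements, accept worse moves with prob. exp(-dE/T)
        if fc <= fx or rng.random() < math.exp((fx - fc) / temperature):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
        temperature *= alpha  # geometric cooling
    return best_x, best_f

# Toy objective with its minimum at (3, 3)
best_x, best_f = simulated_annealing(lambda v: sum((vi - 3.0) ** 2 for vi in v), [0.0, 0.0])
print(best_x, best_f)
```

Accepting occasional uphill moves while the temperature is high is what lets SA escape local minima, the property the next slides suggest matters for this objective function.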
16. Optimization
Figure: RMS error and correlation coefficient (CC) versus time [minutes] for simulated annealing (SA LAM, SA geom) and gradient descent (GD_softmax, GD_nomax, GD_max).
Suggests: many local minima.
Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646
17. If gradient descent gets stuck in local minima all the time, what does the optimization landscape look like?
18. Landscape analysis
• Synthetic data based on real MSE2 data
– The global minimum and the solution (parameter values) are known.
– Measure the distance of the optimization result to the starting position and to the known solution.
– Measure the error reduction at the optimization result compared to the starting position.
19. Landscape analysis
Experiment     Initial distance to solution (mean)   Final distance to solution (mean)   Error reduction (mean)
1% perturbed   3.4·10⁻⁴                              2.8·10⁻⁴                            88%
random         0.1                                   0.11                                97%
Conclusion: many local minima.
Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646
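The two quantities in the table can be computed as sketched below. The slides do not state the distance measure, so Euclidean distance in parameter space is an assumption, as are the toy parameter vectors; the error reduction is taken as the relative drop in error from the starting position to the optimization result.

```python
import math

def distance(a, b):
    """Euclidean distance between two parameter vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def error_reduction(start_error, final_error):
    """Fraction of the starting error removed by the optimization."""
    return (start_error - final_error) / start_error

true_params = [1.0, 2.0, 0.5]   # known solution of the synthetic data (toy values)
start = [1.1, 1.9, 0.6]         # starting position
result = [1.05, 2.1, 0.55]      # optimization result
print(distance(start, true_params), distance(result, true_params))
print(f"{error_reduction(40.0, 4.8):.0%}")  # 88%
```

A large error reduction without any decrease in the distance to the known solution, as in the random-start row, is what one would expect if the optimizer settles in other minima rather than the true one.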
20. Does the model over-fit?
• Cross-validation (5-fold)
Experiment   Mean RMS error (SE)   Mean CC (SE)
training     13.39 (0.004)         0.92 (4.8·10⁻⁵)
testing      14.04 (0.005)         0.91 (5.7·10⁻⁵)
• Redundancy reduction
– Not enough data to begin with
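A minimal sketch of how 5-fold cross-validation partitions the data points: each fold serves as the test set exactly once, and the model is trained on the remaining four. The shuffling, the seed and the helper names are illustrative assumptions, not the procedure behind the table; the 400 data points match the MSE2 data set described earlier.

```python
import random

def k_fold_indices(n_points, k=5, seed=0):
    """Split indices 0..n_points-1 into k disjoint folds of near-equal size."""
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validation_splits(n_points, k=5, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = k_fold_indices(n_points, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for train, test in cross_validation_splits(400, k=5):
    # In practice: fit the model on `train`, then compute RMS error and CC on `test`
    print(len(train), len(test))  # 320 80 for each of the five folds
```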
21. Summary: Optimization & Analysis
• The objective function is ill-posed.
– It has a plethora of local minima.
– It might have many global minima.
• Hence SA is the method of choice.
• There might be a tendency to over-fit the data.
22. Research Goals
• Optimize thermodynamic models efficiently.
• Analyze the robustness of these models.
• Explore the regulation of a particular gene.
• Examine how the regulatory program evolves.
• Extend the current thermodynamic model.
23. Regulation and Evolution of eve
• The mechanism for regulating eve is conserved:
– Stripe 2 elements from other Drosophila species correctly activate eve in D. mel.,
– despite substantial differences in the regulatory DNA sequence.
Hare, E. E. et al. Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS Genet, 2008, 4, e1000106
24. Evaluate Evolution of MSE2
• Test whether the model can identify the MSE2 in these other species.
• Test whether the model correctly predicts the transcriptional output of the homologous MSE2s.
25. Searching for MSE2
• Apply a model trained on the D. mel. MSE2 to the TFBS map of sequential windows to find the MSE2 in other species.
Figure: windows sliding along the eve locus (MSE2, promoter, eve) of another species; the expression predicted for each window is compared to the target, giving one RMS error per window (< 23 27 43 … 13 … >).
Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
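The window search can be sketched as follows: slide a fixed-length window along the sequence, predict expression from that window's binding site map with the trained model, and keep the window with the lowest RMS error. `score_window` stands in for the map-building and prediction step and is hypothetical, as are the window length, step size and toy scoring function.

```python
def scan_windows(sequence_length, window, step, score_window):
    """Score every window and return all (start, error) pairs plus the best window."""
    results = []
    for start in range(0, sequence_length - window + 1, step):
        error = score_window(start, start + window)  # RMS error of the prediction for this window
        results.append((start, error))
    best_start, best_error = min(results, key=lambda r: r[1])
    return results, best_start, best_error

# Hypothetical scoring function: error is lowest for the window starting at 1200
def toy_score(begin, end):
    return abs(begin - 1200) / 10 + 13

results, best_start, best_error = scan_windows(3000, window=500, step=100, score_window=toy_score)
print(best_start, best_error)  # 1200 13.0
```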
26. Searching for MSE2: Result
• Correctly identified the MSE2 in 6 of 8 species.
Figure: RMS error versus genomic location for D. melanogaster, D. pseudoobscura and D. grimshawi.
Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
27. Predicting the output in other species
• Apply a model trained on the D. mel. MSE2 to the MSE2s in other species.
Figure: left, TFBS maps (log-odds score in bits versus relative genomic position) for D. melanogaster and D. mojavensis, with sites for bicoid, kruppel, giant, hunchback, knirps, caudal and tailless; right, relative RNA concentration versus A-P position (%) for the target and the predictions for D. melanogaster, D. pseudoobscura, D. ananassae and D. mojavensis.
Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
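The TFBS maps on this slide plot log-odds scores in bits. A common way to obtain such a score is to sum log2(p_i(b) / q(b)) over the positions of a candidate site, where p_i is the motif's base probability at position i and q the background probability. The sketch assumes a uniform background (0.25 per base), scans the forward strand only, and uses a made-up 3-position matrix rather than any of the TF motifs shown.

```python
import math

BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform background

def log_odds_score(site, pwm, background=BACKGROUND):
    """Sum over positions of log2(p_i(base) / q(base)), in bits."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(math.log2(column[base] / background[base])
               for base, column in zip(site, pwm))

def scan(sequence, pwm):
    """Score every window of motif width along the forward strand."""
    w = len(pwm)
    return [(i, log_odds_score(sequence[i:i + w], pwm))
            for i in range(len(sequence) - w + 1)]

# Toy probability matrix favouring "TAA"
toy_pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
print(max(scan("GGTAAGC", toy_pwm), key=lambda hit: hit[1]))  # best hit at index 2 ("TAA")
```

Positive scores mark sites that resemble the motif more than background sequence does; negative scores mark sites that resemble it less.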
28. Summary: Application
• The model fits the data qualitatively.
• Predictions are biologically meaningful.
• However, there is room for improvement.
29. Research Goals
• Optimize thermodynamic models efficiently.
• Analyze the robustness of these models.
• Explore the regulation of a particular gene.
• Examine how the regulatory program evolves.
• Extend the current thermodynamic model.
30. One role fits them all?
• A dual function has been proposed for some of the regulatory TFs.
– E.g. the TF Hunchback (Hb) might act as an activator when regulating stripe 2 and as a repressor for stripe 3.
Figure: eve locus elements late1, 3+7, 2, P, late2, 4+6, 1, 5.
Papatsenko, D. & Levine, M. S. Dual regulation by the Hunchback gradient in the Drosophila embryo. Proc Natl Acad Sci U S A, 2008, 105, 2901-2906
Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol, 2004, 2, E271
31. Determine the regulatory role of TFs
• Different data set: 44 CRMs important for D. mel. development, but the same set of TFs.
• Determine the best role for each TF in each of the CRMs
– Brute force: train a model for all TF role combinations on each of the 44 CRMs.
– Record the correlation achieved.
– Identify TFs that have a dual function.
Segal, E. et al. Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature, 2008, 451, 535-540
Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009
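The brute-force step can be sketched as enumerating every assignment of activator (+) or repressor (-) to the TFs, training one model per assignment and keeping the assignment with the best correlation. Restricting each TF to these two roles is an assumption of the sketch, and `train_and_correlate` is a hypothetical stand-in for model training; with the 8 TFs of the next slide this means 2^8 = 256 trainings per CRM.

```python
from itertools import product

ROLES = ("+", "-")  # activator or repressor (assumed role set)

def best_roles(tfs, train_and_correlate):
    """Try every role combination and return the one with the highest correlation."""
    best_combo, best_cc = None, float("-inf")
    for combo in product(ROLES, repeat=len(tfs)):
        roles = dict(zip(tfs, combo))
        cc = train_and_correlate(roles)  # hypothetical: trains a model, returns its CC
        if cc > best_cc:
            best_combo, best_cc = roles, cc
    return best_combo, best_cc

TFS = ["Bcd", "Cad", "Hb", "Tll", "Gt", "Kr", "Kni", "TorRE"]

# Toy stand-in: correlation is highest when Hb activates and Kr represses
def toy_train(roles):
    return 0.2 + 0.1 * (roles["Hb"] == "+") + 0.1 * (roles["Kr"] == "-")

combo, cc = best_roles(TFS, toy_train)
print(combo["Hb"], combo["Kr"], round(cc, 2))  # + - 0.4
```

The number of trainings doubles with every additional TF, which is manageable for eight TFs but would not scale to much larger networks.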
32. TFs with dual role
TF                       Bcd  Cad  Hb  Tll  Gt   Kr  Kni  TorRE
Determined roles         s    +    s   -    s    s   -    s
Literature (consensus)   +    +    s   -    (s)  s   -    NA
"s": dual-functioning, "+": activator, "-": repressor.
• E.g. Hb
– Activator for 17 CRMs
– Repressor for 27 CRMs
Perkins, T. J. et al. Reverse engineering the gap gene network of Drosophila melanogaster. PLoS Comput Biol, 2006, 2, e51
Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol, 2004, 2, E271
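Once the best role of a TF is known for every CRM, the TF can be flagged as dual-functioning when it acts as an activator in some CRMs and as a repressor in others. The sketch below tallies the per-CRM roles; the cut-off `min_each` is hypothetical, since the slides do not state the exact criterion behind the "s" label.

```python
from collections import Counter

def classify_role(roles_per_crm, min_each=1):
    """Return 's' (dual), '+' or '-' from a list of per-CRM roles ('+' or '-')."""
    counts = Counter(roles_per_crm)
    if counts["+"] >= min_each and counts["-"] >= min_each:
        return "s"
    return "+" if counts["+"] >= counts["-"] else "-"

# Hb: activator in 17 CRMs and repressor in 27 CRMs (44 in total)
hb_roles = ["+"] * 17 + ["-"] * 27
print(classify_role(hb_roles))  # s
```

With `min_each=1` a single discordant CRM already makes a TF dual-functioning; a larger cut-off makes the call more conservative.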
33. Improvement with dual function
Experiment       Number of free parameters   Mean CC (SE)
Previous roles   18                          0.27 (0.008)
HbDual           19                          0.35 (0.009)
KrDual           19                          0.37 (0.007)
HbKrDual         20                          0.38 (0.007)
Figure: mRNA versus A-P position for the CRMs kr_CD1_ru, hb_anterior_actv, run_stripe5 and eve_37ext_ru, comparing the target with the predictions using the previous roles and the best parameters.
Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009
34. Marker motifs for dual function
• Run MEME on the protein sequences of the dual-functioning TFs to find short motifs (< 6 aa) present in all of them.
Figure: MEME sequence logos (bits per position, 4 positions each) of two motifs found in the dual-functioning TFs; one is marked as the SUMOylation motif.
35. SUMOylation
• Small Ubiquitin-related Modifier: a small protein covalently attached to target proteins.
• Involved in many pathways/mechanisms
– Compartmentalisation
– Transcriptional regulation
• Can reverse the function of a TF, e.g. Ikaros (the human homologue of Kr)
• SUMO (Smt3) is present in D. mel. during development
Figure: the SUMO pathway (SUMO protease, E1 activating enzyme with ATP, E2 conjugating enzyme, E3 ligase, target protein).
Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster. Neurocomputing, 2009, in submission
del Arco, P. G. et al. Ikaros SUMOylation: switching out of repression. Mol Cell Biol, 2005, 25, 2688-2697
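The canonical SUMOylation consensus motif is ψKxE, where ψ is a large hydrophobic residue, K the modified lysine, x any residue and E glutamate. A simple regular-expression scan for this motif is sketched below. It is only an illustration, not the SUMOylation site predictor cited on this slide; restricting ψ to I, L or V is an assumption, and the protein fragment is made up.

```python
import re

# psi-K-x-E consensus: psi assumed to be I, L or V; lookahead allows overlapping matches
SUMO_CONSENSUS = re.compile(r"(?=([ILV]K.E))")

def find_sumo_motifs(protein):
    """Return the 1-based position of the lysine in each psi-K-x-E match."""
    # m.start() is the 0-based index of psi; the lysine follows it, hence +2 for 1-based
    return [m.start() + 2 for m in SUMO_CONSENSUS.finditer(protein)]

toy_protein = "MSDLKTEPAVKIEQ"  # made-up sequence
print(find_sumo_motifs(toy_protein))  # [5, 11]
```

A consensus match alone is a weak predictor; many matching lysines are never modified, which is one motivation for trained site predictors.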
36. Conclusion
• Thermodynamic models are best optimized using SA, but over-fitting is an issue to keep in mind.
Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646
• Nonetheless, they are applicable for
– examining the mechanisms of transcriptional regulation,
– exploring the evolution of a particular regulatory mechanism.
Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
• Model predictions improve when dual function is allowed.
Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009
– SUMOylation seems to be a good candidate for the biological mechanism of role change.
Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster. Neurocomputing, 2009, in submission
37. Acknowledgments
• IMB
– Timothy Bailey (supervisor)
– Mikael Bodén (supervisor)
– Sean Grimmond (thesis committee)
– Nick Hamilton (thesis committee)
– Fabian Buske
– Stefan Maetschke
• Stony Brook University
– John Reinitz
• Funding
– Institute for Molecular Bioscience, The University of Queensland
– Australian Research Council Centre of Excellence in Bioinformatics
– National Institutes of Health
– UQ International Research Tuition Award
38. www.bioinformatics.org.au/stream
• Framework for modeling, visualizing, and predicting the regulation of the transcription rate of a target gene.
• Publicly available
• Modular: new functions can be plugged in
Figure: screenshots labelled "Many functions" and "Command line".
Bauer, D. C. & Bailey, T. L. STREAM: Static Thermodynamic REgulAtory Model of transcription. Bioinformatics, 2008, 24, 2544-2545