Deciphering the regulatory code in the genome
PhD completion seminar
Denis C. Bauer

Institute for Molecular Bioscience
The University of Queensland, Australia


By yankodesign                                             by linh.ngân 
Research Aim
Develop a method that translates the regulatory message in the DNA into when and
how strongly a gene is expressed.

[Figure: DNA sequence, thermodynamic model, decoded message]

   AAGAAGGTTTTAGTTTAGCC
   CACCGTAGGTACCTGAAGAA
   GAAGGTTTTAGTTTAGCCCA
   CCGTAGGTACCTGAAG

   "Express gene with 70% capacity when it is hot. Thanks!"
Why is understanding transcriptional regulation important?
•  Insight into the biology of gene pathways.
•  Search for regulatory regions with a specific function.
•  “Re-programming” of genes has therapeutic
   potential.

[Figure: DNA with promoter, gene and transcription; a broken regulatory element is replaced by designing and inserting a new regulatory element.]
What do we need to know to build a model that can translate the regulatory message?
Background : Enhancer
•  Genes can have independent “switches” (Enhancer)
   beyond the core promoter, which can start the
   transcription of the target gene under different
   conditions.

[Figure: several enhancer regions upstream of the promoter and gene; transcription starts at the promoter.]
Background: Enhancer
   •  Transcription is regulated by the binding of activator
      and repressor TFs to an enhancer region.


[Figure: enhancer with its binding site map; labels: TF concentration, 8 activators, 2 repressors, active transcription.]
Background: Repression
   •  Transcriptional regulation is also dependent on the
      interplay between activators and repressors, i.e.
      where they bind relative to each other.

[Figure: binding site map of an enhancer with the repressor range marked.]
On which system should we test the model’s abilities?
Background: Even-skipped gene (eve)

[Figures: (1) Drosophila melanogaster; (2) embryo stained for eve; (3) function representation.]

1 http://insects.eugenes.org/
2 Small et al.
3 http://bioinform.genetika.ru
Background: Regulation of eve
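
[Figure: the eve locus with its regulatory elements in order: Late1, MSE 3+7, MSE 2, promoter (P), eve, late2, MSE 4+6, MSE 1, MSE 5; below it, a reporter construct with lacZ.]

Janssens, H. et al. Quantitative and predictive model of transcriptional control of the
Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165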
Hypothesis

TF concentrations + binding site map + genome (architecture, RNA, methylation, …)

predicts gene activation
Research Goals
•  Optimize Thermodynamic models
   efficiently.
•  Analyze robustness of these
   models.
•  Explore the regulation of a
     particular gene.
•  Examine how the regulatory program evolves.
•  Extend current thermodynamic model.


                                                 Cooperphoto/CORBIS 
Model definition

Site occupancy (Hill function)

\[
p(s, t) = \frac{K_t \cdot K(s, t) \cdot [t]}{1 + K_t \cdot K(s, t) \cdot [t]}
\]

Total activation

\[
W(S, T) = \sum_{s \in S^A}
  \underbrace{E_{t_s} \, p(s, t_s)}_{\text{activator contribution}}
  \; \underbrace{\prod_{s' \in S^R} \bigl(1 - E_{t_{s'}} \cdot p(s', t_{s'}) \cdot d(s, s')\bigr)}_{\text{quenching of the activator}}
\]

Transcription rate (Arrhenius function)

\[
R(S, T) =
\begin{cases}
  R_0 \exp\bigl(W(S, T) - G_0\bigr) & \text{if } W < G_0 \\
  R_0 & \text{otherwise}
\end{cases}
\]

Free parameters
  TF params
    K     Binding affinity
    E     Effectiveness
  General params
    R0    Max. transcription rate
    G0    Energy barrier

Buena Vista Pictures

Janssens, H. et al. Quantitative and predictive model of transcriptional control of the
Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165
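
The three formulas of the model definition can be sketched in a few lines of Python. This is a minimal illustration, not the STREAM implementation: the site layout (dicts with `tf`, `pos`, `affinity`, `role`), the function names, and in particular the shape of the distance function d(s, s') (full quenching up to 100 bp, fading linearly to zero at 150 bp) are assumptions made here for the example.

```python
import math

def occupancy(K_t, K_st, conc):
    """Hill-type site occupancy p(s, t) = x / (1 + x) with x = K_t * K(s, t) * [t]."""
    x = K_t * K_st * conc
    return x / (1.0 + x)

def quench_distance(pos_a, pos_r, full_range=100.0, max_range=150.0):
    """Hypothetical d(s, s'): 1 within full_range bp, linear decay to 0 at max_range bp."""
    gap = abs(pos_a - pos_r)
    if gap <= full_range:
        return 1.0
    if gap >= max_range:
        return 0.0
    return (max_range - gap) / (max_range - full_range)

def total_activation(sites, params, conc):
    """W(S, T): sum of activator contributions, each quenched by the repressor sites."""
    def p(site):
        tf = site["tf"]
        return occupancy(params[tf]["K"], site["affinity"], conc[tf])

    activators = [s for s in sites if s["role"] == "A"]
    repressors = [s for s in sites if s["role"] == "R"]
    W = 0.0
    for a in activators:
        contribution = params[a["tf"]]["E"] * p(a)
        for r in repressors:
            d = quench_distance(a["pos"], r["pos"])
            contribution *= 1.0 - params[r["tf"]]["E"] * p(r) * d
        W += contribution
    return W

def transcription_rate(W, R0, G0):
    """Arrhenius-type rate: R0 * exp(W - G0) below the energy barrier, R0 otherwise."""
    return R0 * math.exp(W - G0) if W < G0 else R0

# Toy input (made-up numbers): one activator site and one repressor site 50 bp apart.
sites = [
    {"tf": "bcd", "pos": 0, "affinity": 1.0, "role": "A"},
    {"tf": "kr", "pos": 50, "affinity": 1.0, "role": "R"},
]
params = {"bcd": {"K": 1.0, "E": 2.0}, "kr": {"K": 1.0, "E": 0.5}}
conc = {"bcd": 1.0, "kr": 1.0}

W = total_activation(sites, params, conc)
rate = transcription_rate(W, R0=255.0, G0=2.0)
```

With these toy numbers both sites are half occupied, so the activator contributes 2.0 · 0.5 and the repressor scales it by 1 − 0.5 · 0.5, which gives W = 0.75.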
Training the model

[Figure: training loop. TF binding and TF concentration (a vector <[TF1], [TF2], [TF3], [TF4]> per position) feed into the thermodynamic model; the predicted expression is compared to the target, and the model parameters are adjusted to improve the fit.]
Optimization methods
•  Two optimization paradigms
   –  Simulated Annealing
      •  LAM schedule (Reinitz et al. 2003)
      •  Geometric cooling
   –  Gradient descent
      •  Three GD variants approximating the objective function, which
         was not continuously differentiable.
•  Judged on accuracy achieved in the given time
   –  Drosophila MSE2 data with 400 data points and 7 TF
      (16 free parameters).
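
For readers unfamiliar with simulated annealing, the geometric-cooling variant can be sketched as follows. This is a generic textbook version under assumed settings (Gaussian single-parameter moves, cooling factor 0.95, a toy error function with many local minima); it is not the LAM schedule and not the code used for the experiments.

```python
import math
import random

def simulated_annealing(error, start, step=0.1, t_start=1.0, t_end=1e-3,
                        cooling=0.95, moves_per_temp=50, seed=0):
    """Minimise error(params) with Metropolis moves and geometric cooling."""
    rng = random.Random(seed)
    current = list(start)
    current_err = error(current)
    best, best_err = list(current), current_err
    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            # Perturb one randomly chosen parameter.
            candidate = list(current)
            i = rng.randrange(len(candidate))
            candidate[i] += rng.gauss(0.0, step)
            cand_err = error(candidate)
            delta = cand_err - current_err
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_err = candidate, cand_err
                if current_err < best_err:
                    best, best_err = list(current), current_err
        temp *= cooling  # geometric cooling: T_{k+1} = cooling * T_k
    return best, best_err

def rugged(params):
    """Toy error surface with many local minima (stand-in for the model's RMS error)."""
    return sum(v * v + 2.0 * (1.0 - math.cos(5.0 * v)) for v in params)

start = [2.0, -1.5, 0.7]
best, best_err = simulated_annealing(rugged, start)
```

Accepting some uphill moves at high temperature is what lets the search leave a local minimum, which plain gradient descent cannot do.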
Optimization

[Figure: two panels, "Simulated Annealing" and "Gradient Descent", showing CC and RMS error versus time [minutes]; curves: SA LAM, SA geom, GD_softmax, GD_nomax, GD_max.]

Suggests: many local minima.

Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional
regulation. Bioinformatics, 2009, 25, 1640-1646
If gradient descent gets stuck in local minima all the time, what does the optimization landscape look like?
Landscape analysis
•  Synthetic data based on real MSE2 data
  –  global minimum and solution (parameter values) are
     known.
  –  Measuring distance of the optimization solution to the
     starting position and the known solution.
  –  Measuring error reduction at the
     solution compared to the
     starting position.
Landscape analysis
Experiment      Initial distance to    Final distance to    Error reduction
                solution (mean)        solution (mean)      (mean)
1% perturbed    3.4 · 10^-4            2.8 · 10^-4          88%
random          0.1                    0.11                 97%

Conclusion: many local minima.

Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional
regulation. Bioinformatics, 2009, 25, 1640-1646
Does the model over-fit?
•  Cross-validation (5-fold)

   Experiment   Mean RMS error (SE)   Mean CC (SE)
   training     13.39 (0.004)         0.92 (4.8 · 10^-5)
   testing      14.04 (0.005)         0.91 (5.7 · 10^-5)



•  Redundancy reduction
   –  Not enough data to begin with
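
The two scores in the cross-validation table, RMS error and correlation coefficient (CC), together with a 5-fold split, can be sketched as below. The interleaved fold assignment and all function names are assumptions for illustration; the slides do not say how the folds were drawn.

```python
import math

def rms_error(pred, target):
    """Root-mean-square difference between prediction and target."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def pearson_cc(pred, target):
    """Pearson correlation coefficient between prediction and target."""
    n = len(target)
    mean_p, mean_t = sum(pred) / n, sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)

def k_fold_indices(n, k=5):
    """Yield (train, test) index lists; fold i holds out every k-th point from i."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test

# 10 data points split into 5 folds of 2 test points each.
folds = list(k_fold_indices(10, k=5))
```

In each fold the model would be trained on the `train` indices and scored with `rms_error` and `pearson_cc` on both subsets; a test error clearly above the training error points to over-fitting.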
Summary: Optimization & Analysis
•  The objective function is
   ill-posed.
   –  It has a plethora of local
      minima.
   –  It might have many
      global minima.
•  Hence SA is the
   method of choice.
•  There might be a
   tendency to over-fit the
   data.
                                   http://www2.cmp.uea.ac.uk/~aih/code/SVM/KernelTrickDemo.html
                                                                        http://images.nciku.com/
Research Goals
•  Optimize Thermodynamic models
   efficiently
•  Analyze robustness of these
   models
•  Explore the regulation of a
     particular gene
•  Examine how the regulatory program evolves
•  Extend current thermodynamic model


                                                Cooperphoto/CORBIS 
Regulation and Evolution of eve
•  Mechanism for regulating eve is
   conserved:
   –  Stripe 2 elements from other
      Drosophila species activate
      eve in D. mel. correctly.
   –  Despite the substantial
      difference in the
      regulatory DNA
      sequence.

http://www.bio.ilstu.edu/Edwards/

Hare, E. E. et al. Sepsid even-skipped enhancers are functionally conserved in Drosophila
despite lack of sequence conservation. PLoS Genet, 2008, 4, e1000106
Evaluate Evolution of MSE2
•  Test if the model can identify the MSE2 in these
   other species.

•  Test if the model correctly predicts the
   transcriptional output of the homologous MSE2s.
Searching for MSE2
•  Apply a model trained on D. mel. MSE2 to the TFBS-map
   from sequential windows to find the MSE2 in other
   species

[Figure: the eve locus of another species (MSE2, promoter, eve) is scanned with sequential windows; for each window the predicted expression is compared with the target, giving a vector of RMS errors along the sequence, e.g. <23, 27, 43, …, 13, …>.]

Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules
and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
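
The window scan can be sketched as follows. The scoring callback stands in for "run the trained model on this window's binding site map and return the RMS error against the target"; here it is replaced by a toy score, and the window size, step and data layout are assumptions for illustration only.

```python
def scan_windows(site_positions, seq_length, window, step, score_window):
    """Slide a fixed-size window along the sequence; return (start, error) per window."""
    results = []
    for start in range(0, seq_length - window + 1, step):
        end = start + window
        # Binding sites inside the window, with positions relative to its start.
        in_window = [(pos - start, tf) for pos, tf in site_positions
                     if start <= pos < end]
        results.append((start, score_window(in_window)))
    return results

def best_window(results):
    """Window with the lowest error, i.e. the candidate MSE2 location."""
    return min(results, key=lambda r: r[1])

# Toy binding site map: (position, TF) pairs with a cluster around 520-560.
sites = [(10, "bcd"), (520, "hb"), (540, "kr"), (560, "bcd"), (900, "gt")]

def toy_score(in_window):
    """Placeholder for the model's RMS error: fewer sites means a worse (higher) score."""
    return 1.0 / (1 + len(in_window))

results = scan_windows(sites, seq_length=1000, window=100, step=50,
                       score_window=toy_score)
start, error = best_window(results)
```

For this toy map the lowest error is found for the window starting at position 500, the only one that covers the whole cluster of three sites.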
Searching for MSE2: Result
•  Correctly identified the MSE2 in 6/8 species

[Figure: RMS error versus genomic location for D. melanogaster, D. pseudoobscura and D. grimshawi.]

Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules
and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
Predicting the output in other species
•  Apply a model trained on D. mel. MSE2 to the MSE2s
   in other species

[Figure, left: binding site maps for D. melanogaster and D. mojavensis; log odds score (bits) versus rel. genomic position for bicoid, kruppel, giant, hunchback, knirps, caudal and tailless.]
[Figure, right: relative RNA concentration versus A-P position (%); curves: Target, D. melanogaster, D. pseudoobscura, D. ananassae, D. mojavensis.]

Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules
and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
Summary Application
•  Model fits the data
   qualitatively.
•  Predictions are biologically
   meaningful.

•  However, there is room for
   improvement.
Research Goals
•  Optimize Thermodynamic models
   efficiently
•  Analyze robustness of these
   models
•  Explore the regulation of a
     particular gene
•  Examine how the regulatory program evolves
•  Extend current thermodynamic model


                                                Cooperphoto/CORBIS 
One role fits them all?
•  Dual function is proposed for some of the regulatory
   TFs.
   –  E.g. TF Hunchback (Hb) might be an activator when
      regulating stripe2 and repressor for stripe3.


[Figure: eve regulatory elements in order: Late1, 3+7, 2, P, late2, 4+6, 1, 5.]

Papatsenko, D. & Levine, M. S. Dual regulation by the Hunchback gradient in the
Drosophila embryo. Proc Natl Acad Sci U S A, 2008, 105, 2901-2906
Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of
Drosophila. PLoS Biol, 2004, 2, E271
Determine the regulatory role of TFs
•  Different data set: 44 CRMs important for D. mel.
   development but same set of TFs.
•  Determine the best role for each TF in each of the
   CRMs
   –  Brute Force: train a model for all TF role-combinations on
      each of the 44 CRMs.
   –  Record the correlation achieved.
   –  Identify TFs that have dual-function.


Segal, E. et al. Predicting expression patterns from regulatory sequence in
Drosophila segmentation. Nature, 2008, 451, 535-540
Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by
SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted
for publication, 2009
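
The brute-force search over TF roles can be sketched as below. For n TFs there are 2^n activator/repressor assignments per CRM. The `train_and_score` callback is a hypothetical stand-in for "train a model with this role assignment on one CRM and return the correlation achieved"; the toy scorer and the CRM names are invented for the example.

```python
from itertools import product

def best_roles(tfs, train_and_score):
    """Try every activator ('A') / repressor ('R') assignment; keep the best one."""
    best_assignment, best_cc = None, float("-inf")
    for roles in product("AR", repeat=len(tfs)):
        assignment = dict(zip(tfs, roles))
        cc = train_and_score(assignment)
        if cc > best_cc:
            best_assignment, best_cc = assignment, cc
    return best_assignment, best_cc

def dual_function_tfs(best_per_crm):
    """TFs whose best role differs between CRMs."""
    roles_seen = {}
    for assignment in best_per_crm.values():
        for tf, role in assignment.items():
            roles_seen.setdefault(tf, set()).add(role)
    return sorted(tf for tf, roles in roles_seen.items() if len(roles) > 1)

# Toy stand-in for model training: the score counts matches with a hidden "true"
# assignment per CRM (purely illustrative, no real data).
tfs = ["Bcd", "Hb", "Kr"]
hidden = {
    "crm_a": {"Bcd": "A", "Hb": "A", "Kr": "R"},
    "crm_b": {"Bcd": "A", "Hb": "R", "Kr": "R"},
}

def make_scorer(truth):
    return lambda assignment: sum(assignment[tf] == truth[tf] for tf in tfs)

best_per_crm = {crm: best_roles(tfs, make_scorer(truth))[0]
                for crm, truth in hidden.items()}
dual = dual_function_tfs(best_per_crm)
```

In this toy setup Hb is the only TF whose best role changes between the two CRMs, so it is the only one flagged as dual-functioning.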
TFs with dual role

                           Bcd   Cad   Hb    Tll   Gt    Kr    Kni   TorRE
 Det. roles                s     +     s     -     s     s     -     s
 Literature (consensus)    +     +     s     -     (s)   s     -     NA

 “s”: dual-functioning, “+”: activator, “-”: repressor.

•  E.g. Hb
     –  Activator for 17 CRMs
     –  Repressor for 27 CRMs

Perkins, T. J. et al. Reverse engineering the gap gene network of Drosophila melanogaster.
PLoS Comput Biol, 2006, 2, e51
Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of
Drosophila. PLoS Biol, 2004, 2, E271
Improvement with dual function

[Figure: mRNA versus AP position for the CRMs kr_CD1_ru, hb_anterior_actv, run_stripe5 and eve_37ext_ru; curves: target, previous roles, HbDual, KrDual, HbKrDual, best.]

   Experiment       Number of free parameters   Mean CC (SE)
   Previous roles   18                          0.27 (0.008)
   HbDual           19                          0.35 (0.009)
   KrDual           19                          0.37 (0.007)
   HbKrDual         20                          0.38 (0.007)

Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by
SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted
for publication, 2009
Marker motifs for dual function
•  Running MEME on the protein sequence of dual-functioning
   TFs to find short motifs (< 6 aa) present
   in all of them.

[Figure: two MEME sequence logos of four positions each (information content in bits); one is labeled "SUMOylation motif".]
SUMOylation
•  Small Ubiquitin-related Modifier: a small protein covalently attached
   to target proteins.
•  Involved in many pathways/mechanisms
    –  Compartmentalisation
    –  Transcriptional regulation
        •  Can reverse the function of a TF, e.g.
           Ikaros (the human homologue of Kr)
•  SUMO (Smt3) is present in D. mel. during development

[Figure: SUMO pathway: SUMO (SU) is activated by the E1 activating enzyme using ATP, passed to the E2 conjugating enzyme, attached to the target protein with an E3 ligase, and removed again by a SUMO protease.]

Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in
developmental transcription factors of Drosophila melanogaster. Neurocomputing, 2009,
in submission
del Arco, P. G. et al. Ikaros SUMOylation: switching out of repression. Mol Cell Biol, 2005,
25, 2688-2697
Conclusion
•  Thermodynamic models can be best optimized using SA but
   over-fitting is an issue to keep in mind.
       Bauer, D. C. & Bailey, T. L. OpJmizing staJc thermodynamic models of transcripJonal regulaJon. BioinformaJcs, 2009, 25, 1640‐1646  



•  Non-the-less, they are applicable for
   –  examining the mechanisms of transcriptional regulation,
   –  explore the evolution of a particular regulatory mechanism
       Bauer, D. C. & Bailey, T. L. Studying the funcJonal conservaJon of cis‐regulatory modules and their transcripJonal output. BMC BioinformaJcs, 2008, 9, 220   



•  Model prediction improves when dual-function is allowed.
       Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila
       melanogaster. Submitted for publication, 2009


   –  SUMOylation seems to be a good candidate for the biological
      mechanism of role-change.
       Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster.
       Neurocomputing, 2009, in submission
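
As a minimal sketch of the first conclusion, the Python code below runs simulated annealing with a geometric cooling schedule on a toy one-dimensional objective that has many local minima. The objective, step size, and schedule parameters are illustrative assumptions. They are not the thermodynamic model or the settings used in the cited work.

```python
import math
import random


def toy_objective(x):
    """Hypothetical rugged objective: a parabola with cosine ripples (many local minima)."""
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))


def simulated_annealing(objective, x0, t_start=10.0, t_end=1e-3, alpha=0.999,
                        step=0.5, seed=0):
    """Minimize `objective` using Metropolis acceptance and geometric cooling T <- alpha * T."""
    rng = random.Random(seed)
    x, fx = x0, objective(x0)
    best_x, best_fx = x, fx
    t = t_start
    while t > t_end:
        cand = x + rng.gauss(0.0, step)
        f_cand = objective(cand)
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if f_cand <= fx or rng.random() < math.exp((fx - f_cand) / t):
            x, fx = cand, f_cand
            if fx < best_fx:
                best_x, best_fx = x, fx
        t *= alpha  # geometric cooling
    return best_x, best_fx


if __name__ == "__main__":
    x_best, f_best = simulated_annealing(toy_objective, x0=4.3)
    print(f"best x = {x_best:.3f}, objective = {f_best:.3f}")
```

Accepting some uphill moves while the temperature is high is what lets the search leave a local minimum, whereas plain gradient descent stays in the basin where it starts.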
Acknowledgments
•  IMB
    –  Timothy Bailey (supervisor)
    –  Mikael Bodén (supervisor)
    –  Sean Grimmond (thesis committee)
    –  Nick Hamilton (thesis committee)
    –  Fabian Buske
    –  Stefan Maetschke
•  Stony Brook University
    –  John Reinitz
•  Funding
    –  Institute for Molecular Bioscience, The University of Queensland
    –  Australian Research Council Centre of Excellence in Bioinformatics
    –  National Institutes of Health
    –  UQ International Research Tuition Award




www.bioinformatics.org.au/stream


•  Framework for modeling, visualizing,
   and predicting the regulation of the
   transcription rate of a target gene.
•  Publicly available
•  Modular: New functions can be
   plugged in




   [Figure labels: Command line; Many functions]




                             Bauer, D. C. & Bailey, T. L. STREAM - Static Thermodynamic REgulAtory Model for
                             transcriptional. Bioinformatics, 2008, 24, 2544-2545.
