



Computational discovery of
composite motifs in DNA

Geir Kjetil Sandve, Osman Abul and Finn Drabløs


                                     Finn Drabløs [tare.medisin.ntnu.no]
Introduction                                                  2



   Basic gene regulation
 • Proteins (transcription
   factors, TFs)
   recognise binding
   sites (sequence
   motifs) in gene
   regulatory regions
 • The transcription
   factors stabilise the
   transcription complex
                                              Michael Lones
 • Distal promoters
   (enhancers) interact
   through DNA looping

Motivation                                                                                     3



 De novo prediction of binding sites
 • Make a set of co-regulated genes
     – E.g. from microarray experiments, normally imperfect sets
 • Extract assumed regulatory regions
     – Normally a fixed region upstream from TSS of each gene
 • Search for overrepresented patterns in these regions
     – Use a model for what a motif should look like
         • Consensus sequence with mismatches
         • Position Weight Matrix (PWM) based on log odds scores for occurrences
     – Use a strategy to find (local) optima for this model
         • E.g. Gibbs sampling, expectation maximisation …

 • Problem: More than 100 different methods
     – Which methods are reliable?
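
A minimal sketch of the PWM model mentioned above, assuming a uniform background and a pseudocount of 1. The function names `build_pwm` and `best_hit` and the toy binding sites are illustrative assumptions, not taken from any of the tools discussed in the talk.

```python
import math

ALPHABET = "ACGT"

def build_pwm(sites, background=None, pseudocount=1.0):
    """Build a log-odds PWM (one dict per position) from aligned sites."""
    if background is None:
        # Assumption: uniform base composition in the background.
        background = {base: 0.25 for base in ALPHABET}
    pwm = []
    for i in range(len(sites[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        # Log odds: observed frequency relative to background frequency.
        pwm.append({b: math.log2((counts[b] / total) / background[b])
                    for b in ALPHABET})
    return pwm

def best_hit(pwm, sequence):
    """Return (score, position) of the best-scoring window in sequence."""
    width = len(pwm)
    scored = [
        (sum(col[sequence[pos + i]] for i, col in enumerate(pwm)), pos)
        for pos in range(len(sequence) - width + 1)
    ]
    return max(scored)

# Made-up toy sites, for illustration only.
sites = ["TATAAT", "TATAAA", "TATGAT", "TACAAT"]
pwm = build_pwm(sites)
score, pos = best_hit(pwm, "GGCGTATAATGCGC")
print(pos, round(score, 2))
```

De novo methods have to learn the matrix and the site positions at the same time, which is where strategies such as Gibbs sampling or expectation maximisation come in; the sketch only shows the scoring model.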



Motivation                                                                            4



   Benchmarking of de novo tools
   • Tompa et al, Nature Biotech 23, 137-144 (2005)
   • Tested 14 different tools for motif discovery
   • Used 52 data sets from fly (6), human (26), mouse (12)
     and yeast (8)
   • Used data sets with real (Transfac) binding sites in
     different sequence contexts
       – “real” – The actual promoter sequences
       – “generic” – Randomly chosen promoter sequences from same genome
       – “markov” – Sequences generated by Markov chain of order 3
   • Measured performance at nucleotide level
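
As an illustration of the “markov” context, the following sketch trains an order-3 Markov chain on example sequences and samples a new sequence from it. The function names, the training sequences and the restart rule for unseen contexts are assumptions made for this example; the benchmark's own generator may differ.

```python
import random
from collections import Counter, defaultdict

def train_markov(sequences, order=3):
    """Count which base follows each context of `order` bases."""
    model = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            model[seq[i:i + order]][seq[i + order]] += 1
    return model

def generate(model, length, order=3, rng=None):
    """Sample a sequence of the given length from the trained chain."""
    rng = rng or random.Random(0)
    out = list(rng.choice(sorted(model)))
    while len(out) < length:
        counts = model.get("".join(out[-order:]))
        if not counts:
            # Assumption: restart from a random known context if unseen.
            out.extend(rng.choice(sorted(model)))
            continue
        bases, weights = zip(*sorted(counts.items()))
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out[:length])

# Made-up training sequences, for illustration only.
training = ["ACGTACGTTAGCCGATAGCTAGCTAACGT", "TTGACGTAGCTAGGCTAACGTACGATCG"]
model = train_markov(training)
synthetic = generate(model, 50)
print(synthetic)
```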




Motivation                                                                                  5




 Average benchmark performance
   Method         TP      FP     FN       TN
   AlignAce       477    3789   8186   436048
   ANN-Spec       754    7799   7909   432038
   Consensus      178    1394   8485   438443
   GLAM           223    5619   8440   434218
   Improbizer     594    7942   8069   431895
   MEME           581    4836   8082   435001
   MEME3          673    6726   7990   433111
   MITRA          272    4092   8391   435745
   MotifSampler   520    4344   8143   435493
   Oligo/dyad     345    1891   8318   437946
   QuickScore     151    4856   8512   434981
   SeSiMCMC       530   13813   8133   426024
   Weeder         748    1748   7915   438089
   YMF            554    3492   8109   436345

   Average over all methods (layout: TP FN / FP TN)
                Pred_P      Pred_N
   Real_P         471        8192
   Real_N        5167      434670

   nCC = 0.053
   Performance is close to random!
   Too many FP, FN
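
The nCC value can be checked from the averaged confusion matrix. The sketch below assumes nCC is the nucleotide-level correlation coefficient (a Matthews-style correlation over TP, FP, FN and TN); the function name `ncc` is a label chosen for this example.

```python
import math

def ncc(tp, fp, fn, tn):
    """Nucleotide-level correlation coefficient from a confusion matrix."""
    numerator = tp * tn - fn * fp
    denominator = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # Convention assumed here: return 0 when the coefficient is undefined.
    return numerator / denominator if denominator else 0.0

# Averaged counts from the table above.
average = ncc(tp=471, fp=5167, fn=8192, tn=434670)
print(round(average, 3))  # prints 0.053
```

A value of 1 would mean perfect prediction and 0 no better than chance, which is why 0.053 is read as close to random.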




Motivation                                                                              6



   Can we improve performance?
 • Use better motif representations
     – Hidden Markov Models
 • Use better algorithms
     – More exhaustive searching              TODAY!
     – Discriminative motif discovery
 • Use better background models
     – Real sequences (not Markov models)     TODAY!
 • Filter out false positives
     – Identify “motif-like” solutions
     – Identify regulatory regions
     – Use co-occurrence of motifs            TODAY!
         • Modules, composite motifs

Approach                                                               7



 Composite motif discovery




• TFs act together as modules
• Modules are not completely unique

Algorithm                                                                                           8



 Basic definitions
 • Frequent modules
     – Modules (and motifs) can be ranked by support
            • Fraction of sequences where the module (or motif) is found
     – Support is monotonic (non-increasing)
            • Adding a motif to a module can never increase module support

 • Specific modules
     – Modules can be ranked by hit probability
            • Probability that a sequence supports the module
     – Hit probability is monotonic (as for support)
     – Specific modules have low hit probability in background sequences
 • Significant modules
     – Modules can be ranked by significance
            • Probability that support in sequence ≠ background
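
A small sketch of support and its monotonicity, assuming each single motif has already been reduced to the set of sequences in which it occurs at least once. The motif names A, B, C and the sequence ids are hypothetical.

```python
def support_set(module, hits, all_sequences):
    """Sequences that contain every motif in the module."""
    result = set(all_sequences)
    for motif in module:
        result &= hits.get(motif, set())
    return result

def support(module, hits, all_sequences):
    """Fraction of sequences where the whole module is found."""
    return len(support_set(module, hits, all_sequences)) / len(all_sequences)

# Hypothetical data: motif name -> ids of sequences with at least one hit.
sequences = {"s1", "s2", "s3", "s4"}
hits = {"A": {"s1", "s2", "s3"}, "B": {"s2", "s3", "s4"}, "C": {"s3"}}

single = support(("A",), hits, sequences)
pair = support(("A", "B"), hits, sequences)
triple = support(("A", "B", "C"), hits, sequences)
print(single, pair, triple)  # prints 0.75 0.5 0.25
```

Each added motif intersects the support set with one more hit set, so support can only stay the same or shrink.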



Algorithm                                                                      9



 Search tree
 • Discretized single motifs
   {1, 2, 3, …} organised as an
   implicit search tree
 • Support set H and hit
   probability P are iteratively
   computed (monotonicity)
     – Initially H is the full sequence set and
       P is 1
 • Search tree is efficiently
   pruned (indicated with X)
   based on H and P
 • Final output can be ranked
   by module significance
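
The search can be sketched as a depth-first enumeration in which H and P are updated incrementally as motifs are added. This is a simplified illustration, not the Compo implementation: it prunes on support only, treats P as a plain product of per-motif background probabilities, and all names, thresholds and data are hypothetical.

```python
def enumerate_modules(motifs, hits, bg_prob, sequences,
                      min_support, max_hit_prob, max_size=3):
    """Return (module, support, hit probability) for all reported modules."""
    n = len(sequences)
    results = []

    def extend(module, start, H, P):
        # Children only use motifs with a higher index, so every
        # motif combination is visited at most once.
        for idx in range(start, len(motifs)):
            motif = motifs[idx]
            new_H = H & hits[motif]
            if len(new_H) / n < min_support:
                continue  # prune: no extension can regain lost support
            new_P = P * bg_prob[motif]
            new_module = module + (motif,)
            if new_P <= max_hit_prob:
                results.append((new_module, len(new_H) / n, new_P))
            if len(new_module) < max_size:
                extend(new_module, idx + 1, new_H, new_P)

    # Initially H is the full sequence set and P is 1.
    extend((), 0, frozenset(sequences), 1.0)
    return results

# Hypothetical toy data.
sequences = {"s1", "s2", "s3", "s4"}
hits = {"A": {"s1", "s2", "s3"}, "B": {"s2", "s3", "s4"}, "C": {"s3"}}
bg_prob = {"A": 0.5, "B": 0.4, "C": 0.1}
found = enumerate_modules(["A", "B", "C"], hits, bg_prob, sequences,
                          min_support=0.5, max_hit_prob=0.25)
print(found)
```

With these toy values only the module (A, B) is reported: every branch containing C falls below the support threshold and is cut without being expanded further.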
Implementation                                                                                   10



 Module significance
 • Position-level probability in background
     – Probability of single motif at specific location
     – Estimated from real DNA background sequences
 • Sequence-level probability in background
     – Probability of single motif at least once in given background sequence
     – Estimated as union of position-level probabilities
 • Hit-probability in background
     – Probability of composite motif at least once in background sequence
     – Estimated as product of individual motif components
 • Significance p-value of observed support
     – Probability of seeing at least observed support in background set
     – Estimated as right tail of binomial distribution
         • At least k out of n successes given hit-probability p
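
The chain of estimates above can be sketched as follows. The sketch assumes that positions within a sequence are independent (so the union becomes one minus the product of misses) and that the components of a composite motif are independent; the function names and the example numbers are hypothetical.

```python
import math

def sequence_level_prob(position_probs):
    """P(single motif occurs at least once), assuming independent positions."""
    miss = 1.0
    for p in position_probs:
        miss *= 1.0 - p
    return 1.0 - miss

def hit_probability(component_probs):
    """P(composite motif occurs), as a product over its components."""
    return math.prod(component_probs)

def binomial_right_tail(k, n, p):
    """P(at least k of n background sequences are hit)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Hypothetical numbers: two motifs in 500 bp sequences, module seen in 8 of 10.
motif_a = sequence_level_prob([0.001] * 500)
motif_b = sequence_level_prob([0.0005] * 500)
p_hit = hit_probability([motif_a, motif_b])
p_value = binomial_right_tail(k=8, n=10, p=p_hit)
print(p_hit, p_value)
```

A small p-value means the observed support would rarely arise in background sequences with the same hit probability.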


Implementation                                                                        11



 Problem specification
 • Frequent and specific modules
     – Use thresholds on support and
       specificity
     – Complete solutions but multi-
       objective optimization
 • Top-ranking modules
     – Combine objectives into single
       measure, e.g. p-value
 • Pareto-optimal modules
     – Each objective is a separate
       dimension of optimality
     – Return Pareto front of composite
       motifs
                                           http://en.wikipedia.org/wiki/Pareto_efficiency
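
A minimal sketch of extracting a Pareto front, assuming two objectives: support (higher is better) and background hit probability (lower is better). The candidate modules and their values are made up for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = a["support"] >= b["support"] and a["hit_prob"] <= b["hit_prob"]
    better = a["support"] > b["support"] or a["hit_prob"] < b["hit_prob"]
    return no_worse and better

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical candidate modules.
candidates = [
    {"module": ("A", "B"), "support": 0.8, "hit_prob": 0.20},
    {"module": ("A", "C"), "support": 0.6, "hit_prob": 0.05},
    {"module": ("B", "C"), "support": 0.5, "hit_prob": 0.10},
    {"module": ("A", "B", "C"), "support": 0.4, "hit_prob": 0.01},
]
front = pareto_front(candidates)
print([c["module"] for c in front])
```

Here (B, C) drops out because (A, C) has both higher support and lower hit probability; the other three each win on at least one objective and stay on the front.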



Implementation                                            12



 Motif prediction flowchart




Benchmarking                                                                               13



 Benchmark data set



 • Known composite motifs from the TransCompel database
 • Tests performance by adding “noise matrices” to input
    – Matrices for TFs assumed not to bind in sequence set
        • Will have random (false positive) hits
    – Selected at random from Transfac
        • Max noise level includes all Transfac matrices
    – Similar to actual usage
        • Searching for motifs consisting of unknown TFs


Benchmarking                                                                14



 General performance (nCC)




 • Compo compared to several other tools
    – TransCompel benchmark set
 • Compo has clearly best performance, in particular at
   realistic settings (high noise level)

Benchmarking                                                                       15



 Background and support
 • Compo gains performance from realistic background (real
   DNA) and support
    – Random DNA based on multinomial sequence model
 • Performance without real DNA background or support
   comparable to other tools




Future development                                                            16



 Pareto front
• Pareto front on support,
  max motif distance and
  significance (colour)
• Compo prediction not
  optimal
    – Compo predicted Ets and
      GATA
    – Annotated motif is AP1 and
      NFAT
• Explore alternative
  solutions
• Explore parameter
  interactions
                                   X – NFAT
                                   O – AP1
Acknowledgements                                                                                17



  The research group
   BiGR                                   Programmers / Technicians
                                          Johansen, Jostein
   Drabløs, Finn                          Thomas, Laurent
                                          Olsen, Lene C.
   Postdocs / Researchers
   Sætrom, Pål                            Others
   Kusnierczyk, Wacek                     Solbakken, Trude
   Rye, Morten
   Klein, Jörn                            Master students
   Anderssen, Endre                       Bolstad, Kjersti
   Wang, Xinhui (ERCIM)                   Muiser, Iwe
   Capatana, Ana (ERCIM, starting 2009)   Sponberg, Bjørn
                                          Brands, Stef
   PhDs                                   Skaland, Even
   Bratlie, Marit Skyrud
   Klepper, Kjetil                        Former members
   Saito, Takaya                          Sandve, Geir Kjetil
   Lundbæk, Marie                         Abul, Osman
   Håndstad, Tony                         Schwalie, Petra
                                          Lones, Michael


Drablos Composite Motifs Bosc2009

  • 1. 1 Computational discovery of composite motifs in DNA Geir Kjetil Sandve, Osman Abul and Finn Drabløs Finn Drabløs [tare.medisin.ntnu.no]
  • 2. Introduction 2 Basic gene regulation • Proteins (transcription factors, TFs) recognise binding sites (sequence motifs) in gene regulatory regions • The transcription factors stabilise the Michael Lones transcription complex • Distal promoters (enhancers) interact through DNA looping Finn Drabløs [tare.medisin.ntnu.no]
  • 3. Motivation 3 De novo prediction of binding sites • Make a set of co-regulated genes – E.g. from microarray experiments, normally imperfect sets • Extract assumed regulatory regions – Normally a fixed region upstream from TSS of each gene • Search for overrepresented patterns in these regions – Use a model for what a motif should look like • Consensus sequence with mismatches • Position Weight Matrix (PWM) based on log odds scores for occurrences – Use a strategy to find (local) optima for this model • E.g. Gibbs sampling, expectation maximisation … • Problem: More than 100 different methods – Which methods are reliable? Finn Drabløs [tare.medisin.ntnu.no]
  • 4. Motivation: Benchmarking of de novo tools
      • Tompa et al, Nature Biotech 23, 137-144 (2005)
      • Tested 14 different tools for motif discovery
      • Used 52 data sets from fly (6), human (26), mouse (12) and yeast (8)
      • Used data sets with real (Transfac) binding sites in different sequence contexts
          – “real” – The actual promoter sequences
          – “generic” – Randomly chosen promoter sequences from same genome
          – “markov” – Sequences generated by Markov chain of order 3
      • Measured performance at nucleotide level
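As a rough illustration of the “markov” sequence context, the sketch below trains an order-3 Markov chain on example DNA and samples a new sequence from it. The training data, the restart rule for unseen contexts and the names `train_markov` and `generate` are assumptions made for this example; the benchmark's own generator is not described on the slide.

```python
import random
from collections import Counter, defaultdict

def train_markov(sequences, order=3):
    """Count which base follows each context of `order` bases."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            transitions[seq[i:i + order]][seq[i + order]] += 1
    return transitions

def generate(transitions, length, order=3, rng=None):
    """Sample a sequence of the given length from the trained chain."""
    rng = rng or random.Random()
    contexts = sorted(transitions)
    out = list(rng.choice(contexts))
    while len(out) < length:
        counts = transitions.get("".join(out[-order:]))
        if not counts:
            # Assumption: on an unseen context, restart from a random known one.
            out.extend(rng.choice(contexts))
            continue
        bases, weights = zip(*sorted(counts.items()))
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out[:length])

model = train_markov(["ACGTACGTTGCAACGT" * 4])
sample = generate(model, 50, rng=random.Random(0))
```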
  • 5. Motivation: Average benchmark performance

      Method           TP      FP     FN      TN
      AlignAce        477    3789   8186  436048
      ANN-Spec        754    7799   7909  432038
      Consensus       178    1394   8485  438443
      GLAM            223    5619   8440  434218
      Improbizer      594    7942   8069  431895
      MEME            581    4836   8082  435001
      MEME3           673    6726   7990  433111
      MITRA           272    4092   8391  435745
      MotifSampler    520    4344   8143  435493
      Oligo/dyad      345    1891   8318  437946
      QuickScore      151    4856   8512  434981
      SeSiMCMC        530   13813   8133  426024
      Weeder          748    1748   7915  438089
      YMF             554    3492   8109  436345

      Average confusion matrix over the 14 methods (layout: TP FN / FP TN):

                  Pred_P   Pred_N
      Real_P         471     8192
      Real_N        5167   434670

      nCC = 0.053
      • Performance is close to random!
      • Too many FP, FN
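The nCC value on this slide can be checked from the averaged confusion matrix. The sketch below assumes nCC is the nucleotide-level Pearson (Matthews) correlation coefficient used in the Tompa et al. assessment; the function name `ncc` is only illustrative.

```python
import math

def ncc(tp, fp, fn, tn):
    """Nucleotide-level correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fn * fp
    denominator = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # Convention assumed here: return 0 when a margin is empty.
    return numerator / denominator if denominator else 0.0

# Averaged counts from the slide: TP=471, FN=8192, FP=5167, TN=434670.
average = ncc(tp=471, fp=5167, fn=8192, tn=434670)
print(round(average, 3))
```

A value of 0 corresponds to random prediction and 1 to perfect prediction, which is why a result this close to 0 motivates the rest of the talk.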
  • 6. Motivation: Can we improve performance?
      • Use better motif representations
          – Hidden Markov Models
      • Use better algorithms
          – More exhaustive searching (TODAY!)
          – Discriminative motif discovery
      • Use better background models
          – Real sequences (not Markov models) (TODAY!)
      • Filter out false positives
          – Identify “motif-like” solutions
          – Identify regulatory regions
          – Use co-occurrence of motifs (TODAY!)
              • Modules, composite motifs
  • 7. Approach: Composite motif discovery
      • TFs act together as modules
      • Modules are not completely unique
  • 8. Algorithm: Basic definitions
      • Frequent modules
          – Modules (and motifs) can be ranked by support
              • Fraction of sequences where the module (or motif) is found
          – Support is monotonic
              • Adding a motif to a module can never increase module support
      • Specific modules
          – Modules can be ranked by hit probability
              • Probability that a sequence supports the module
          – Hit probability is monotonic (as for support)
          – Specific modules have low hit probability in background sequences
      • Significant modules
          – Modules can be ranked by significance
              • Probability that support in sequence ≠ background
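The support definition and its monotonicity can be shown with a small sketch. It assumes a module is simply a set of motif names and that each sequence is summarised by the set of motifs with a hit in it, ignoring any distance constraint between hits; the data and the function name `support` are made up for illustration.

```python
def support(module, hits_per_sequence):
    """Fraction of sequences that contain a hit for every motif in the module."""
    supporting = [hits for hits in hits_per_sequence if module <= hits]
    return len(supporting) / len(hits_per_sequence)

# Hypothetical hit sets for four sequences.
hits_per_sequence = [
    {"AP1", "NFAT"},
    {"AP1", "NFAT", "Ets"},
    {"AP1"},
    {"GATA", "Ets"},
]

s1 = support({"AP1"}, hits_per_sequence)                 # 3 of 4 sequences
s2 = support({"AP1", "NFAT"}, hits_per_sequence)         # 2 of 4 sequences
s3 = support({"AP1", "NFAT", "Ets"}, hits_per_sequence)  # 1 of 4 sequences
```

Each added motif can only shrink the set of supporting sequences, so support never increases as a module grows.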
  • 9. Algorithm: Search tree
      • Discretized single motifs {1, 2, 3, …} organised as an implicit search tree
      • Support set H and hit probability P are iteratively computed (monotonicity)
          – Initially H is the full sequence set and P is 1
      • Search tree is efficiently pruned (indicated with X in the figure) based on H and P
      • Final output can be ranked by module significance
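A minimal sketch of such a search tree follows. It is not the Compo implementation: it assumes each motif comes with a set of supporting sequence indices and a background hit probability, prunes only on a support threshold, and uses a hit-probability threshold as an output filter. The names `find_modules`, `min_support` and `max_hit_prob` are hypothetical.

```python
def find_modules(motif_hits, motif_bg_prob, n_sequences, min_support, max_hit_prob):
    """Depth-first search over motif combinations with support-based pruning."""
    motifs = sorted(motif_hits)
    results = []

    def extend(module, h, p, start):
        for i in range(start, len(motifs)):
            motif = motifs[i]
            new_h = h & motif_hits[motif]      # support set can only shrink
            if len(new_h) / n_sequences < min_support:
                continue                       # prune: no extension can regain support
            new_p = p * motif_bg_prob[motif]   # hit probability can only shrink
            new_module = module + (motif,)
            if new_p <= max_hit_prob:
                results.append((new_module, len(new_h) / n_sequences, new_p))
            extend(new_module, new_h, new_p, i + 1)

    # Initially H is the full sequence set and P is 1.
    extend((), set(range(n_sequences)), 1.0, 0)
    return results

# Toy input: three discretized motifs over four sequences.
motif_hits = {1: {0, 1, 2, 3}, 2: {0, 1, 2}, 3: {3}}
motif_bg_prob = {1: 0.5, 2: 0.2, 3: 0.1}
modules = find_modules(motif_hits, motif_bg_prob, 4, min_support=0.5, max_hit_prob=0.15)
```

Because children are only built from higher-numbered motifs, each combination is visited once, and a pruned node removes its whole subtree.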
  • 10. Implementation: Module significance
      • Position-level probability in background
          – Probability of single motif at specific location
          – Estimated from real DNA background sequences
      • Sequence-level probability in background
          – Probability of single motif at least once in given background sequence
          – Estimated as union of position-level probabilities
      • Hit-probability in background
          – Probability of composite motif at least once in background sequence
          – Estimated as product of individual motif components
      • Significance p-value of observed support
          – Probability of seeing at least observed support in background set
          – Estimated as right tail of binomial distribution
              • At least k out of n successes given hit-probability
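The chain of estimates on this slide can be sketched as follows. The sketch assumes positions and motif components are independent, so the "union" becomes one minus the product of miss probabilities; that independence assumption and the function names are mine, not taken from the slides.

```python
import math

def sequence_level_prob(position_probs):
    """Probability of at least one hit in a sequence, assuming independent positions."""
    miss = 1.0
    for p in position_probs:
        miss *= 1.0 - p
    return 1.0 - miss

def hit_prob(component_probs):
    """Background hit probability of a composite motif as a product over components."""
    return math.prod(component_probs)

def binomial_right_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k of n sequences support the module."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Toy example: two motifs, 100 positions each, module supported by 8 of 20 sequences.
p_a = sequence_level_prob([0.001] * 100)
p_b = sequence_level_prob([0.002] * 100)
p_module = hit_prob([p_a, p_b])
p_value = binomial_right_tail(8, 20, p_module)
```

A small p-value means the observed support would be unlikely if the input sequences behaved like background.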
  • 11. Implementation: Problem specification
      • Frequent and specific modules
          – Use thresholds on support and specificity
          – Complete solutions but multi-objective optimization
      • Top-ranking modules
          – Combine objectives into single measure, e.g. p-value
      • Pareto-optimal modules
          – Each objective is a separate dimension of optimality (http://en.wikipedia.org/wiki/Pareto_efficiency)
          – Return Pareto front of composite motifs
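The Pareto-optimal formulation can be illustrated with a short sketch. It assumes every objective is expressed so that higher is better (for example support and negated hit probability); the candidate modules and the names `dominates` and `pareto_front` are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return {
        name: objectives
        for name, objectives in candidates.items()
        if not any(dominates(other, objectives) for other in candidates.values())
    }

# Objectives: (support, -hit probability). Higher is better for both.
candidates = {
    "A": (0.9, -0.20),
    "B": (0.6, -0.05),
    "C": (0.5, -0.30),  # dominated by A: lower support and higher hit probability
}
front = pareto_front(candidates)
```

Unlike a single combined score, the front keeps both A and B, leaving the trade-off between support and specificity to the user.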
  • 12. Implementation: Motif prediction flowchart [figure]
  • 13. Benchmarking: Benchmark data set
      • Known composite motifs from the TransCompel database
      • Tests performance by adding “noise matrices” to input
          – Matrices for TFs assumed not to bind in sequence set
              • Will have random (false positive) hits
          – Selected at random from Transfac
      • Max noise level includes all Transfac matrices
          – Similar to actual usage
              • Searching for motifs consisting of unknown TFs
  • 14. Benchmarking: General performance (nCC)
      • Compo compared to several other tools
          – TransCompel benchmark set
      • Compo has clearly best performance, in particular at realistic settings (high noise level)
  • 15. Benchmarking: Background and support
      • Compo gains performance from realistic background (real DNA) and support
          – Random DNA based on multinomial sequence model
      • Performance without real DNA background or support comparable to other tools
  • 16. Future development: Pareto front
      • Pareto front on support, max motif distance and significance (colour)
      • Compo prediction not optimal
          – Compo predicted Ets and GATA
          – Annotated motif is AP1 and NFAT
      • Explore alternative solutions
      • Explore parameter interactions
      [Figure legend: X – NFAT, O – AP1]
  • 17. Acknowledgements: The research group BiGR Programmers / Technicians Johansen, Jostein Drabløs, Finn Thomas, Laurent Olsen, Lene C. Postdocs / Researchers Sætrom, Pål Others Kusnierczyk, Wacek Solbakken, Trude Rye, Morten Klein, Jörn Master students Anderssen, Endre Bolstad, Kjersti Wang, Xinhui (ERCIM) Muiser, Iwe Capatana, Ana (ERCIM, starting 2009) Sponberg, Bjørn Brands, Stef PhDs Skaland, Even Bratlie, Marit Skyrud Klepper, Kjetil Former members Saito, Takaya Sandve, Geir Kjetil Lundbæk, Marie Abul, Osman Håndstad, Tony Schwalie, Petra Lones, Michael