Musite: Prediction of Protein
  Phosphorylation Sites


               Jianjiong Gao
          University of Missouri, Columbia, Missouri
           http://musite.sourceforge.net/
Background:
       Protein Phosphorylation
Protein phosphorylation is one of the most
important post-translational modifications.
  It has been estimated that up to 50% of proteins are
  phosphorylated in some cellular state
  Abnormal phosphorylation is a cause or
  consequence of many diseases
    Cancer
    Diabetes
    Parkinson's disease
    Hepatitis B
    …
Background:
       Protein Phosphorylation
Phosphorylation-dephosphorylation is a
biochemical switch system regulating
various cellular processes.
Catalyzed by various specific protein
kinases.
[Diagram: OFF/ON switch, with Kinase and Phosphatase driving the two transitions]
Phosphorylation Site Prediction
         Problem Formulation

Phosphorylation site: a phosphorylated amino acid
in a protein (determined by protein sequence)
General phosphorylation site prediction: to predict
whether an amino acid can be phosphorylated
Kinase-specific phosphorylation site prediction: to
predict whether an amino acid can be
phosphorylated by a specific kinase
Based on protein sequence only
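As a minimal sketch of the sequence-only formulation above, the snippet below enumerates candidate residues in a protein sequence together with their local sequence windows. The function name, the '-' padding character, and the flank size are hypothetical choices for illustration; the slides do not state the window size Musite uses.

```python
from typing import Iterator, Tuple


def candidate_sites(
    sequence: str, residues: str = "STY", flank: int = 6
) -> Iterator[Tuple[int, str, str]]:
    """Yield (1-based position, residue, window) for each candidate residue.

    Windows are padded with '-' where they run past either end of the protein.
    The flank size is an illustrative assumption, not a value from the slides.
    """
    seq = sequence.upper()
    padded = "-" * flank + seq + "-" * flank
    for i, aa in enumerate(seq):
        if aa in residues:
            # In the padded string the residue sits at index i + flank,
            # so the window of width 2 * flank + 1 starts at index i.
            yield i + 1, aa, padded[i : i + 2 * flank + 1]


sites = list(candidate_sites("MKSPTAY", flank=2))
```

Each yielded window is what a general or kinase-specific predictor would then score.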
Limitations of Current Methods

Current prediction tools have
limitations when applied to whole
proteomes
 Prediction accuracy could be improved
 Most were released as web servers and
 restrict the data that users can upload
 Training data were out of date
 Stringency adjustment was not fully
 supported
Our tool Musite is unique

Novel method with better accuracy
First open-source tool in the field that meets the
OSI Open Standards Requirement
Standalone program designed for
proteome-scale prediction
Supports both general and kinase-specific
phosphorylation site prediction
Supports customized model training
Supports continuous stringency adjustment
Phosphorylation Site Prediction
                 Flowchart
[Flowchart, left branch]
  Data collection from high-quality sources,
  such as UniProt/Swiss-Prot, Phospho.ELM,
  PhosphoPep, and PhosPhAt
  Non-redundant datasets built by BLASTclust
  Phosphorylation sites / Non-phosphorylation sites
  Feature extraction: KNN scores, disorder scores,
  amino acid frequencies
  Features from positive set / Features from negative set
  Control data
[Flowchart, right branch]
  Training data
  Bootstrap: bootstrap sample 1 ... bootstrap sample m
  Training: classifier 1 ... classifier m
  Aggregating
  Specificity estimation
  Phosphorylation prediction model
  Making predictions on new data
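The specificity estimation step and the continuous stringency adjustment mentioned earlier can be illustrated with a small sketch: given prediction scores for control (non-phosphorylated) sites, pick the score cutoff that reaches a requested specificity. This is only an illustration under stated assumptions; the function names and the rule "predict positive when score > cutoff" are hypothetical and are not taken from Musite itself.

```python
import math
from typing import Sequence


def threshold_for_specificity(
    control_scores: Sequence[float], specificity: float
) -> float:
    """Return a cutoff such that at least `specificity` of the control scores
    are at or below it. A site is called positive when score > cutoff."""
    if not control_scores:
        raise ValueError("control_scores must not be empty")
    if not 0.0 < specificity <= 1.0:
        raise ValueError("specificity must be in (0, 1]")
    ordered = sorted(control_scores)
    # Smallest index k with (k + 1) / n >= specificity; the rounding guards
    # against floating-point noise in specificity * n.
    k = max(0, math.ceil(round(specificity * len(ordered), 9)) - 1)
    return ordered[k]


def achieved_specificity(control_scores: Sequence[float], cutoff: float) -> float:
    """Fraction of control sites that are NOT called positive at this cutoff."""
    return sum(s <= cutoff for s in control_scores) / len(control_scores)


control = [0.1, 0.4, 0.2, 0.9]
cutoff = threshold_for_specificity(control, 0.75)
```

Because the cutoff can be recomputed for any requested specificity, the stringency can be tuned continuously rather than chosen from a few fixed presets.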
Phosphorylation Site Prediction
                 Data Extraction
Phosphorylation Site Prediction
                 Feature Extraction
KNN Features
        Motivation
Rationale for using KNN features: local
sequence clusters exist around
phosphorylation sites, since
  Each phosphorylation site is a substrate of a specific
  protein kinase
  Substrates of the same kinase or kinase family
  usually share similar patterns in their local sequences
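A minimal sketch of a KNN score, assuming it is the fraction of known phosphosites among the k training windows most similar to the query window. The distance used here (fraction of mismatched positions) is a simple stand-in chosen for illustration; the slides do not specify the similarity measure Musite uses, and all names below are hypothetical.

```python
from typing import Sequence, Tuple


def mismatch_fraction(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length windows differ."""
    if len(a) != len(b):
        raise ValueError("windows must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def knn_score(
    query: str, labeled_windows: Sequence[Tuple[str, bool]], k: int
) -> float:
    """Fraction of phosphosites among the k nearest labeled windows."""
    if not 1 <= k <= len(labeled_windows):
        raise ValueError("k must be between 1 and the number of windows")
    # sorted() is stable, so ties keep their input order.
    nearest = sorted(
        labeled_windows, key=lambda item: mismatch_fraction(query, item[0])
    )[:k]
    return sum(1 for _, is_phospho in nearest if is_phospho) / k


# Toy training set: (window, is_phosphosite). Illustrative data only.
training = [
    ("RRASV", True),
    ("RKASL", True),
    ("GLSPE", False),
    ("AVSGD", False),
]
score = knn_score("RRASL", training, k=2)
```

A query that resembles known substrates of the same kinase family lands among positive neighbors and so receives a high score, which is the clustering effect the rationale describes.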
KNN Features
         Result
Overall, phosphosites have larger KNN scores
than non-phosphosites
Average KNN scores
  0.7~0.8 for phosphosites
  ≈0.5 for non-phosphosites
[Figure (A): Boxplot of KNN features (Human Ser/Thr), Phospho vs. Nonphospho.
 x-axis: Size of nearest neighbors (% of sample size): 0.25, 0.5, 1, 2, 4;
 y-axis: KNN score]
Disorder Features
        Concept & Rationale

Disordered region (structure)
 Some parts of a protein have a rigid structure,
 such as α-helix and β-sheet.
 Other parts, disordered regions, do not have
 well-defined conformations
 The conformational flexibility of disordered
 regions may facilitate protein phosphorylation
 [Dunker, 2008]: protein phosphorylation sites
 are frequently located within disordered regions
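One way a disorder feature could be derived is sketched below, assuming per-residue disorder scores in [0, 1] are already available from an external disorder predictor. The windowed averaging, the flank size, and the function name are assumptions made for illustration; the slides do not describe how Musite computes its disorder scores.

```python
from typing import Sequence


def mean_disorder(scores: Sequence[float], site_index: int, flank: int) -> float:
    """Average per-residue disorder score in a window around a site.

    `site_index` is 0-based. The window is clipped at the protein termini.
    """
    if not 0 <= site_index < len(scores):
        raise IndexError("site_index is outside the protein")
    start = max(0, site_index - flank)
    end = min(len(scores), site_index + flank + 1)
    window = scores[start:end]
    return sum(window) / len(window)


# Hypothetical per-residue disorder scores for a five-residue stretch.
per_residue = [0.1, 0.2, 0.9, 0.8, 0.7]
central = mean_disorder(per_residue, site_index=2, flank=1)
```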
Disorder Features
             Result
For phosphosites
  Occurrence increases exponentially
  when disorder score increases
For non-phosphosites
  Significantly different distribution
Disorder score > 0.5
  Phosphosites: ~91%
  Non-phosphosites: ~55%
Phosphosites are significantly
over-represented in disordered
regions
[Figure: Histogram of disorder features (Human Ser/Thr).
 (A) Phospho-S/T in H. sapiens; (B) Non-phospho-S/T in H. sapiens.
 x-axis: Disorder Score (0 to 1); y-axis: occurrence]
Amino Acid Frequencies
                              Result
[Figure: Log2(Ratio of Frequency) per amino acid
 (x-axis order: P R D E S K G A Q N V T H L M I F Y W C) for
 H. sapiens, M. musculus, D. melanogaster, C. elegans,
 S. cerevisiae, and A. thaliana (all S/T)]

                   P, R, D, E, S, K, and G are enriched around
                   phosphosites
                   C, W, Y, F, I, M, L, H, T, and V are depleted
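The enrichment and depletion pattern above can be quantified as a log2 ratio of amino acid frequencies around phosphosites versus non-phosphosites. The sketch below is one simple way to compute it; the pseudocount and the choice to count every residue in the window are assumptions for illustration and are not details given in the slides.

```python
import math
from collections import Counter
from typing import Dict, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(
    windows: Sequence[str], pseudocount: float = 1.0
) -> Dict[str, float]:
    """Relative frequency of each amino acid over all windows.

    The pseudocount (an assumption) avoids zero frequencies for rare residues.
    """
    counts = Counter(aa for w in windows for aa in w if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def log2_frequency_ratio(
    pos_windows: Sequence[str], neg_windows: Sequence[str]
) -> Dict[str, float]:
    """log2(frequency around phosphosites / frequency around non-phosphosites).

    Positive values mean enrichment around phosphosites, negative depletion.
    """
    pos = residue_frequencies(pos_windows)
    neg = residue_frequencies(neg_windows)
    return {aa: math.log2(pos[aa] / neg[aa]) for aa in AMINO_ACIDS}


# Toy windows, illustrative only.
ratios = log2_frequency_ratio(["PRSPE"], ["CWSFY"])
```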
Phosphorylation Site Prediction
                 Classifier Training
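The bootstrap, training, and aggregating steps in the flowchart amount to bootstrap aggregation (bagging). The sketch below shows the idea with a deliberately simple nearest-centroid scorer as the base learner. That base learner is a stand-in chosen to keep the example self-contained and is not the classifier used by Musite; drawing each bootstrap sample within each class is likewise an assumption, and all names are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, 1 = phospho, 0 = non-phospho)
Scorer = Callable[[Sequence[float]], float]


def train_centroid_scorer(examples: Sequence[Example]) -> Scorer:
    """Stand-in base learner: higher score = closer to the positive centroid."""
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    pos_c = [mean(col) for col in zip(*pos)]
    neg_c = [mean(col) for col in zip(*neg)]

    def score(x: Sequence[float]) -> float:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
        return d_neg - d_pos

    return score


def bagged_scorer(examples: Sequence[Example], m: int, rng: random.Random) -> Scorer:
    """Train m scorers on bootstrap samples and average their outputs."""
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == 0]
    scorers: List[Scorer] = []
    for _ in range(m):
        # Sample with replacement within each class so that every
        # bootstrap sample contains both classes (an assumption).
        sample = rng.choices(pos, k=len(pos)) + rng.choices(neg, k=len(neg))
        scorers.append(train_centroid_scorer(sample))
    return lambda x: mean(s(x) for s in scorers)


# Toy features, e.g. (KNN score, disorder score). Illustrative data only.
data = [([0.8, 0.9], 1), ([0.7, 0.8], 1), ([0.5, 0.3], 0), ([0.4, 0.2], 0)]
model = bagged_scorer(data, m=5, rng=random.Random(0))
```

Averaging over bootstrap samples reduces the variance of any single classifier, and the averaged output is a continuous score that can later be thresholded at a chosen stringency.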
Results
        Trained Models
General Prediction
  Human Ser/Thr
  Human Tyr
  Mouse Ser/Thr
  Mouse Tyr
  Fruit fly Ser/Thr
  Worm Ser/Thr
  Yeast Ser/Thr
  Arabidopsis Ser/Thr
Kinase-Specific Prediction
  ATM
  CDK/CDK1/CDK2
  CK1/CK2
  MAPK1/MAPK3
  PKA
  PKB
  PKC
  Src
Results
                       Cross validation
[Figure: ROC curves (Sensitivity vs. 1 - Specificity), with an inset
 for 1 - Specificity from 0 to 0.1. Curves: C. elegans (S/T),
 A. thaliana (S/T), H. sapiens (S/T), M. musculus (S/T),
 S. cerevisiae (S/T), D. melanogaster (S/T), M. musculus (Y),
 H. sapiens (Y), and Random guess]
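For readers unfamiliar with this kind of plot, the sketch below shows how points on a sensitivity versus 1 - specificity (ROC) curve can be computed from prediction scores of held-out positive and negative sites. It is a generic illustration with hypothetical names and toy scores, not the evaluation code behind the figure.

```python
from typing import List, Sequence, Tuple


def roc_points(
    pos_scores: Sequence[float], neg_scores: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs, one per cutoff.

    A site is called positive when its score is strictly greater than the
    cutoff. Cutoffs run from the highest observed score down to -infinity.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need scores for both classes")
    cutoffs = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    cutoffs.append(float("-inf"))
    points = []
    for c in cutoffs:
        sensitivity = sum(s > c for s in pos_scores) / len(pos_scores)
        false_positive_rate = sum(s > c for s in neg_scores) / len(neg_scores)
        points.append((false_positive_rate, sensitivity))
    return points


# Toy scores, illustrative only.
curve = roc_points(pos_scores=[0.9, 0.6], neg_scores=[0.7, 0.2])
```

A random-guess predictor falls on the diagonal where sensitivity equals 1 - specificity, so curves further above the diagonal indicate better discrimination.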
Results
             Comparison to other tools
[Figure: ROC curves (Sensitivity vs. 1 - Specificity), with an inset
 for 1 - Specificity from 0 to 0.1. Curves: Musite, Scan-x,
 DISPHOS, and NetPhos]
Phosphorylation Site Prediction
        Software Implementation-Musite

Open Source
  License: GNU General Public License (GPL)
  http://musite.sourceforge.net/
Stand-alone application
  Based on Java
  Supports Windows, Linux, and Mac OS X
A web server is also being developed
  http://musite.net/
Implementation
   User Interface
Implementation
      Customized Model Training

A unique utility for users to train
prediction models from their own data
  Take advantage of the latest data
  Train disease-specific models
  Train organ-specific models
  Integrate into experimental procedures in an
  iterative way
Summary

Musite predicts general and kinase-specific
phosphosites with better accuracy


Musite is an open-source standalone program
capable of performing proteome-wide
predictions
Acknowledgements

Dr. Dong Xu (University of Missouri)
Dr. Jay Thelen (University of Missouri)
Dr. Keith Dunker (Indiana University)
Curtis Bollinger (University of Missouri)


Funding
   NSF [# DBI-0604439]
   NIH [# R21/R33 GM078601]
Visit us at
   http://musite.sourceforge.net
   http://musite.net
   Poster R09 at ISMB

 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Gao bosc2010 musite

  • 1. Musite: Prediction of Protein Phosphorylation Sites. Jianjiong Gao, University of Missouri, Columbia, Missouri. http://musite.sourceforge.net/
  • 2. Background: Protein Phosphorylation. Protein phosphorylation is one of the most important post-translational modifications. It was estimated that up to 50% of proteins are phosphorylated in some cellular state. Abnormality in phosphorylation is a cause or consequence of many diseases: cancer, diabetes, Parkinson's, hepatitis B, …
  • 3. Background: Protein Phosphorylation. Phosphorylation-dephosphorylation is a biochemical switch system regulating various cellular processes. Catalyzed by various specific protein kinases. [Diagram: OFF to ON via kinase; ON to OFF via phosphatase]
  • 4. Phosphorylation Site Prediction: Problem Formulation. Phosphorylation site: a phosphorylated amino acid in a protein (determined by protein sequence). General phosphorylation site prediction: to predict whether an amino acid can be phosphorylated. Kinase-specific phosphorylation site prediction: to predict whether an amino acid can be phosphorylated by a specific kinase. Based on protein sequence only.
  • 5. Limitations of Current Methods. Current prediction tools have limitations when applied to whole proteomes. Prediction accuracy could be improved. Most were released as web servers and restrict the data that users can upload. Training data were out of date. Stringency adjustment was not fully supported.
  • 6. Our tool Musite is unique. Novel method with better accuracy. First open-source tool in the field that meets the OSI Open Standards Requirement. Standalone program designed for proteome-scale prediction. Supports both general and kinase-specific phosphorylation site prediction. Supports customized model training. Supports continuous stringency adjustment.
  • 7. Phosphorylation Site Prediction: Flowchart. [Flowchart: data collection from high-quality sources such as UniProt/Swiss-Prot, Phospho.ELM, PhosphoPep, and PhosPhAt; non-redundant datasets built by BLASTClust; phosphorylation sites and non-phosphorylation sites; feature extraction (KNN scores, disorder scores, amino acid frequencies) giving features from the positive set and from the negative set; bootstrap samples 1 ... m drawn from the training data; training of classifiers 1 ... m; aggregating into the phosphorylation prediction model; specificity estimation on control data; making predictions on new data]
  • 8. Phosphorylation Site Prediction: Data Extraction. [Same flowchart as slide 7]
  • 9. Phosphorylation Site Prediction: Feature Extraction. [Same flowchart as slide 7]
  • 10. Phosphorylation Site Prediction: Feature Extraction. [Same flowchart as slide 7]
  • 11. KNN Features: Motivation. Rationale for using KNN features: local sequence clusters exist around phosphorylation sites, since each phosphorylation site is a substrate of a specific protein kinase, and substrates of the same kinase or kinase family usually share similar patterns in local sequences.
  • 12. KNN Features: Result. Overall, phosphosites have larger KNN scores than non-phosphosites. Average KNN scores: 0.7~0.8 for phosphosites, ≈0.5 for non-phosphosites. [Figure (A): boxplot of KNN features (human Ser/Thr), phospho vs. nonphospho; x-axis: size of nearest neighbors (% of sample size: 0.25, 0.5, 1, 2, 4); y-axis: KNN score]
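A KNN score of the kind summarized on slides 11 and 12 can be sketched as the fraction of phosphorylated windows among the k training windows most similar to a query window. This is only an illustrative sketch: the slides do not state the distance measure, so a plain mismatch count (Hamming distance) is assumed here, and the function names and example windows are hypothetical.

```python
def hamming_distance(a, b):
    """Number of mismatching positions between two equal-length windows."""
    if len(a) != len(b):
        raise ValueError("windows must have the same length")
    return sum(x != y for x, y in zip(a, b))


def knn_score(query, positives, negatives, k):
    """Fraction of positive (phosphorylated) windows among the k nearest.

    Assumption: similarity is a simple mismatch count; a substitution
    matrix based distance could be swapped in without changing the rest.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    labelled = [(hamming_distance(query, w), 1) for w in positives]
    labelled += [(hamming_distance(query, w), 0) for w in negatives]
    # Stable sort by distance only; ties keep their original order.
    labelled.sort(key=lambda pair: pair[0])
    nearest = labelled[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Hypothetical 7-residue windows centred on a serine.
positives = ["RRASVAG", "RRLSVAA"]
negatives = ["GGGSGGG", "LLLSLLL"]
print(knn_score("RRASVAA", positives, negatives, k=2))
```

A score near 1 means the query sits in a cluster of known phosphosites; a score near 0.5 or below means its neighborhood looks no different from background, which matches the pattern reported on slide 12.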
  • 13. Disorder Features: Concept & Rationale. Disordered region (structure): some parts of a protein have a rigid structure, such as α-helix and β-sheet. Other parts, disordered regions, do not have well-defined conformations. The conformational flexibility of disordered regions may facilitate protein phosphorylation [Dunker, 2008]: protein phosphorylation sites are frequently located within disordered regions.
  • 14. Disorder Features: Result. For phosphosites, occurrence increases exponentially when the disorder score increases. For non-phosphosites, the distribution is significantly different. Disorder score > 0.5: phosphosites ~91%, non-phosphosites ~55%. Phosphosites are significantly over-represented in disordered regions. [Figure: histograms of disorder features (human Ser/Thr): (A) phospho-S/T in H. sapiens, (B) non-phospho-S/T in H. sapiens; x-axis: disorder score; y-axis: occurrence]
  • 15. Amino Acid Frequencies: Result. P, R, D, E, S, K, and G are enriched around phosphosites; C, W, Y, F, I, M, L, H, T, and V are depleted. [Figure: log2(ratio of frequency) per amino acid, in the order P R D E S K G A Q N V T H L M I F Y W C, for H. sapiens, M. musculus, D. melanogaster, C. elegans, S. cerevisiae, and A. thaliana (all S/T)]
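The log2 frequency ratio plotted on slide 15 can be illustrated with a small sketch. It assumes fixed-length sequence windows centred on the candidate site, skips the central residue, and adds a pseudocount so that unseen residues do not produce a division by zero; the window length, the pseudocount, and all names below are assumptions, not details taken from the slides.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(windows, pseudocount=1.0):
    """Relative frequency of each amino acid in the flanks of the windows."""
    counts = Counter()
    for window in windows:
        center = len(window) // 2
        # Count flanking residues only; the central site itself is skipped.
        counts.update(window[:center] + window[center + 1:])
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def log2_frequency_ratio(positive_windows, negative_windows, pseudocount=1.0):
    """log2(frequency around phosphosites / frequency around non-phosphosites)."""
    pos = residue_frequencies(positive_windows, pseudocount)
    neg = residue_frequencies(negative_windows, pseudocount)
    return {a: math.log2(pos[a] / neg[a]) for a in AMINO_ACIDS}


# Toy windows: proline-rich flanks for positives, leucine-rich for negatives.
ratios = log2_frequency_ratio(["PPSPP"], ["LLSLL"])
print(round(ratios["P"], 3), round(ratios["L"], 3))
```

Positive values correspond to residues enriched around phosphosites and negative values to depleted ones, which is how the bars in the figure are read.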
  • 16. Phosphorylation Site Prediction: Classifier Training. [Flowchart repeated from slide 7]
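The bootstrap-and-aggregate training step in the flowchart (bootstrap samples 1 ... m, classifiers 1 ... m, aggregating) can be sketched as follows. The slides do not name the base classifier, so a toy nearest-centroid scorer stands in for it here; resampling each class separately and averaging the scores are likewise assumptions made to keep the sketch short and self-contained.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def train_centroid_classifier(pos, neg):
    """Stand-in base classifier (assumption): score in [0, 1], higher = closer to positives."""
    cp, cn = centroid(pos), centroid(neg)

    def score(x):
        dp, dn = distance(x, cp), distance(x, cn)
        return 0.5 if dp + dn == 0 else dn / (dp + dn)

    return score


def train_bagged_model(pos, neg, m=10, seed=0):
    """Train m classifiers on bootstrap samples and average their scores."""
    rng = random.Random(seed)
    classifiers = []
    for _ in range(m):
        # Sample each class with replacement so both classes are always present.
        boot_pos = rng.choices(pos, k=len(pos))
        boot_neg = rng.choices(neg, k=len(neg))
        classifiers.append(train_centroid_classifier(boot_pos, boot_neg))

    def predict(x):
        return sum(clf(x) for clf in classifiers) / len(classifiers)

    return predict


# Hypothetical feature vectors, e.g. [KNN score, disorder score].
pos = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.9]]
neg = [[0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
model = train_bagged_model(pos, neg, m=5, seed=1)
print(model([0.85, 0.85]), model([0.15, 0.2]))
```

Averaging over several resampled classifiers reduces the dependence of the final score on any single draw of training sites.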
  • 17. Results: Trained Models. General prediction: human Ser/Thr, human Tyr, mouse Ser/Thr, mouse Tyr, fruit fly Ser/Thr, worm Ser/Thr, yeast Ser/Thr, Arabidopsis Ser/Thr. Kinase-specific prediction: ATM, CDK/CDK1/CDK2, CK1/CK2, MAPK1/MAPK3, PKA, PKB, PKC, Src.
  • 18. Results: Cross validation. [Figure: ROC curves (sensitivity vs. 1 - specificity) for C. elegans (S/T), A. thaliana (S/T), H. sapiens (S/T), M. musculus (S/T), S. cerevisiae (S/T), D. melanogaster (S/T), M. musculus (Y), H. sapiens (Y), and random guess; one panel covers 1 - specificity from 0 to 1, the other from 0 to 0.1]
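The ROC curves and the stringency adjustment mentioned earlier both rest on the same trade-off: a score threshold fixes the specificity, and the sensitivity follows from it. A minimal sketch, assuming that control (non-phosphorylated) sites supply the scores used to estimate specificity and that a site is predicted positive when its score exceeds the threshold; the function names and numbers are hypothetical.

```python
import math


def threshold_for_specificity(control_scores, specificity):
    """Smallest observed control score that reaches the target specificity."""
    if not 0.0 <= specificity <= 1.0:
        raise ValueError("specificity must be between 0 and 1")
    ordered = sorted(control_scores)
    # Number of control scores that must fall at or below the threshold.
    n_below = math.ceil(specificity * len(ordered))
    if n_below == 0:
        return float("-inf")
    return ordered[n_below - 1]


def specificity_at(threshold, control_scores):
    """Fraction of control sites correctly rejected (score <= threshold)."""
    return sum(s <= threshold for s in control_scores) / len(control_scores)


def sensitivity_at(threshold, positive_scores):
    """Fraction of known phosphosites predicted positive (score > threshold)."""
    return sum(s > threshold for s in positive_scores) / len(positive_scores)


control = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
positives = [0.92, 0.97, 0.5, 0.99]
cutoff = threshold_for_specificity(control, 0.9)
print(cutoff, specificity_at(cutoff, control), sensitivity_at(cutoff, positives))
```

Sweeping the target specificity from 0 to 1 and recording the resulting sensitivity traces out one ROC curve; choosing a single target value corresponds to one stringency setting.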
  • 19. Results: Comparison to other tools. [Figure: ROC curves (sensitivity vs. 1 - specificity) for Musite, Scan-x, DISPHOS, and NetPhos; one panel covers 1 - specificity from 0 to 1, the other from 0 to 0.1]
  • 20. Phosphorylation Site Prediction: Software Implementation (Musite). Open source license: GNU General Public License (GPL), http://musite.sourceforge.net/. Stand-alone application based on Java. Supports Windows, Linux, and Mac OS X. A web server is also being developed: http://musite.net/
  • 21. Implementation User Interface
  • 22. Implementation: Customized Model Training. A unique utility for users to train prediction models from their own data: take advantage of the latest data; train disease-specific models; train organ-specific models; integrate into experimental procedures in an iterative way.
  • 23. Summary. Musite predicts general and kinase-specific phosphosites with better accuracy. Musite is an open-source standalone program capable of performing proteome-wide predictions.
  • 24. Acknowledgements. Dr. Dong Xu (University of Missouri), Dr. Jay Thelen (University of Missouri), Dr. Keith Dunker (Indiana University), Curtis Bollinger (University of Missouri). Funding: NSF [# DBI-0604439], NIH [# R21/R33 GM078601]. Visit us at http://musite.sourceforge.net and http://musite.net. Poster R09 at ISMB.