http://www.bits.vib.be/training
Peptide validation and protein inference

Kenny Helsens (kenny.helsens@ugent.be)
Lennart Martens (lennart.martens@ebi.ac.uk)

Computational Omics and Systems Biology Group
Department of Medical Protein Research, VIB
Department of Biochemistry, Ghent University
Ghent, Belgium

Proteomics Services Group
European Bioinformatics Institute
Hinxton, Cambridge
United Kingdom
www.ebi.ac.uk

Kenny Helsens, kenny.helsens@UGent.be
BITS MS Data Processing – Protein Inference
UGent, Gent, Belgium – 16 December 2011
Data processing and information ambiguity

[Figure: processing chain Raw data → Peaklists → Peptide sequences → Protein accession numbers, with axes labelled "ambiguity" and "data size"]



                         See: Martens and Hermjakob, Molecular BioSystems, 2007

PEPTIDE IDENTIFICATION VALIDATION




Populations and individuals




[Figure: a population of 10,000 peptide-to-spectrum matches, with 5% decoy hits]




Eliminating false positives


Suspect peptide identifications happen. The problem is that finding them requires detailed analysis of a single spectrum and its identifications, amongst thousands of other spectra…




Automated interpretation




[Figure not reproduced; callout: "The Netherlands??"]




Manual interpretation

[Figure: tyrosine phosphorylation]




                          See: Ghesquière and Helsens, Proteomics, 2010

Peptizer expert system

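[Figure: five agents (a to e) each inspect a confident peptide identification and cast a vote (+1, +1, 0, -1 and +1, respectively). The votes are aggregated, and the aggregate sorts the confident peptide identifications into a suspicious subset and a trusted subset.]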


                                 See: Helsens et al, Molecular and Cellular Proteomics, 2008
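The voting scheme can be sketched in a few lines of Python. This is not Peptizer's code: the PSM attributes, the three agents and their cut-offs, the sign convention (+1 = suspect, 0 = no opinion, -1 = not suspect) and the summing aggregator with its threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PSM:
    sequence: str
    mass_error_ppm: float
    ion_coverage: float  # hypothetical: fraction of expected fragment ions matched


# An agent inspects one PSM and casts a vote (hypothetical convention):
# +1 = suspect, 0 = no opinion, -1 = not suspect.
Agent = Callable[[PSM], int]


def short_peptide_agent(psm: PSM) -> int:
    # Flags very short peptides; otherwise abstains.
    return 1 if len(psm.sequence) < 8 else 0


def mass_error_agent(psm: PSM) -> int:
    # Flags large precursor mass errors (10 ppm cut-off is an assumption).
    return 1 if abs(psm.mass_error_ppm) > 10 else -1


def coverage_agent(psm: PSM) -> int:
    # Flags spectra in which few fragment ions were explained.
    return 1 if psm.ion_coverage < 0.3 else -1


AGENTS: List[Agent] = [short_peptide_agent, mass_error_agent, coverage_agent]


def classify(psm: PSM, agents: List[Agent] = AGENTS, threshold: int = 1) -> str:
    """Aggregate the votes by summing; a sum at or above threshold marks the PSM suspicious."""
    score = sum(agent(psm) for agent in agents)
    return "suspicious" if score >= threshold else "trusted"


print(classify(PSM("PEPTIDE", 2.0, 0.5)))   # votes +1, -1, -1 -> trusted
print(classify(PSM("PEPTIDE", 15.0, 0.1)))  # votes +1, +1, +1 -> suspicious
```

Keeping each agent as a small independent function mirrors the idea in the figure: new checks can be added without touching the aggregation step.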

Peptizer expert system




                         See: Helsens et al, Molecular and Cellular Proteomics, 2008

Peptizer expert system




                         See: Helsens et al, Molecular and Cellular Proteomics, 2008

PROTEIN INFERENCE




Not all peptides are created equal

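[Figure: a gene with exons 1a, 1b, 2, 3, 4, 5, 6a and 6b, transcribed into four alternative transcripts:
  Transcript 1: 1a 1b 2 5 6a 6b
  Transcript 2: 1a 1b 2 3 5 6a 6b
  Transcript 3: 1b 2 3 4 5 6a 6b
  Transcript 4: 1a 1b 2 3 4 5 6a
Peptides from exons 2, 3, 4 and 5 are mapped onto the translations and classified as matching all transcripts, matching a transcript subset, or matching exactly 1 translation; one case is labelled "redundant". Legend: intron, exon (UTR), exon (CDS), peptide.]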

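The three peptide categories can be derived mechanically from a peptide-to-translation mapping. A minimal sketch; the translation and peptide names in the example are hypothetical and are not read off the figure.

```python
from typing import Dict, Set


def classify_peptides(translations: Dict[str, Set[str]]) -> Dict[str, str]:
    """Label each peptide by how many of the given translations contain it."""
    total = len(translations)
    peptides = set().union(*translations.values())
    labels: Dict[str, str] = {}
    for peptide in sorted(peptides):
        hits = sum(peptide in peps for peps in translations.values())
        if hits == total:
            labels[peptide] = "matching all transcripts"
        elif hits == 1:
            labels[peptide] = "matching exactly 1 translation"
        else:
            labels[peptide] = "matching a transcript subset"
    return labels


# Hypothetical example: three translations sharing some peptides.
example = {
    "translation 1": {"pep2", "pep5"},
    "translation 2": {"pep2", "pep3", "pep5"},
    "translation 3": {"pep2", "pep3", "pep4", "pep5"},
}
for peptide, label in classify_peptides(example).items():
    print(peptide, "->", label)
```

Only the "matching exactly 1 translation" peptides point unambiguously to one protein isoform; the others are evidence for the gene, not for a specific product.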



Sample preparation consequences




                         See: Nesvizhskii AI et al, Molecular and Cellular Proteomics, 2005

Sample preparation consequences




                         See: Nesvizhskii AI et al, Molecular and Cellular Proteomics, 2005

Protein inference: a question of conviction
Peptide-to-protein matches (x = peptide matches protein):

              a   b   c   d
  prot X      x       x
  prot Y      x
  prot Z          x   x   x

The same matches can be reported in three ways:
  Minimal set (Occam)
  Maximal set (anti-Occam)
  Minimal set with maximal annotation, using the annotation marks prot X (-), prot Y (+) and prot Z (0): true Occam?
                                                 See: Martens and Hermjakob, Molecular BioSystems, 2007
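The three strategies can be applied to the table above with a brute-force sketch. The peptide matches are taken from the table; mapping the marks to numbers assumes that (+), (0) and (-) rank annotation quality from best to worst, which is our reading rather than something stated on the slide.

```python
from itertools import combinations
from typing import Dict, List, Set

# Peptide-to-protein matches from the table above.
MATCHES: Dict[str, Set[str]] = {
    "prot X": {"a", "c"},
    "prot Y": {"a"},
    "prot Z": {"b", "c", "d"},
}
# Assumption: (+) > (0) > (-) ranks annotation quality, mapped to 1, 0, -1.
ANNOTATION: Dict[str, int] = {"prot X": -1, "prot Y": 1, "prot Z": 0}


def maximal_set(matches: Dict[str, Set[str]]) -> Set[str]:
    """Anti-Occam: report every protein that matches at least one peptide."""
    return {protein for protein, peptides in matches.items() if peptides}


def minimal_sets(matches: Dict[str, Set[str]]) -> List[Set[str]]:
    """Occam: all smallest protein sets that together explain every peptide."""
    observed = set().union(*matches.values())
    proteins = sorted(matches)
    for size in range(1, len(proteins) + 1):
        covers = [set(combo) for combo in combinations(proteins, size)
                  if set().union(*(matches[p] for p in combo)) == observed]
        if covers:
            return covers
    return []


def minimal_with_max_annotation(matches: Dict[str, Set[str]],
                                annotation: Dict[str, int]) -> Set[str]:
    """Among the equally small sets, keep the one with the best total annotation."""
    return max(minimal_sets(matches), key=lambda s: sum(annotation[p] for p in s))


print(minimal_sets(MATCHES))
print(minimal_with_max_annotation(MATCHES, ANNOTATION))
```

Both {prot X, prot Z} and {prot Y, prot Z} explain all four peptides with two proteins, so parsimony alone cannot choose between them; under the stated assumption the annotation criterion breaks the tie in favour of {prot Y, prot Z}. The exhaustive search is only practical for small peptide-protein clusters.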

ALGORITHMS FOR THE PROTEIN INFERENCE PROBLEM




A few algorithms for protein inference


               • IDPicker
                         Zhang et al, Journal of Proteome Research, 2007


               • ProteinProphet
                         Nesvizhskii AI et al, Analytical Chemistry, 2003


               • DBToolkit
                         Martens et al, Bioinformatics, 2005
                         http://genesis.UGent.be/dbtoolkit




IDPicker parsimonious protein assembly

   (I) Initialize




                         See: Zhang et al, Journal of Proteome Research, 2007

IDPicker parsimonious protein assembly

   (II) Collapse




                         See: Zhang et al, Journal of Proteome Research, 2007

IDPicker parsimonious protein assembly

   (III) Separate




                         See: Zhang et al, Journal of Proteome Research, 2007

IDPicker parsimonious protein assembly

   (IV) Reduce




                         See: Zhang et al, Journal of Proteome Research, 2007
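The four steps can be sketched compactly in Python, following our reading of the parsimony procedure in Zhang et al. (2007): build the protein-peptide graph, collapse proteins with identical peptide sets into groups, separate the graph into connected components, and reduce each component with a greedy set cover. This is a simplification, not IDPicker's implementation: peptide-side collapsing is omitted, and the alphabetical tie-breaking rule is an assumption.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def idpicker_sketch(matches: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Return the protein groups kept by a parsimony-based assembly sketch."""
    # (I) Initialize: `matches` is the protein -> peptides bipartite graph.
    # (II) Collapse: proteins with identical peptide sets become one group.
    groups: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for protein, peptides in matches.items():
        groups[frozenset(peptides)].add(protein)

    # (III) Separate: groups sharing a peptide end up in the same connected component.
    components: List[Set[FrozenSet[str]]] = []
    unvisited = set(groups)
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            linked = {g for g in unvisited if g & current}
            unvisited -= linked
            component |= linked
            frontier.extend(linked)
        components.append(component)

    # (IV) Reduce: within each component, greedily keep the group that explains
    # the most still-unexplained peptides (ties broken alphabetically).
    selected: List[FrozenSet[str]] = []
    for component in components:
        uncovered = set().union(*component)
        candidates = set(component)
        while uncovered:
            best = min(candidates,
                       key=lambda g: (-len(g & uncovered), sorted(groups[g])))
            selected.append(frozenset(groups[best]))
            uncovered -= best
            candidates.discard(best)
    return sorted(selected, key=sorted)
```

Splitting into components first keeps each greedy set cover small, and independent clusters of shared peptides cannot influence each other's result.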

ProteinProphet: the simplified view




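Protein probability, combining the peptide probabilities p_i and the peptide weights w_{i,k} of the peptides i that match protein k:

$$P_k = 1 - \prod_{i} \left(1 - w_{i,k}\, p_i\right)$$

[A second panel, labelled "peptide probability", is not reproduced.]

In iteration 1, all weights w_{i,k} start off as 1/n_i, with n_i the degeneracy count of peptide i, that is, the number of proteins it matches.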
                         See: Nesvizhskii AI et al., Analytical Chemistry, 2003
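The iteration can be sketched in Python. It keeps only the two ingredients on the slide, the protein probability formula and weights that start at 1/n; the weight update (each protein's share of a shared peptide made proportional to its current probability) is our simplified reading of Nesvizhskii et al. (2003), and ProteinProphet's further adjustments are left out. Function and parameter names are hypothetical.

```python
from math import prod
from typing import Dict, Set


def protein_probabilities(peptide_prob: Dict[str, float],
                          matches: Dict[str, Set[str]],
                          iterations: int = 50) -> Dict[str, float]:
    """Simplified ProteinProphet-style iteration (sketch, not the real tool).

    peptide_prob: peptide -> probability p_i (assumed already computed).
    matches: protein -> peptides it contains; every peptide in peptide_prob must
    match at least one protein, and matches may only use peptides in peptide_prob.
    """
    parents = {pep: [prot for prot, peps in matches.items() if pep in peps]
               for pep in peptide_prob}
    # Iteration 1: each weight is 1/n, n = degeneracy (number of matching proteins).
    weights = {(pep, prot): 1 / len(prots)
               for pep, prots in parents.items() for prot in prots}
    probs: Dict[str, float] = {}
    for _ in range(iterations):
        # P_k = 1 - prod_i (1 - w_ik * p_i)
        probs = {prot: 1 - prod(1 - weights[(pep, prot)] * peptide_prob[pep] for pep in peps)
                 for prot, peps in matches.items()}
        # Redistribute each shared peptide over its proteins in proportion to P_k.
        for pep, prots in parents.items():
            total = sum(probs[p] for p in prots)
            for p in prots:
                weights[(pep, p)] = probs[p] / total if total > 0 else 1 / len(prots)
    return probs


print(protein_probabilities({"p1": 0.9, "p2": 0.8}, {"A": {"p1", "p2"}, "B": {"p2"}}))
```

In the example, protein A also has a unique peptide, so over the iterations it absorbs almost all of the shared peptide p2 and protein B's probability shrinks towards zero.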

DBToolkit protein inference
Minimal set with maximal annotation:

                a   b   c   d
  prot X (-)    x       x
  prot Y (+)    x
  prot Z (0)        x   x   x




Some indications from the HUPO BPP




                a   b   c   d
  prot X (-)    x       x
  prot Y (+)    x
  prot Z (0)        x   x   x




PROTEIN INFERENCE AND QUANTIFICATION




Some inference examples (i)
                                                         http://genesis.ugent.be/rover/




    Nice and easy, 1/1, only unique peptides (blue) and a narrow distribution
                              See: Colaert et al, Proteomics, 2010

Some inference examples (ii)
                                                        http://genesis.ugent.be/rover/




                         Nice and easy, down-regulated
                             See: Colaert et al, Proteomics, 2010

Some inference examples (iii)
                                                        http://genesis.ugent.be/rover/




                         A little less easy, up-regulated
                             See: Colaert et al, Proteomics, 2010

Some inference examples (iv)
                                                           http://genesis.ugent.be/rover/




                   A nice example of the mess of degenerate peptides
                                See: Colaert et al, Proteomics, 2010

Some inference examples (v)
                                                               http://genesis.ugent.be/rover/




                         A bit of chaos, but a defined core distribution
                                    See: Colaert et al, Proteomics, 2010
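The examples contrast proteins quantified from unique peptides with a tight ratio distribution against proteins muddied by degenerate peptides. One simple way to summarise such a distribution is sketched below under our own assumptions (this is not necessarily how Rover computes protein ratios): take log2 peptide ratios, optionally keep only unique peptides, and report the median as the protein ratio and the median absolute deviation (MAD) as its spread.

```python
from math import log2
from statistics import median
from typing import List, Optional, Tuple


def protein_log2_ratio(peptides: List[Tuple[float, bool]],
                       unique_only: bool = True) -> Tuple[Optional[float], Optional[float]]:
    """Summarise one protein's peptide ratios as (median log2 ratio, MAD).

    peptides: (peptide ratio, is_unique) pairs; is_unique is False for
    degenerate peptides that also match other proteins.
    """
    values = [log2(ratio) for ratio, unique in peptides if unique or not unique_only]
    if not values:
        return None, None  # nothing left to quantify the protein with
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return center, mad


peptides = [(1.0, True), (2.0, True), (0.5, True), (8.0, False)]
print(protein_log2_ratio(peptides))                     # unique peptides only
print(protein_log2_ratio(peptides, unique_only=False))  # degenerate peptide included
```

Including the degenerate peptide shifts the median, which is the kind of distortion the "mess of degenerate peptides" example illustrates; the median and MAD are used here because they resist a few outlying ratios.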

Thank you!
                Questions?
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

BITS - Protein inference from mass spectrometry data

  • 2. Peptide validation and protein inference. Kenny Helsens (kenny.helsens@ugent.be) and Lennart Martens (lennart.martens@ebi.ac.uk). Computational Omics and Systems Biology Group, Department of Medical Protein Research, VIB, and Department of Biochemistry, Ghent University, Ghent, Belgium; Proteomics Services Group, European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom, www.ebi.ac.uk. Kenny Helsens BITS MS Data Processing – Protein Inference kenny.helsens@UGent.be UGent, Gent, Belgium – 16 December 2011
  • 3. Data processing and information ambiguity: raw data → peaklists → peptide sequences → protein accession numbers (figure arrows labelled "ambiguity" and "data size"). See: Martens and Hermjakob, Molecular BioSystems, 2007
  • 4. PEPTIDE IDENTIFICATION VALIDATION
  • 5. Populations and individuals: 10,000 peptide-to-spectrum matches, 5% decoy hits
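Slide 5 looks at identifications as a population, where decoy hits give a handle on the error rate. A minimal sketch of the common target-decoy estimate, assuming a concatenated target-decoy search in which higher scores are better and the false discovery rate above a score threshold is estimated as decoy hits divided by target hits. The function name and the example scores are hypothetical.

```python
from typing import List, Tuple


def estimate_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the false discovery rate among PSMs scoring >= threshold.

    psms: (score, is_decoy) pairs from a concatenated target-decoy search;
    higher scores are assumed to be better. Each decoy hit above the
    threshold is taken as evidence of one false target hit.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing falsely accepted
    return decoys / targets


# Hypothetical scores, not data from the slide.
psms = [(95.0, False), (90.0, False), (85.0, True), (80.0, False), (70.0, True)]
print(estimate_fdr(psms, 90.0))  # 0.0
print(estimate_fdr(psms, 80.0))  # about 0.333 (1 decoy / 3 targets)
```

The estimate says nothing about which individual target hits are wrong, which is exactly the gap the following slides address.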
  • 6. Eliminating false positives: suspect peptide identifications happen. The problem is that finding them requires a detailed analysis of each individual spectrum and its identifications, amongst thousands of other spectra.
  • 7. Automated interpretation: The Netherlands??
  • 8. Manual interpretation: tyrosine phosphorylation. See: Ghesquière and Helsens, Proteomics, 2010
  • 9. Peptizer expert system: agents a to e each cast a vote (+1, 0 or -1; the figure shows +1, +1, 0, -1, +1) on the confident peptide identifications, and the aggregation of the votes splits them into a suspicious subset and a trusted subset. See: Helsens et al, Molecular and Cellular Proteomics, 2008
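The voting scheme of slide 9 can be sketched as below. This is not Peptizer's code: the `PeptideId` record, the two agents, their cut-offs, the reading of +1 as "suspicious", and aggregation by simple summation are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PeptideId:
    """Hypothetical minimal record of one confident peptide identification."""
    sequence: str
    score: float
    threshold: float
    delta_mass_ppm: float


# An agent inspects one identification and casts a vote. In this sketch
# +1 means "suspicious", 0 "no opinion" and -1 "supports the identification".
Agent = Callable[[PeptideId], int]


def score_margin_agent(pid: PeptideId) -> int:
    # Hypothetical rule: a score only just above the threshold is suspicious.
    margin = pid.score - pid.threshold
    if margin < 5:
        return 1
    return -1 if margin > 20 else 0


def mass_error_agent(pid: PeptideId) -> int:
    # Hypothetical rule: a precursor mass error above 10 ppm is suspicious.
    return 1 if abs(pid.delta_mass_ppm) > 10 else 0


def partition(ids: List[PeptideId], agents: List[Agent],
              cutoff: int = 1) -> Tuple[List[PeptideId], List[PeptideId]]:
    """Sum the votes per identification; totals >= cutoff are suspicious."""
    suspicious: List[PeptideId] = []
    trusted: List[PeptideId] = []
    for pid in ids:
        total = sum(agent(pid) for agent in agents)
        (suspicious if total >= cutoff else trusted).append(pid)
    return suspicious, trusted


ids = [PeptideId("PEPTIDEK", 70.0, 40.0, 2.0), PeptideId("SAMPLER", 42.0, 40.0, 15.0)]
suspicious, trusted = partition(ids, [score_margin_agent, mass_error_agent])
print([p.sequence for p in suspicious])  # ['SAMPLER']
```

Keeping each agent a small independent function means new criteria can be added without touching the aggregation step.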
  • 10. Peptizer expert system See: Helsens et al, Molecular and Cellular Proteomics, 2008
  • 11. Peptizer expert system See: Helsens et al, Molecular and Cellular Proteomics, 2008
  • 12. PROTEIN INFERENCE
  • 13. Not all peptides are created equal. Figure: a gene with exons 1a, 1b, 2, 3, 4, 5, 6a and 6b, its alternative transcripts, their translations, and the peptides mapped onto them (legend: intron, exon UTR, exon CDS, peptide). Peptide categories shown: matching all transcripts, redundant, matching a transcript subset, and matching exactly 1 translation.
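The peptide categories of slide 13 follow from the set of translations a peptide matches. A minimal sketch, assuming that peptide-to-translation mapping is already known (for example from an in-silico digest of each isoform); the function, the translation names and the peptides are hypothetical, and the slide's "redundant" category is not modelled because the slide alone does not define it.

```python
from typing import Dict, Set


def classify_peptide(matched: Set[str], all_translations: Set[str]) -> str:
    """Place a peptide in one of the slide's categories, given the set of
    translations (protein isoforms) of one gene that contain it."""
    if not matched:
        return "unmatched"
    if len(matched) == 1:
        return "matching exactly 1 translation"
    if matched == all_translations:
        return "matching all transcripts"
    return "matching a transcript subset"


# Hypothetical gene with four translations and three observed peptides.
translations = {"T1", "T2", "T3", "T4"}
peptide_hits: Dict[str, Set[str]] = {
    "pep1": {"T1", "T2", "T3", "T4"},
    "pep2": {"T2", "T3"},
    "pep3": {"T3"},
}
for peptide, hits in peptide_hits.items():
    print(peptide, "->", classify_peptide(hits, translations))
```

Only peptides in the last category can single out one isoform; the others are evidence for the gene but not for a specific translation.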
  • 14. Sample preparation consequences See: Nesvizhskii AI et al, Molecular and Cellular Proteomics, 2005
  • 15. Sample preparation consequences See: Nesvizhskii AI et al, Molecular and Cellular Proteomics, 2005
  • 16. Protein inference: a question of conviction. Three readings of the same peptide-to-protein matrix (peptides a, b, c, d; proteins prot X, prot Y, prot Z): the minimal set (Occam), the maximal set (anti-Occam), and the minimal set with maximal annotation (true Occam?), in which prot X is annotated (-), prot Y (+) and prot Z (0). See: Martens and Hermjakob, Molecular BioSystems, 2007
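The gap between the minimal (Occam) and maximal (anti-Occam) readings of slide 16 can be made concrete on a small peptide-to-protein map. A sketch under stated assumptions: the maximal set reports every protein with at least one matching peptide, and the minimal set is found here by exhaustive search over protein subsets, which is exact but only feasible for toy examples. The example map is hypothetical and is not the matrix drawn on the slide.

```python
from itertools import combinations
from typing import Dict, Set


def maximal_set(protein_peptides: Dict[str, Set[str]]) -> Set[str]:
    """Anti-Occam: report every protein with at least one matching peptide."""
    return {prot for prot, peps in protein_peptides.items() if peps}


def minimal_set(protein_peptides: Dict[str, Set[str]]) -> Set[str]:
    """Occam: the smallest set of proteins that explains every observed peptide.

    Exhaustive search over protein subsets, smallest first; the cost grows
    exponentially with the number of proteins, so this only suits toy examples.
    When several minimal sets exist, the first in sorted order is returned.
    """
    proteins = sorted(protein_peptides)
    observed = set().union(*protein_peptides.values())
    for size in range(len(proteins) + 1):
        for combo in combinations(proteins, size):
            if set().union(*(protein_peptides[p] for p in combo)) == observed:
                return set(combo)
    return set(proteins)  # not reached: the full set always explains everything


# Hypothetical map, not the matrix drawn on the slide.
proteins = {"X": {"a", "b"}, "Y": {"a", "b", "c", "d"}, "Z": {"c", "d"}}
print(sorted(maximal_set(proteins)))  # ['X', 'Y', 'Z']
print(sorted(minimal_set(proteins)))  # ['Y']
```

Neither answer is wrong as such: X and Z may well be present, but the peptides alone give no reason to report them once Y is accepted.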
  • 17. ALGORITHMS FOR THE PROTEIN INFERENCE PROBLEM
  • 18. A few algorithms for protein inference: IDPicker (Zhang et al, Journal of Proteome Research, 2007); ProteinProphet (Nesvizhskii AI et al, Analytical Chemistry, 2003); DBToolkit (Martens et al, Bioinformatics, 2005, http://genesis.UGent.be/dbtoolkit)
  • 19. IDPicker parsimonious protein assembly (I): Initialize. See: Zhang et al, Journal of Proteome Research, 2007
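IDPicker's assembly works on a bipartite graph linking proteins to the peptides they contain. A sketch of the initialization step, assuming the input is a list of (peptide, protein) matches for peptides that already passed validation; storing both directions as adjacency dictionaries is an illustrative choice, not IDPicker's own data structure, and the names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def initialize(matches: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build both directions of the protein-peptide bipartite graph.

    matches: (peptide, protein) pairs, one per database match of a validated
    peptide; a degenerate peptide appears once for every protein it maps to.
    """
    protein_to_peptides: Dict[str, Set[str]] = defaultdict(set)
    peptide_to_proteins: Dict[str, Set[str]] = defaultdict(set)
    for peptide, protein in matches:
        protein_to_peptides[protein].add(peptide)
        peptide_to_proteins[peptide].add(protein)
    return dict(protein_to_peptides), dict(peptide_to_proteins)


# Hypothetical matches: peptide "a" is degenerate (shared by X and Y).
prot_map, pep_map = initialize([("a", "X"), ("a", "Y"), ("b", "Y")])
print(prot_map)
print(pep_map)
```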
  • 20. IDPicker parsimonious protein assembly (II): Collapse. See: Zhang et al, Journal of Proteome Research, 2007
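The collapse step can be sketched as grouping proteins by their exact peptide sets: proteins that are hit by precisely the same peptides cannot be distinguished by the data and are reported as one group. The same idea applies to peptides that map to identical protein sets. A minimal sketch with hypothetical names, assuming a protein-to-peptides map as input.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def collapse(protein_to_peptides: Dict[str, Set[str]]) -> Dict[Tuple[str, ...], FrozenSet[str]]:
    """Merge proteins that are matched by exactly the same set of peptides.

    The key is the sorted tuple of accessions in a group, the value is the
    peptide set those proteins share.
    """
    by_peptides: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        by_peptides[frozenset(peptides)].append(protein)
    return {tuple(sorted(prots)): peps for peps, prots in by_peptides.items()}


# Hypothetical isoforms P1 and P2 are hit by the same two peptides.
groups = collapse({"P1": {"a", "b"}, "P2": {"a", "b"}, "P3": {"c"}})
print(groups)
```

Collapsing first shrinks the graph, so the later steps treat indistinguishable proteins as a single node.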
  • 21. IDPicker parsimonious protein assembly (III): Separate. See: Zhang et al, Journal of Proteome Research, 2007
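The separate step splits the protein-peptide graph into connected components, so that proteins which never share a peptide, directly or through other proteins, can be resolved independently. A sketch using breadth-first search, assuming a protein-to-peptides map as input; the function name is hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def separate(protein_to_peptides: Dict[str, Set[str]]) -> List[Set[str]]:
    """Return the connected components of the bipartite graph as protein sets."""
    peptide_to_proteins: Dict[str, Set[str]] = {}
    for protein, peptides in protein_to_peptides.items():
        for peptide in peptides:
            peptide_to_proteins.setdefault(peptide, set()).add(protein)

    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(protein_to_peptides):
        if start in seen:
            continue
        # Breadth-first walk: protein -> its peptides -> proteins sharing them.
        component: Set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            protein = queue.popleft()
            component.add(protein)
            for peptide in protein_to_peptides[protein]:
                for neighbour in peptide_to_proteins[peptide]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical graph: X-Y-Z are linked through shared peptides, W stands alone.
print(separate({"X": {"a"}, "Y": {"a", "b"}, "Z": {"b"}, "W": {"c"}}))
```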
  • 22. IDPicker parsimonious protein assembly (IV): Reduce. See: Zhang et al, Journal of Proteome Research, 2007
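Within each component, the reduce step keeps a small set of proteins that still explains every peptide. A sketch using a greedy set-cover heuristic, which repeatedly keeps the protein explaining the most still-unexplained peptides. This is an approximation that need not find the smallest possible set, and the function name and alphabetical tie-breaking are assumptions of this sketch rather than details taken from the slide.

```python
from typing import Dict, List, Set


def reduce_component(protein_to_peptides: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover over one connected component."""
    uncovered: Set[str] = set().union(*protein_to_peptides.values())
    kept: List[str] = []
    while uncovered:
        # Ties are broken alphabetically, which keeps the output deterministic.
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & uncovered))
        kept.append(best)
        uncovered -= protein_to_peptides[best]
    return kept


# Hypothetical component: Z's only peptide is already explained by Y.
print(reduce_component({"X": {"a", "b"}, "Y": {"b", "c"}, "Z": {"c"}}))  # ['X', 'Y']
```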
  • 23. ProteinProphet, the simplified view: the protein probability is computed from the weights and probabilities of its peptides as $P_{\text{protein}} = 1 - \prod_{i} \left(1 - w_i\, p_i\right)$, where $w_i$ is the weight and $p_i$ the probability of peptide $i$. In iteration 1, all weights $w$ start off as $1/n$, with $n$ the degeneracy count for the peptide (the number of proteins it maps to). See: Nesvizhskii AI et al., Analytical Chemistry, 2003
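A sketch of the simplified iteration on slide 23, assuming the peptide probabilities are already given. Each weight starts at 1/n. Protein probabilities are then computed as one minus the product of (1 - w p) over the protein's peptides. Next, each peptide's weight is re-shared among its proteins in proportion to their current probabilities. The adjustments and refinements of the published method are left out, and the fixed iteration count is an arbitrary choice; all names are hypothetical.

```python
from typing import Dict, Set


def protein_prophet(peptide_prob: Dict[str, float],
                    peptide_to_proteins: Dict[str, Set[str]],
                    iterations: int = 20) -> Dict[str, float]:
    """Simplified ProteinProphet-style protein probabilities (sketch only).

    peptide_prob: probability p that each peptide identification is correct.
    peptide_to_proteins: proteins each peptide maps to (its degeneracy).
    iterations must be at least 1.
    """
    proteins = sorted(set().union(*peptide_to_proteins.values()))
    # Iteration 1: a peptide shared by n proteins gives each a weight of 1/n.
    weight = {(pep, prot): 1.0 / len(prots)
              for pep, prots in peptide_to_proteins.items() for prot in prots}
    prob: Dict[str, float] = {}
    for _ in range(iterations):
        # P(protein) = 1 - product over its peptides of (1 - w * p)
        absent = {prot: 1.0 for prot in proteins}
        for (pep, prot), w in weight.items():
            absent[prot] *= 1.0 - w * peptide_prob[pep]
        prob = {prot: 1.0 - q for prot, q in absent.items()}
        # Re-share each peptide's weight in proportion to current protein probabilities.
        for pep, prots in peptide_to_proteins.items():
            total = sum(prob[p] for p in prots)
            for prot in prots:
                weight[(pep, prot)] = prob[prot] / total if total > 0 else 1.0 / len(prots)
    return prob


# Hypothetical example: peptide "a" is shared by X and Y, "b" is unique to X.
result = protein_prophet({"a": 0.9, "b": 0.9}, {"a": {"X", "Y"}, "b": {"X"}})
print(result)
```

In the example, X's unique peptide makes it the more probable carrier of the shared peptide, so the weight of "a" drifts towards X over the iterations and Y's probability falls.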
  • 24. DBToolkit protein inference: minimal set with maximal annotation (peptides a, b, c, d; prot X (-), prot Y (+), prot Z (0))
  • 25. Some indications from the HUPO Brain Proteome Project (BPP): peptides a, b, c, d matched to prot X (-), prot Y (+) and prot Z (0)
  • 26. PROTEIN INFERENCE AND QUANTIFICATION
  • 27. Some inference examples (i): http://genesis.ugent.be/rover/. Nice and easy: a 1/1 ratio, only unique peptides (blue) and a narrow distribution. See: Colaert et al, Proteomics, 2010
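Slide 27 judges a protein's quantification by whether it rests on unique peptides with a narrow ratio distribution. The sketch below applies that check to a list of peptide ratios. It uses only the unique peptides, summarises their log2 ratios by the median, and uses the median absolute deviation as the spread. This is not how rover computes its statistics; the summary choices, the function name and the example ratios are assumptions.

```python
import math
import statistics
from typing import List, Tuple


def summarize_ratios(peptides: List[Tuple[float, bool]]) -> Tuple[float, float, int]:
    """Summarise the peptide ratios of one protein.

    peptides: (ratio, is_unique) pairs; degenerate peptides are ignored.
    Returns (median log2 ratio, median absolute deviation of the log2 ratios,
    number of unique peptides used).
    """
    logs = [math.log2(ratio) for ratio, unique in peptides if unique]
    if not logs:
        raise ValueError("no unique peptides to quantify this protein")
    centre = statistics.median(logs)
    spread = statistics.median(abs(x - centre) for x in logs)
    return centre, spread, len(logs)


# Hypothetical protein at a 1/1 ratio: log2 ratio 0, no spread; the
# degenerate peptide (ratio 2.0) is left out of the summary.
print(summarize_ratios([(1.0, True), (1.0, True), (2.0, False)]))  # (0.0, 0.0, 2)
```

Working on log2 ratios makes up- and down-regulation symmetric around 0, and the median keeps a single aberrant peptide from dragging the protein ratio.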
  • 28. Some inference examples (ii): http://genesis.ugent.be/rover/. Nice and easy, down-regulated. See: Colaert et al, Proteomics, 2010
  • 29. Some inference examples (iii): http://genesis.ugent.be/rover/. A little less easy, up-regulated. See: Colaert et al, Proteomics, 2010
  • 30. Some inference examples (iv): http://genesis.ugent.be/rover/. A nice example of the mess of degenerate peptides. See: Colaert et al, Proteomics, 2010
  • 31. Some inference examples (v): http://genesis.ugent.be/rover/. A bit of chaos, but a defined core distribution. See: Colaert et al, Proteomics, 2010
  • 32. Thank you! Questions?