This is the fifth presentation of the BITS training on 'Mass spec data processing'.
It reviews the problems of determining protein sequences from mass spec data and how to deal with them, and gives an overview of useful tools.
Thanks to the Compomics Lab of the VIB for their contribution.
2. Peptide validation and protein inference
Kenny Helsens
kenny.helsens@ugent.be
Computational Omics and Systems Biology Group
Department of Medical Protein Research, VIB
Department of Biochemistry, Ghent University
Ghent, Belgium
Lennart Martens
lennart.martens@ebi.ac.uk
Proteomics Services Group
European Bioinformatics Institute
Hinxton, Cambridge
United Kingdom
www.ebi.ac.uk
Kenny Helsens BITS MS Data Processing – Protein Inference
kenny.helsens@UGent.be UGent, Gent, Belgium – 16 December 2011
3. Data processing and information ambiguity
Raw data
Peaklists
Peptide sequences
Protein accession numbers
Figure axes: ambiguity, data size
See: Martens and Hermjakob, Molecular BioSystems, 2007
5. Populations and individuals
10,000 peptide-to-spectrum matches
5% decoy hits
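The numbers on this slide can be turned into a small worked calculation. The sketch below reads the 5% as the share of the 10,000 matches that hit decoy sequences in a concatenated target-decoy search, which is an assumption about the slide, and the function names are hypothetical. It gives a population-level error estimate only; it cannot say which individual matches are wrong.

```python
def decoy_hit_count(total_psms, decoy_fraction):
    """Number of decoy hits implied by a decoy fraction (assumed reading of the slide)."""
    return round(total_psms * decoy_fraction)


def estimated_fdr(target_hits, decoy_hits):
    """Simple target-decoy estimate: each decoy hit is assumed to stand for
    one false target hit, so FDR is roughly decoys / targets."""
    return decoy_hits / target_hits


total = 10_000
decoys = decoy_hit_count(total, 0.05)
targets = total - decoys
fdr = estimated_fdr(targets, decoys)
print(f"{decoys} decoy hits, {targets} target hits, estimated FDR {fdr:.3f}")
```

The estimate describes the population of matches; picking out the individual false positives still requires looking at single spectra.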
6. Eliminating false positives
Suspect peptide identifications happen.
The problem is that finding them requires detailed analysis of a single spectrum and its identifications, amongst thousands of other spectra…
7. Automated interpretation
The Netherlands??
8. Manual interpretation
Tyrosine phosphorylation
See: Ghesquière and Helsens, Proteomics, 2010
9. Peptizer expert system
Agents: a b c d e
Votes cast: +1 +1 0 -1 +1
Aggregation of the votes
Confident peptide identifications are split into a suspicious subset and a trusted subset
See: Helsens et al, Molecular and Cellular Proteomics, 2008
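The vote aggregation on this slide can be sketched in a few lines. This is a minimal illustration, not Peptizer's actual code: it assumes that a positive vote marks an identification as suspect, and the threshold value and function name are hypothetical.

```python
def aggregate_votes(votes, threshold):
    """Sum the agent votes and route the identification to a subset.

    Assumption: +1 means "suspect", 0 is neutral, -1 means "looks fine".
    """
    score = sum(votes)
    return "suspicious" if score >= threshold else "trusted"


# Votes cast by agents a to e, as listed on the slide.
slide_votes = [+1, +1, 0, -1, +1]
print(sum(slide_votes), aggregate_votes(slide_votes, threshold=2))
```

Only the identifications routed to the suspicious subset would then need manual inspection, which keeps the expert's workload manageable.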
10. Peptizer expert system
See: Helsens et al, Molecular and Cellular Proteomics, 2008
11. Peptizer expert system
See: Helsens et al, Molecular and Cellular Proteomics, 2008
12. PROTEIN INFERENCE
13. Not all peptides are created equal
Gene: 1a 1b 2 3 4 5 6a 6b
Transcripts:
1a 1b 2 5 6a 6b
1a 1b 2 3 5 6a 6b
1b 2 3 4 5 6a 6b
1a 1b 2 3 4 5 6a
Translations:
2 5
2 3 5
2 3 4 5
2 3 4 5 (redundant)
Peptides:
matching all transcripts
matching a transcript subset
matching exactly 1 translation
Legend: Intron, Exon UTR, Exon CDS, Peptide
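The three peptide classes on this slide can be illustrated with the exon lists above. The sketch below is a simplification: it treats a peptide as lying entirely within one coding exon, counts matches against distinct translations rather than transcripts, and uses a hypothetical function name.

```python
# Translations from the slide, written as tuples of coding exons.
TRANSLATIONS = [
    ("2", "5"),
    ("2", "3", "5"),
    ("2", "3", "4", "5"),
    ("2", "3", "4", "5"),  # redundant: same coding exons as the previous one
]


def classify_peptide(exon, translations):
    """Classify a peptide by how many distinct translations contain its exon."""
    distinct = sorted(set(translations))  # drop redundant translations
    hits = [t for t in distinct if exon in t]
    if not hits:
        return "no match"
    if len(hits) == len(distinct):
        return "matches all translations"
    if len(hits) == 1:
        return "matches exactly 1 translation"
    return "matches a translation subset"


for exon in ("2", "3", "4"):
    print(exon, "->", classify_peptide(exon, TRANSLATIONS))
```

Under these assumptions a peptide from exon 2 says nothing about which isoform was present, while a peptide from exon 4 points to a single translation.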
14. Sample preparation consequences
See: Nesvizhskii AI et al, Molecular and Cellular Proteomics, 2005
15. Sample preparation consequences
See: Nesvizhskii AI et al, Molecular and Cellular Proteomics, 2005
16. Protein inference: a question of conviction
Minimal set (Occam)
peptides a b c d
proteins
prot X x x
prot Y x
prot Z x x x

Maximal set (anti-Occam)
peptides a b c d
proteins
prot X x x
prot Y x
prot Z x x x

Minimal set with maximal annotation (true Occam?)
peptides a b c d
proteins
prot X (-) x x
prot Y (+) x
prot Z (0) x x x
See: Martens and Hermjakob, Molecular BioSystems, 2007
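The three views on this slide can be compared in a short sketch. The peptide-to-protein assignment below is hypothetical: the slide only shows that X matches two peptides, Y one and Z three, so the exact columns are an assumption chosen to make the annotation tie-break visible. The annotation scores map (-), (0) and (+) to -1, 0 and +1, also an assumption.

```python
from itertools import combinations

# Hypothetical assignment consistent with the peptide counts on the slide.
PEPTIDE_MAP = {"X": {"a", "b"}, "Y": {"a"}, "Z": {"b", "c", "d"}}
ANNOTATION = {"X": -1, "Y": +1, "Z": 0}  # (-), (+), (0) on the slide


def minimal_covers(peptide_map):
    """All smallest protein sets that explain every peptide (brute force)."""
    all_peptides = set().union(*peptide_map.values())
    proteins = sorted(peptide_map)
    for size in range(1, len(proteins) + 1):
        covers = [
            combo for combo in combinations(proteins, size)
            if set().union(*(peptide_map[p] for p in combo)) == all_peptides
        ]
        if covers:
            return covers
    return []


def best_annotated(covers, annotation):
    """Among equally small sets, prefer the one with the best annotation."""
    return max(covers, key=lambda combo: sum(annotation[p] for p in combo))


maximal_set = sorted(PEPTIDE_MAP)        # anti-Occam: report every matched protein
occam_sets = minimal_covers(PEPTIDE_MAP)  # Occam: smallest explaining sets
print(maximal_set, occam_sets, best_annotated(occam_sets, ANNOTATION))
```

With this assignment two minimal sets exist, and annotation quality decides between them. Brute force is only practical for toy examples of this size.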
17. ALGORITHMS FOR THE PROTEIN INFERENCE PROBLEM
18. A few algorithms for protein inference
• IDPicker
Zhang et al, Journal of Proteome Research, 2007
• ProteinProphet
Nesvizhskii AI et al, Analytical Chemistry, 2003
• DBToolkit
Martens et al, Bioinformatics, 2005
http://genesis.UGent.be/dbtoolkit
19. IDPicker parsimonious protein assembly
(I) Initialize
See: Zhang et al, Journal of Proteome Research, 2007
20. IDPicker parsimonious protein assembly
(II) Collapse
See: Zhang et al, Journal of Proteome Research, 2007
21. IDPicker parsimonious protein assembly
(III) Separate
See: Zhang et al, Journal of Proteome Research, 2007
22. IDPicker parsimonious protein assembly
(IV) Reduce
See: Zhang et al, Journal of Proteome Research, 2007
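The four steps named on these slides (Initialize, Collapse, Separate, Reduce) can be sketched on a toy data set. This is a simplified reading, not IDPicker's own code: it collapses only proteins with identical peptide sets, uses a plain greedy set cover for the reduction, and all function names and the example data are hypothetical.

```python
from collections import defaultdict


def initialize(pairs):
    """(I) Build the protein -> peptide-set map from (peptide, protein) pairs."""
    prot2pep = defaultdict(set)
    for peptide, protein in pairs:
        prot2pep[protein].add(peptide)
    return dict(prot2pep)


def collapse(prot2pep):
    """(II) Merge proteins with identical peptide sets into one group."""
    groups = defaultdict(list)
    for protein, peptides in prot2pep.items():
        groups[frozenset(peptides)].append(protein)
    return {tuple(sorted(members)): set(peps) for peps, members in groups.items()}


def separate(group2pep):
    """(III) Split the groups into connected components via shared peptides."""
    remaining = dict(group2pep)
    components = []
    while remaining:
        group, peptides = remaining.popitem()
        component, seen = {group: peptides}, set(peptides)
        grew = True
        while grew:
            grew = False
            for other in list(remaining):
                if remaining[other] & seen:
                    seen |= remaining[other]
                    component[other] = remaining.pop(other)
                    grew = True
        components.append(component)
    return components


def reduce_component(component):
    """(IV) Greedy set cover: keep picking the group that explains most peptides."""
    uncovered = set().union(*component.values())
    chosen = []
    while uncovered:
        best = max(sorted(component), key=lambda g: len(component[g] & uncovered))
        chosen.append(best)
        uncovered -= component[best]
    return chosen


def parsimonious_groups(pairs):
    """Run steps I to IV and return the selected protein groups, sorted."""
    selected = []
    for component in separate(collapse(initialize(pairs))):
        selected.extend(reduce_component(component))
    return sorted(selected)


# Hypothetical identifications as (peptide, protein) pairs.
pairs = [("a", "X"), ("b", "X"), ("a", "Y"), ("b", "Z"), ("c", "Z"),
         ("d", "Z"), ("e", "W1"), ("e", "W2")]
print(parsimonious_groups(pairs))
```

In this toy input W1 and W2 are indistinguishable and are reported as one group, while Y is dropped because X and Z already explain its only peptide.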
23. ProteinProphet: the simplified view
Formula labels: protein probability, peptide weight, peptide probability
In iteration 1, all weights w start off as 1/n, with n the degeneracy count for the peptide
See: Nesvizhskii AI et al., Analytical Chemistry, 2003
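The simplified view can be made concrete with a small iteration. The sketch assumes the protein probability takes the form 1 - Π(1 - w·p) over a protein's peptides, and that each weight is then updated to the protein's share of the summed probabilities of all proteins containing that peptide. It omits other parts of ProteinProphet, and the function name, iteration count and example values are hypothetical.

```python
def protein_probabilities(peptide_probs, peptide_to_proteins, iterations=10):
    """Iterate protein probabilities and peptide weights (simplified sketch)."""
    # Iteration 1: every weight starts at 1/n, n being the peptide's degeneracy.
    weights = {
        (pep, prot): 1.0 / len(prots)
        for pep, prots in peptide_to_proteins.items()
        for prot in prots
    }
    proteins = sorted({p for prots in peptide_to_proteins.values() for p in prots})
    probs = {}
    for _ in range(iterations):
        # Protein probability: 1 minus the chance that all its peptides are wrong.
        for prot in proteins:
            all_wrong = 1.0
            for pep, prots in peptide_to_proteins.items():
                if prot in prots:
                    all_wrong *= 1.0 - weights[(pep, prot)] * peptide_probs[pep]
            probs[prot] = 1.0 - all_wrong
        # Weight update: share a peptide among its proteins by their probability.
        for pep, prots in peptide_to_proteins.items():
            total = sum(probs[p] for p in prots)
            if total > 0:
                for p in prots:
                    weights[(pep, p)] = probs[p] / total
    return probs


# Hypothetical input: peptide "u" is unique to A, peptide "s" is shared by A and B.
example = protein_probabilities({"u": 0.9, "s": 0.8}, {"u": ["A"], "s": ["A", "B"]})
print(example)
```

The shared peptide drifts towards protein A, which also has independent evidence, so B ends up with a lower probability than its initial equal share would suggest.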
24. DBToolkit protein inference
Minimal set with maximal annotation
peptides a b c d
proteins
prot X (-) x x
prot Y (+) x
prot Z (0) x x x
25. Some indications from the HUPO BPP
peptides a b c d
proteins
prot X (-) x x
prot Y (+) x
prot Z (0) x x x
26. PROTEIN INFERENCE AND QUANTIFICATION
27. Some inference examples (i)
http://genesis.ugent.be/rover/
Nice and easy, 1/1, only unique peptides (blue) and a narrow distribution
See: Colaert et al, Proteomics, 2010
28. Some inference examples (ii)
http://genesis.ugent.be/rover/
Nice and easy, down-regulated
See: Colaert et al, Proteomics, 2010
29. Some inference examples (iii)
http://genesis.ugent.be/rover/
A little less easy, up-regulated
See: Colaert et al, Proteomics, 2010
30. Some inference examples (iv)
http://genesis.ugent.be/rover/
A nice example of the mess of degenerate peptides
See: Colaert et al, Proteomics, 2010
31. Some inference examples (v)
http://genesis.ugent.be/rover/
A bit of chaos, but a defined core distribution
See: Colaert et al, Proteomics, 2010
32. Thank you!
Questions?