SintCompound: A Small Compound Database for Virtual Screening
Yasset perezriverol csi2011
1. In silico analysis of accurate
proteomics, complemented by
selective isolation of peptides.
Yasset Perez-Riverol
yperez@ebi.ac.uk
yasset.perez@biocomp.cigb.edu.cu
Aniel Sanchez Puentes
aniel.sanchez@cigb.edu.cu
EBI is an Outstation of the European Molecular Biology Laboratory.
2. ABSTRACT
Protein identification by mass spectrometry is mainly based on MS/MS spectra and the accuracy
of molecular mass determination. However, the high complexity and dynamic ranges for any
species of proteomic samples, surpasses the separation capacity and detection power of the
most advanced multidimensional liquid chromatographs and mass spectrometers. Only a tiny
portion of signals is selected for MS/MS experiments and a still considerable number of them do
not provide reliable peptide identification.
The approach is based on mass accuracy, isoelectric point (pI), retention time (tR) and N-terminal
amino acid determination as protein identification criteria regardless of high quality MS/MS
spectra. When the methodology was combined with the selective isolation methods, the number
of unique peptides and identified proteins increases. Finally, to demonstrate the feasibility of the
methodology, an OFFGEL-LC-MS/MS experiment was also implemented. Our results show that
using the information provided by these features and selective isolation methods we could found
the 93% of the high confidence protein identified by MS/MS with false-positive rate lower than
5%.
3. Drosophila cell
RH0
~~~~ K
RH1
~~~~R
(0)
Abs at 215 nm
~~H~~K
RH2
~~H~~R
(1+)
(2+, 3+)
20
10
30
40
764.974
100
b
%
1
292.163
RPEGENASYHLAYDKDR389
373
764.320
719.299
0
827.868
100
b
%
1
251.126
0
b
100 110.025
1
292.138
%
0
y
n
828.373
DSSIVTHDNDIFR233
221
828.889
702.367
1
RPEGENASYHLAYDK389
373
1021.131 1238.504 1450.501 1637.761
1140.3881333.778 1577.739 1762.484
MS/MS spectra were interpreted by the X! Tandem
software using the Flybase sequence database. The
database search results were validated using
PeptideProphet.
This work analyzed only the four isoelectric
focusing fractions with the lowest pI having the best
agreement between the theoretical and experimental
values, according to previous. In addition, these
fractions cover 50% of the identified peptides. Also,
we used only highly reliable peptide identifications,
filtering out those with a PeptideProphet probability
lower than 0.97 (FDR = 0.01) or with
posttranslational modifications. For experimental t R
analysis the acceptance error was set at 748.42 s,
and mass tolerance was set at 10 ppm.
6. Create a Insilico tryptic
peptide database.
Annotate peptides with
theoretical rt, pI, mass,
MW, N-Term
Annotate experimental identified
sequences from PeptideProphet
output with probability
more than 0.97.
(rt, pI, N-term, MW, sequence)
Search precursor masses of
Experimental sequences on
Insilco Database.
[Match only one sequence]
[Not match any sequence]
Peptides out of
the ppm range
[Match with more than one sequence]
Search in the input theoretical
List of sequence the peptide
By current property
(pI, rt, MW, N-term)
[Not match any sequence]
[Match with more than one sequence]
Peptides out of
the error range for
property
[Match only one sequence]
Compare with MS/MS
sequence result
[Not Match Insilco sequence with MS/MS sequence]
Annotate as a False
Positive Identification.
[Match Insilco sequence with MS/MS sequence]
Annotate as Peptide
Identification
A tree-based algorithm to identify
unique
peptides
in
the
experimental set was constructed
in a similar fashion to the one
designed for theoretical analysis.
The final list of unique peptides
was validated by using the
sequence
predicted
from
PeptideProphet. In cases where
the PeptideProphet sequences
and the sequences identified by
our approach did not match, the
identifications achieved by our
algorithm were considered as false
positive identification.
7.
8. CONCLUSION
The use of the information provided by some analytical tools could help to offset the
information contained in the sequence of peptides, but it is more efficient when a
prokaryote proteome is analyzed.
Some drawbacks associated to precision (accuracy) that can be predicted are that
the variables used may hinder the accurate mass proteomics analysis with the
identification of false positive hints. The inclusion of some types of peptides and the
reduction of complexity allows increasing the percent of unique peptides compared to
normal analysis. The combination of several selective methods (RH0, RH1, and
RH2) in the same sample could increase the percent of proteins with unique
peptides. The theoretical analysis described in this paper does not exclude the
possibility of combining it with the MS/MS information obtained in any proteomic
experiment.