SlideShare uma empresa Scribd logo
1 de 39
Exploring the potential of public
proteomics data
Dr. Juan Antonio Vizcaíno
Proteomics Team Leader
EMBL-EBI
Hinxton, Cambridge, UK
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Datasets are being reused more and more….
Vaudel et al., Proteomics, 2016
Data download volume for
PRIDE Archive in 2015: 198 TB
0
50
100
150
200
250
2013 2014 2015 2016
Downloads in TBs
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Data sharing in Proteomics
Vaudel et al., Proteomics, 2016
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Data sharing in Proteomics
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Data sharing in Proteomics
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Data sharing in Proteomics
• Data as they are.
• Protein knowledge bases: UniProt, neXtProt.
• Contributing to the Protein Evidence Code.
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Protein Evidence codes in UniProt/neXtProt
http://www.uniprot.org/help/protein_existence
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Use of MS data in UniProt
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Use of MS data in neXtProt
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Data sharing in Proteomics
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Reuse
• Information is not only extracted, but reused in new
experiments with the potential of generating new
knowledge.
• Transitions used in SRM approaches.
• Meta-analysis approaches.
• Spectral libraries.
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
SRMAtlas
http://www.srmatlas.org/
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PeptidePicker
http://mrmpeptidepicker.proteincentre.com/
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Meta-analysis approaches
• Putting data coming from a lot of experiments
together, to extract new knowledge. Examples:
• Study the cleavage mechanism and performance of
trypsin.
• Fragmentation patterns.
• Retention time prediction.
• Which is the most suitable reference DB for long-term
proteomics data storage?
• Data integration of experiments done at different time
points.
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Spectral searching
• Concept: To compare experimental spectra to other
experimental spectra.
• There are many spectral libraries publicly available (for
instance, from NIST, PeptideAtlas and PRIDE)
• Custom ‘search engines’ have been developed:
• SpectraST (TPP)
• X!Hunter (GPM)
• Bibliospec
• It has been claimed that the searches have more
sensitivity that with sequence database approaches
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Spectral searching (2)
http://peptide.nist.gov/
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PRIDE Cluster as a Public Data Mining Resource
17
• http://www.ebi.ac.uk/pride/cluster
• Spectral libraries for 16 species.
• All clustering results, as well as specific subsets of interest available.
• Source code (open source) and Java API
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Data sharing in Proteomics
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Reprocess
• Data are reprocessed with the intention of obtaining
new knowledge or to provide an updated view on the
results.
• It mainly serves the same purpose of the original
experiment.
• For instance, a shot-gun dataset can be reprocessed
with a different algorithm or an updated sequence
database.
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Reprocessing repositories
• These resources collect MS raw data and reprocess it using
one given analysis pipeline, and an up-to date protein
sequence database.
• Main resources: GPMDB and PeptideAtlas (ISB, Seattle).
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
PeptideAtlas and GPMDB
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Draft Human proteome papers published in 2014
Wilhelm et al., Nature, 2014
•Around 60% of the data used for the
analysis comes from previous
experiments, most of them stored in
proteomics repositories such as
PRIDE/ProteomeXchange, PASSEL or
MassIVE.
•They complement that data with “exotic”
tissues.
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Reprocessing for the validation of controversial data
• Analysis of Tyrannosaurus rex fossils: controversial presence of
collagen (is it a contamination of the sample? Did the sample contain
any T. rex proteins at all?)
Asara et al. (2007) Science 316: 280-5.
Asara et al. (2007) Science 316: 1324-5.
Bern et al. (2009) JPR 9: 4328-32
PRIDE Archive assay accession
8633
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Info from R. Chalkley
Bromenshenk et al. (2011) PLOS One 5: e13181
Reprocessing for the validation of controversial data (2)
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Experimental Protocol
1. Collected samples from healthy, collapsing and collapsed bee colonies.
2. Homogenised bees.
3. Digested with Trypsin
4. Analyzed by LC-MSMS on LTQ
5. Searched using Sequest
6. Filtered Results using Peptide and Protein Prophet
7. Performed further analysis to determine species statistically more
commonly found in collapsing/collapsed colony samples
Info from R. Chalkley
Bromenshenk et al. (2011) PLOS One 5: e13181
Reprocessing for the validation of controversial data (3)
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
• Big pitfall: Search database was only composed by viral
proteins. Not bee proteins at all!!
• After researching the data, there is no evidence for viral
peptides/proteins in any of their data: honey bee, fruit fly,
wasp, moth, human keratin, bacteria that like sugary
environments, …
• “We believe that there is currently insufficient evidence to
conclude that bees are a natural host for IIV-6, let alone that
the virus is linked to CCD”.
Info from R. Chalkley
Knudsen & Chalkley (2011) PLOS One 6:
e20873
Foster (2011), MCP 10: M110.006387
Reprocessing for the validation of controversial data (4)
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Reprocessing for the validation of controversial data
Datasets PXD000561 and PXD000865 in PRIDE Archive
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Various reanalysis of these datasets have been performed…
Reanalysis of Pandey dataset (Nature, 2014) made by J. Choudhary’s group at
Sanger Institute
Wright et al., Nat Commun, 2016Dataset PXD000561
http://www.ebi.ac.uk/gxa
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Data sharing in Proteomics
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Repurposing
• Data are considered in light of a question or a context
that is different from the original study.
• Proteogenomics studies
• Discovery of novel PTMs.
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Examples of repurposing datasets: proteogenomics
Data in public resources can be used for genome annotation purposes
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Repurposing: new PTMs found
• Individual authors can reprocess raw data with new
hypotheses in mind (not taken into account by the original
authors).
• Recent examples (using phosphoproteomics data sets):
• O-GlcNAc-6-phosphate1
• Phosphoglyceryl2
• ADP-ribosylation3
1Hahne & Kuster, Mol Cell Proteomics (2012) 11 10 1063-9
2Moellering & Cravatt, Science (2013) 341 549-553
3Matic et al., Nat Methods (2012) 9 771-2
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Vaudel M, Barsnes H, Berven FS, Sickmann A,
Martens L:
Proteomics 2011;11(5):996-9.
https://github.com/compomics/searchgui https://github.com/compomics/peptide-shaker
Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L,
Barsnes H:
Nature Biotechnology 2015; 33(1):22-4.
CompOmics Open Source Analysis Pipeline
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Find the desired PRIDE project …
… and start re-analyzing the data!
… inspect the project details ….
Reshake PRIDE data!
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Public datasets from different omics: OmicsDI
http://www.ebi.ac.uk/Tools/omicsdi/
• Aims to integrate of ‘omics’ datasets (proteomics,
transcriptomics, metabolomics and genomics at present).
PRIDE
MassIVE
jPOST
PASSEL
GPMDB
ArrayExpress
Expression Atlas
MetaboLights
Metabolomics Workbench
GNPS
EGA
Perez-Riverol et al., Nat Biotechnol, in press
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
OmicsDI: Portal for omics datasets
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
OmicsDI: Portal for omics datasets
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Acknowledgements
http://www.ncbi.nlm.nih.gov/pubmed/26449181
http://onlinelibrary.wiley.com/doi/10.1002/pmic.201500295/epdf
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Questions?

Mais conteúdo relacionado

Mais procurados

Assessing Drug Safety Using AI
Assessing Drug Safety Using AIAssessing Drug Safety Using AI
Assessing Drug Safety Using AI
Databricks
 
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
Michel Dumontier
 

Mais procurados (20)

PRIDE-ProteomeXchange
PRIDE-ProteomeXchangePRIDE-ProteomeXchange
PRIDE-ProteomeXchange
 
Mass spectrometry resources at the EBI
Mass spectrometry resources at the EBIMass spectrometry resources at the EBI
Mass spectrometry resources at the EBI
 
Festival of Genomics 2016 London: Challenges of Big Medical Data?
Festival of Genomics 2016 London: Challenges of Big Medical Data?Festival of Genomics 2016 London: Challenges of Big Medical Data?
Festival of Genomics 2016 London: Challenges of Big Medical Data?
 
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
 
Generating Biomedical Hypotheses Using Semantic Web Technologies
Generating Biomedical Hypotheses Using Semantic Web TechnologiesGenerating Biomedical Hypotheses Using Semantic Web Technologies
Generating Biomedical Hypotheses Using Semantic Web Technologies
 
Pistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier Datathon
 
Ondex: Data integration and visualisation
Ondex: Data integration and visualisationOndex: Data integration and visualisation
Ondex: Data integration and visualisation
 
Assessing Drug Safety Using AI
Assessing Drug Safety Using AIAssessing Drug Safety Using AI
Assessing Drug Safety Using AI
 
Amia tb-review-11
Amia tb-review-11Amia tb-review-11
Amia tb-review-11
 
Festival of Genomics 2016 London: Agenda
Festival of Genomics 2016 London: AgendaFestival of Genomics 2016 London: Agenda
Festival of Genomics 2016 London: Agenda
 
Approaches for the Integration of Visual and Computational Analysis of Biomed...
Approaches for the Integration of Visual and Computational Analysis of Biomed...Approaches for the Integration of Visual and Computational Analysis of Biomed...
Approaches for the Integration of Visual and Computational Analysis of Biomed...
 
Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...
 
Cochrane workshop 2016
Cochrane workshop 2016Cochrane workshop 2016
Cochrane workshop 2016
 
Data Visualization to Enhance our Understanding of the Cancer Genome
Data Visualization to Enhance our Understanding of the Cancer GenomeData Visualization to Enhance our Understanding of the Cancer Genome
Data Visualization to Enhance our Understanding of the Cancer Genome
 
B.3.5
B.3.5B.3.5
B.3.5
 
The State of Open Research Data
The State of Open Research DataThe State of Open Research Data
The State of Open Research Data
 
Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3
 
Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
 
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
 

Destaque (14)

Gritos y susurros
Gritos y susurrosGritos y susurros
Gritos y susurros
 
Plan impulsa
Plan impulsaPlan impulsa
Plan impulsa
 
Dogsface
DogsfaceDogsface
Dogsface
 
女の人生と起業
女の人生と起業女の人生と起業
女の人生と起業
 
Nanci Lynn Gibson
Nanci Lynn GibsonNanci Lynn Gibson
Nanci Lynn Gibson
 
Pymes
PymesPymes
Pymes
 
Finished booklet and infographics
Finished booklet and infographicsFinished booklet and infographics
Finished booklet and infographics
 
Corrosion
CorrosionCorrosion
Corrosion
 
El cielo
El cieloEl cielo
El cielo
 
TheVitalTradingLink (1)
TheVitalTradingLink (1)TheVitalTradingLink (1)
TheVitalTradingLink (1)
 
Externailidad
ExternailidadExternailidad
Externailidad
 
Certificates of Achievement
Certificates of AchievementCertificates of Achievement
Certificates of Achievement
 
Presentaciondelaempresa
PresentaciondelaempresaPresentaciondelaempresa
Presentaciondelaempresa
 
Why Cloud Workforce Optimization?
Why Cloud Workforce Optimization?Why Cloud Workforce Optimization?
Why Cloud Workforce Optimization?
 

Semelhante a Reuse of public data in proteomics

Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
Matthieu Schapranow
 

Semelhante a Reuse of public data in proteomics (20)

Reuse of public proteomics data
Reuse of public proteomics dataReuse of public proteomics data
Reuse of public proteomics data
 
Introduction to the Proteomics Bioinformatics Course 2016
Introduction to the Proteomics Bioinformatics Course 2016Introduction to the Proteomics Bioinformatics Course 2016
Introduction to the Proteomics Bioinformatics Course 2016
 
Proteomics repositories
Proteomics repositoriesProteomics repositories
Proteomics repositories
 
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
 
An overview of the PRIDE ecosystem of resources and computational tools for m...
An overview of the PRIDE ecosystem of resources and computational tools for m...An overview of the PRIDE ecosystem of resources and computational tools for m...
An overview of the PRIDE ecosystem of resources and computational tools for m...
 
Introduction to the Proteomics Bioinformatics Course 2017
Introduction to the Proteomics Bioinformatics Course 2017Introduction to the Proteomics Bioinformatics Course 2017
Introduction to the Proteomics Bioinformatics Course 2017
 
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
 
Proteomics repositories
Proteomics repositoriesProteomics repositories
Proteomics repositories
 
Reusing and integrating public proteomics data to improve our knowledge of th...
Reusing and integrating public proteomics data to improve our knowledge of th...Reusing and integrating public proteomics data to improve our knowledge of th...
Reusing and integrating public proteomics data to improve our knowledge of th...
 
Pride and ProteomeXchange
Pride and ProteomeXchangePride and ProteomeXchange
Pride and ProteomeXchange
 
Mining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasetsMining the hidden proteome using hundreds of public proteomics datasets
Mining the hidden proteome using hundreds of public proteomics datasets
 
Pride cluster presentation
Pride cluster presentation Pride cluster presentation
Pride cluster presentation
 
Big Data in Life Sciences
Big Data in Life SciencesBig Data in Life Sciences
Big Data in Life Sciences
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
 
Proteomexchange
ProteomexchangeProteomexchange
Proteomexchange
 
Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018
 
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
 
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
 
On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...
 
ProteomeXchange update HUPO 2016
ProteomeXchange update HUPO 2016ProteomeXchange update HUPO 2016
ProteomeXchange update HUPO 2016
 

Mais de Juan Antonio Vizcaino

Mais de Juan Antonio Vizcaino (16)

Introduction to the PSI standard data formats
Introduction to the PSI standard data formatsIntroduction to the PSI standard data formats
Introduction to the PSI standard data formats
 
PRIDE resources and ProteomeXchange
PRIDE resources and ProteomeXchangePRIDE resources and ProteomeXchange
PRIDE resources and ProteomeXchange
 
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
 
PSI-Proteome Informatics update
PSI-Proteome Informatics updatePSI-Proteome Informatics update
PSI-Proteome Informatics update
 
ProteomeXchange update
ProteomeXchange updateProteomeXchange update
ProteomeXchange update
 
The ELIXIR Proteomics community
The ELIXIR Proteomics community The ELIXIR Proteomics community
The ELIXIR Proteomics community
 
The ELIXIR Proteomics Community
The ELIXIR Proteomics CommunityThe ELIXIR Proteomics Community
The ELIXIR Proteomics Community
 
The ProteomeXchange Consoritum: 2017 update
The ProteomeXchange Consoritum: 2017 updateThe ProteomeXchange Consoritum: 2017 update
The ProteomeXchange Consoritum: 2017 update
 
How to run and maintain a popular biological data repository?
How to run and maintain a popular biological data repository?How to run and maintain a popular biological data repository?
How to run and maintain a popular biological data repository?
 
Proteomics data standards
Proteomics data standardsProteomics data standards
Proteomics data standards
 
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
PRIDE and ProteomeXchange: A golden age for working with public proteomics dataPRIDE and ProteomeXchange: A golden age for working with public proteomics data
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
 
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
 
ProteomeXchange update 2017
ProteomeXchange update 2017ProteomeXchange update 2017
ProteomeXchange update 2017
 
Enabling automated processing and analysis of large-scale proteomics data
Enabling automated processing and analysis of large-scale proteomics dataEnabling automated processing and analysis of large-scale proteomics data
Enabling automated processing and analysis of large-scale proteomics data
 
Introduction to EBI for Proteomics in ELIXIR
Introduction to EBI for Proteomics in ELIXIRIntroduction to EBI for Proteomics in ELIXIR
Introduction to EBI for Proteomics in ELIXIR
 
The Proteomics Standards Initiative (PSI)
The Proteomics Standards Initiative (PSI)The Proteomics Standards Initiative (PSI)
The Proteomics Standards Initiative (PSI)
 

Último

Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Sérgio Sacani
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Sérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
RohitNehra6
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
Lokesh Kothari
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
ssuser79fe74
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 

Último (20)

Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Creating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening DesignsCreating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening Designs
 

Reuse of public data in proteomics

  • 1. Exploring the potential of public proteomics data Dr. Juan Antonio Vizcaíno Proteomics Team Leader EMBL-EBI Hinxton, Cambridge, UK
  • 2. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Datasets are being reused more and more…. Vaudel et al., Proteomics, 2016 Data download volume for PRIDE Archive in 2015: 198 TB 0 50 100 150 200 250 2013 2014 2015 2016 Downloads in TBs
  • 3. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Data sharing in Proteomics Vaudel et al., Proteomics, 2016
  • 4. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Data sharing in Proteomics
  • 5. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Data sharing in Proteomics
  • 6. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Data sharing in Proteomics • Data as they are. • Protein knowledge bases: UniProt, neXtProt. • Contributing to the Protein Evidence Code.
  • 7. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Protein Evidence codes in UniProt/neXtProt http://www.uniprot.org/help/protein_existence
  • 8. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Use of MS data in UniProt
  • 9. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Use of MS data in neXtProt
  • 10. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Data sharing in Proteomics
  • 11. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Reuse • Information is not only extracted, but reused in new experiments with the potential of generating new knowledge. • Transitions used in SRM approaches. • Meta-analysis approaches. • Spectral libraries.
  • 12. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 SRMAtlas http://www.srmatlas.org/
  • 13. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 PeptidePicker http://mrmpeptidepicker.proteincentre.com/
  • 14. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Meta-analysis approaches • Putting data coming from a lot of experiments together, to extract new knowledge. Examples: • Study the cleavage mechanism and performance of trypsin. • Fragmentation patterns. • Retention time prediction. • Which is the most suitable reference DB for long-term proteomics data storage? • Data integration of experiments done at different time points.
  • 15. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Spectral searching • Concept: To compare experimental spectra to other experimental spectra. • There are many spectral libraries publicly available (for instance, from NIST, PeptideAtlas and PRIDE) • Custom ‘search engines’ have been developed: • SpectraST (TPP) • X!Hunter (GPM) • Bibliospec • It has been claimed that the searches have more sensitivity that with sequence database approaches
  • 16. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Spectral searching (2) http://peptide.nist.gov/
  • 17. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 PRIDE Cluster as a Public Data Mining Resource 17 • http://www.ebi.ac.uk/pride/cluster • Spectral libraries for 16 species. • All clustering results, as well as specific subsets of interest available. • Source code (open source) and Java API
  • 18. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Data sharing in Proteomics
  • 19. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Reprocess • Data are reprocessed with the intention of obtaining new knowledge or to provide an updated view on the results. • It mainly serves the same purpose of the original experiment. • For instance, a shot-gun dataset can be reprocessed with a different algorithm or an updated sequence database.
  • 20. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Reprocessing repositories • These resources collect MS raw data and reprocess it using one given analysis pipeline, and an up-to date protein sequence database. • Main resources: GPMDB and PeptideAtlas (ISB, Seattle).
  • 21. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 PeptideAtlas and GPMDB
  • 22. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Draft Human proteome papers published in 2014 Wilhelm et al., Nature, 2014 •Around 60% of the data used for the analysis comes from previous experiments, most of them stored in proteomics repositories such as PRIDE/ProteomeXchange, PASSEL or MassIVE. •They complement that data with “exotic” tissues.
  • 23. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Reprocessing for the validation of controversial data • Analysis of Tyrannosaurus rex fossils: controversial presence of collagen (is it a contamination of the sample? Did the sample contain any T. rex proteins at all?) Asara et al. (2007) Science 316: 280-5. Asara et al. (2007) Science 316: 1324-5. Bern et al. (2009) JPR 9: 4328-32 PRIDE Archive assay accession 8633
  • 24. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Info from R. Chalkley Bromenshenk et al. (2011) PLOS One 5: e13181 Reprocessing for the validation of controversial data (2)
  • 25. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Experimental Protocol 1. Collected samples from healthy, collapsing and collapsed bee colonies. 2. Homogenised bees. 3. Digested with Trypsin 4. Analyzed by LC-MSMS on LTQ 5. Searched using Sequest 6. Filtered Results using Peptide and Protein Prophet 7. Performed further analysis to determine species statistically more commonly found in collapsing/collapsed colony samples Info from R. Chalkley Bromenshenk et al. (2011) PLOS One 5: e13181 Reprocessing for the validation of controversial data (3)
  • 26. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 • Big pitfall: Search database was only composed by viral proteins. Not bee proteins at all!! • After researching the data, there is no evidence for viral peptides/proteins in any of their data: honey bee, fruit fly, wasp, moth, human keratin, bacteria that like sugary environments, … • “We believe that there is currently insufficient evidence to conclude that bees are a natural host for IIV-6, let alone that the virus is linked to CCD”. Info from R. Chalkley Knudsen & Chalkley (2011) PLOS One 6: e20873 Foster (2011), MCP 10: M110.006387 Reprocessing for the validation of controversial data (4)
  • 27. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Reprocessing for the validation of controversial data Datasets PXD000561 and PXD000865 in PRIDE Archive
  • 28. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Various reanalysis of these datasets have been performed… Reanalysis of Pandey dataset (Nature, 2014) made by J. Choudhary’s group at Sanger Institute Wright et al., Nat Commun, 2016Dataset PXD000561 http://www.ebi.ac.uk/gxa
  • 29. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Data sharing in Proteomics
  • 30. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Repurposing • Data are considered in light of a question or a context that is different from the original study. • Proteogenomics studies • Discovery of novel PTMs.
  • 31. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Examples of repurposing datasets: proteogenomics Data in public resources can be used for genome annotation purposes
  • 32. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Repurposing: new PTMs found • Individual authors can reprocess raw data with new hypotheses in mind (not taken into account by the original authors). • Recent examples (using phosphoproteomics data sets): • O-GlcNAc-6-phosphate1 • Phosphoglyceryl2 • ADP-ribosylation3 1Hahne & Kuster, Mol Cell Proteomics (2012) 11 10 1063-9 2Moellering & Cravatt, Science (2013) 341 549-553 3Matic et al., Nat Methods (2012) 9 771-2
  • 33. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Vaudel M, Barsnes H, Berven FS, Sickmann A, Martens L: Proteomics 2011;11(5):996-9. https://github.com/compomics/searchgui https://github.com/compomics/peptide-shaker Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L, Barsnes H: Nature Biotechnology 2015; 33(1):22-4. CompOmics Open Source Analysis Pipeline
  • 34. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Find the desired PRIDE project … … and start re-analyzing the data! … inspect the project details …. Reshake PRIDE data!
  • 35. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Public datasets from different omics: OmicsDI http://www.ebi.ac.uk/Tools/omicsdi/ • Aims to integrate of ‘omics’ datasets (proteomics, transcriptomics, metabolomics and genomics at present). PRIDE MassIVE jPOST PASSEL GPMDB ArrayExpress Expression Atlas MetaboLights Metabolomics Workbench GNPS EGA Perez-Riverol et al., Nat Biotechnol, in press
  • 36. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 OmicsDI: Portal for omics datasets
  • 37. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 OmicsDI: Portal for omics datasets
  • 38. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Acknowledgements http://www.ncbi.nlm.nih.gov/pubmed/26449181 http://onlinelibrary.wiley.com/doi/10.1002/pmic.201500295/epdf
  • 39. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2016 Hinxton, 8 December 2016 Questions?