European Life Sciences Infrastructure for Biological Information
www.elixir-europe.org
ELIXIR Implementation Study: “Mining the
Proteome: Enabling Automated Processing and
Analysis of Large-Scale Proteomics Data”
Juan Antonio Vizcaíno
Mathias Walzer
European Bioinformatics Institute (EMBL-EBI)
juan@ebi.ac.uk, walzer@ebi.ac.uk
ELIXIR Webinar
11 April 2018
• One slide intro to proteomics
• The ELIXIR Proteomics Community
• The implementation study
• Plans for the near future
Outline
One slide intro to Mass Spectrometry
proteomics
Hein et al., Handbook of Systems Biology, 2012
Proteins ≈ most drug targets
• The goal of the ELIXIR proteomics community is to
develop and maintain sustainable proteomics
tools and data resources
• An essential part of the development will also be the
‘FAIRification’ of the resources (i.e. making the
resources FAIR)
• Integrate proteomics bioinformatics activities in
ELIXIR
Overall objectives
• 11 ELIXIR nodes supported the application:
• Germany (co-lead) (O. Kohlbacher)
• Belgium (co-lead) (L. Martens)
• Czech Republic
• Denmark
• Ireland
• France
• Netherlands
• Spain
• Sweden
• United Kingdom
• EMBL-EBI (co-lead) (Juan A. Vizcaíno)
ELIXIR nodes supporting the new Community
White paper as the basis for this Community
Vizcaíno et al., F1000Research, 2017
Highlighting already existing resources and initiatives
Tools: Services and connectors to drive access and exploitation
Data: Sustaining Europe’s life science data infrastructure
Interoperability: Integration of data and services
Compute: Access, exchange and storage
Training: Professional skills for managing and exploiting data
• PRIDE stores mass spectrometry (MS)-based
proteomics data:
• Peptide and protein expression data
(identification and quantification)
• Post-translational modifications
• Mass spectra (raw data and peak lists)
• Technical and biological metadata
• Any other related information
• Full support for tandem MS approaches
• Any type of data can be stored
• Leading ProteomeXchange
• From July 2017, an ELIXIR core resource
European leadership: the world-leading PRIDE database
http://www.ebi.ac.uk/pride/archive
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016
ProteomeXchange: a global, distributed proteomics database
• Framework to allow standard data submission and dissemination
pipelines between the main existing proteomics repositories.
• Repositories shown in the diagram: PRIDE, MassIVE, jPOST and iProX (MS/MS data); PASSEL (SRM data)
• Data types shown in the diagram: raw data (Raw), identification/quantification results (ID/Q), metadata (Meta)
• Mandatory data deposition
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017
PRIDE data submissions and data growth
> 2,400 datasets submitted in 2017
September, November and December 2017 were the record months in terms of submitted datasets
[Charts: datasets submitted per month; datasets submitted per year]
Stats: Data growth in EMBL-EBI resources
[Chart panels: sequence data, micro-array, metabolomics, proteomics]
Data re-use in proteomics is increasing
Data download volume for PRIDE Archive in 2017: 295 TB
[Chart: PRIDE Archive downloads in TB per year, 2013 to 2017]
Tools: Services and connectors to drive access and exploitation
Data: Sustaining Europe’s life science data infrastructure
Interoperability: Integration of data and services
Compute: Access, exchange and storage
Training: Professional skills for managing and exploiting data
ELIXIR Platforms
• Title: “Mining the proteome: Enabling automated processing and analysis
of large-scale proteomics data”.
• Development of open, reproducible, and robust analysis pipelines for
DDA (Data Dependent Acquisition) approaches.
• Deployment in the EMBL-EBI “Embassy Cloud” (and optionally later other
clouds)
• Connected to PRIDE, bringing analysis closer to the data.
• Who is involved?
• EMBL-EBI (Vizcaíno & Newhouse).
• ELIXIR-DE (Kohlbacher, EKUT; Eisenacher, RUB).
ELIXIR Implementation Study (Feb 2017-June 2018)
Develop exemplary proteomics data analysis workflows and deploy
them in the EMBL-EBI “Embassy Cloud”:
(1) Standard identification workflow
(2) Identification workflow for PTMs
(3) Quantification (label-free/label-based approaches)
(4) Quality Control (to aid data set interpretation/reanalysis
evaluation)
(5) Versions of quantification approaches (including PTMs)
→ Connected to public proteomics data from PRIDE
ELIXIR Implementation Study
Consolidating data access and provision of robust
analysis pipelines
• Development of free-to-use, scalable, and user-
friendly data analysis pipelines including cloud
deployment
• Cloud-based data analysis pipelines’ appeal
1. Increasingly large datasets
2. Local struggle with the ‘compute task’
ELIXIR Implementation Study
Cloud workflow in genomics:
Simplified workflow launcher
• One workflow
• For AWS
• Enabling co-analysis with the larger PanCancer dataset
“Running a >30x whole genome alignment is [...] roughly 4 days
and ~$10 on a single m4.2 xlarge instance.”*
*: http://icgc.org/working-pancancer-data-aws
Existing clouds for genomics… one example
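As a back-of-the-envelope reading of the figures quoted above (an illustration added here, not part of the original slide), roughly 4 days of runtime for about $10 implies an average cost of around $0.10 per instance-hour. The function name below is hypothetical, and the result says nothing about current AWS pricing.

```python
def implied_hourly_cost(total_cost_usd: float, runtime_days: float) -> float:
    """Return the average cost per hour implied by a total cost and a runtime."""
    runtime_hours = runtime_days * 24
    return total_cost_usd / runtime_hours


# Figures quoted on the slide: ~4 days and ~$10 for one >30x genome alignment.
rate = implied_hourly_cost(total_cost_usd=10.0, runtime_days=4.0)
print(f"Implied cost: ~${rate:.2f} per instance-hour")
```

The same function can be reused to compare other quoted runtimes and budgets, as long as both numbers refer to a single instance.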
Infrastructure as a Service:
• Compute power not necessarily local but remote
• Still from compute centres, but on a larger scale
The ‘service’ is:
• Customer gets infrastructure, but it’s virtualized
• This abstraction yields better
utilisation and scalability (but...)
• Developer/Customer has to interface with these abstraction layers
What’s a cloud environment?
ELIXIR proteomics use-case (soon proteomics community)
[Diagram labels]
• PROTEOMES (proteoform centric, including PTMs and sequence variants)
• AREA 1: Reproducible open analysis pipelines: DDA, DIA, targeted proteomics, and others
• AREA 2: Multi-omics integration (proteogenomics and proteotranscriptomics; proteometabolomics)
• DATA PRODUCERS; PROTEOMICS DATA ANALYSIS & QC; SYSTEMS BIOLOGY & SYSTEMS MEDICINE
• Resources: PRIDE, UniProt, neXtProt, protein knowledge bases, LIMS, others
Thanks to
Analysis pipeline construction
We opted for the framework
Features:
• Tool modularisation
• Solutions for data handover between tools with standardised
(PSI) formats
• Adapters for integrating third-party software (Search Engines,
LuciPHOr, FIDO, percolator, etc.)
• Integration into various workflow systems as a basis
Analysis pipeline construction
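The features listed above (modular tools, handover in standardised formats, adapters around third-party software) can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not the API of the framework chosen for the study: the classes `Handover` and `Tool`, the helper `adapter`, and the adapter names are all hypothetical, and the external program call is stubbed out. Only the format names mzML and mzIdentML refer to real PSI standards.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Handover:
    """Data passed between tools; `fmt` names a standardised (PSI) format."""
    fmt: str
    path: str
    history: List[str] = field(default_factory=list)


@dataclass
class Tool:
    """One modular pipeline step with declared input and output formats."""
    name: str
    consumes: str
    produces: str
    run: Callable[[str], str]  # maps an input path to an output path

    def __call__(self, data: Handover) -> Handover:
        # Refuse to run on data in a format this tool does not understand.
        if data.fmt != self.consumes:
            raise ValueError(f"{self.name} expects {self.consumes}, got {data.fmt}")
        return Handover(self.produces, self.run(data.path), data.history + [self.name])


def adapter(name: str, consumes: str, produces: str, suffix: str) -> Tool:
    """Wrap a third-party program as a Tool (the external call is stubbed here)."""
    return Tool(name, consumes, produces, lambda path: path.rsplit(".", 1)[0] + suffix)


def run_pipeline(tools: List[Tool], start: Handover) -> Handover:
    """Chain tools, handing each one the output of the previous step."""
    result = start
    for tool in tools:
        result = tool(result)
    return result


# Hypothetical two-step chain: a search step followed by a rescoring step.
pipeline = [
    adapter("SearchEngineAdapter", "mzML", "mzIdentML", ".mzid"),
    adapter("RescoringAdapter", "mzIdentML", "mzIdentML", ".rescored.mzid"),
]
result = run_pipeline(pipeline, Handover("mzML", "sample.mzML"))
print(result.fmt, result.path, result.history)
```

Declaring the consumed and produced format on every tool is what lets steps be swapped or reordered safely: a mismatch fails immediately instead of producing an unreadable intermediate file.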
Analysis pipeline construction
Kubernetes & container advantages
• Software in containers
→ readily usable and well-isolated modules
• Resilient system, working in different infrastructure
environments
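To make the container idea concrete, here is a minimal sketch of how one workflow step might be described as a Kubernetes Job. It is an assumption-laden illustration, not the study's actual deployment: the helper `make_job_manifest`, the step name, the image reference and the arguments are all hypothetical. The manifest fields themselves (`apiVersion: batch/v1`, `kind: Job`, `backoffLimit`, `restartPolicy`) are standard Kubernetes.

```python
import json


def make_job_manifest(step_name: str, image: str, args: list) -> dict:
    """Build a minimal Kubernetes Job manifest for one containerised step."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": step_name},
        "spec": {
            "backoffLimit": 2,  # retry a failed pod up to two times
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": step_name, "image": image, "args": args}
                    ],
                }
            },
        },
    }


# Hypothetical image name and arguments, for illustration only.
manifest = make_job_manifest(
    "peptide-search",
    "registry.example.org/proteomics/search:1.0.0",
    ["--in", "/data/sample.mzML", "--out", "/data/sample.mzid"],
)
print(json.dumps(manifest, indent=2))
```

Pinning the image to an explicit tag keeps each step tied to a known software version, and because the manifest does not depend on a particular provider, the same description can in principle be submitted to any Kubernetes cluster.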
Summarising the benefits of a cloud-based pipeline
• The containerisation of workflow steps makes execution
resource efficient and version aware
• Compute infrastructure can be added dynamically;
infrastructure is set up on demand (and released after use)
• Bring the analysis to the data
Workflow design and parameter selection
• PRIDE data connection into the cloud is being optimised
• The workflows are deployed into the EMBL-EBI
“Embassy Cloud” Portal and fitted with a dashboard as a
proof of concept.
• Conceptually, these workflows can be deployed in
different cloud infrastructures in the future so they can be
used openly by the wider community.
Current status
• Follow-up of the implementation study just mentioned.
• Title: "Extending open proteomics data analysis pipelines in the
cloud: Additional tools and focus on scalability, supporting the
dramatic growth of public proteomics data"
• It will start in August 2018 (1 year):
• Led by ELIXIR-Belgium (Martens).
• Participation of EMBL-EBI (Vizcaíno, Newhouse), ELIXIR-
Germany (Kohlbacher), ELIXIR-France (Bouyssie), ELIXIR-
Spain (Sabidó)
• It will include other tools and additional pipelines (Compomics tools,
QCloud, PROFI tools, etc).
Just approved Implementation Study (2018-2019)
• Assigned to the Community (10 ELIXIR nodes involved). It will start
in June 2018 (1 year).
• Title: “Crowd-sourcing the annotation of public proteomics
datasets to improve data reusability”.
• Apply software developed in the different nodes to improve
automatic annotation pipelines linked to PRIDE (and QC
assessment).
• Improve re-usability of public data.
Just approved Implementation Study (2018-2019)
[Diagram labels]
• PROTEOMES (proteoform centric, including PTMs and sequence variants)
• AREA 1: Reproducible open analysis pipelines: DDA, DIA, targeted proteomics, and others
• AREA 2: Multi-omics integration (proteogenomics and proteotranscriptomics; proteometabolomics)
• AREA 3: Data management & annotation: metadata improvements; management of human identifiable data; data standards (e.g. for multi-omics approaches)
• DATA PRODUCERS; PROTEOMICS DATA ANALYSIS & QC; SYSTEMS BIOLOGY & SYSTEMS MEDICINE; DATA MANAGEMENT
• Resources: PRIDE, UniProt, neXtProt, protein knowledge bases, LIMS, others
• Proteomics bioinformatics activities in Europe are
very prominent world-wide
• Analysis infrastructure: work in progress
• Plans for the future:
• Data integration approaches with other ‘omics’
technologies (e.g. genomics, metabolomics, etc).
• Add pipelines for other popular experimental techniques
• Improve data management practices (metadata
annotation, management of clinical data, …)
Summary
Acknowledgements
Thank you!
Proteomics team
Yasset Perez-Riverol
Andrew Jarnuczak
Tobias Ternent
Phenomenal
Pablo Moreno
Embassy cloud
David Ocaña
ELIXIR-DE
The OpenMS team:
Oliver Kohlbacher
Timo Sachsenberg
Julianus Pfeuffer
Martin Eisenacher
MDC
Chris Bielow
Sanger/ ICR
Jyoti Choudhary
Hendrik Weisser
Embassy cloud portal
Jose A. Dianes
ELIXIR Webinar
11 April 2018

Mais conteúdo relacionado

Mais procurados

Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Geoffrey Fox
 

Mais procurados (20)

Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication Repositories
 
Hans Hofman - European Perspectives on Digital Preservation
Hans Hofman - European Perspectives on Digital PreservationHans Hofman - European Perspectives on Digital Preservation
Hans Hofman - European Perspectives on Digital Preservation
 
The University of Edinburgh Research Data Management Service Suite
The University of Edinburgh Research Data Management Service SuiteThe University of Edinburgh Research Data Management Service Suite
The University of Edinburgh Research Data Management Service Suite
 
MIDESS
MIDESSMIDESS
MIDESS
 
170131 tryggve-at ssi-biobanks-ap
170131 tryggve-at ssi-biobanks-ap170131 tryggve-at ssi-biobanks-ap
170131 tryggve-at ssi-biobanks-ap
 
Ontologies for Crisis Management: A Review of State of the Art in Ontology De...
Ontologies for Crisis Management: A Review of State of the Art in Ontology De...Ontologies for Crisis Management: A Review of State of the Art in Ontology De...
Ontologies for Crisis Management: A Review of State of the Art in Ontology De...
 
Nordic Tryggve project
Nordic Tryggve projectNordic Tryggve project
Nordic Tryggve project
 
Introduction to Big data
Introduction to Big dataIntroduction to Big data
Introduction to Big data
 
Yjs: A Framework for Near Real-time P2P Shared Editing on Arbitrary Data Types
Yjs: A Framework for Near Real-time P2P Shared Editing on Arbitrary Data TypesYjs: A Framework for Near Real-time P2P Shared Editing on Arbitrary Data Types
Yjs: A Framework for Near Real-time P2P Shared Editing on Arbitrary Data Types
 
EOSC in practice - Silvana Muscella (chair EOSC HLEG)
EOSC in practice - Silvana Muscella (chair EOSC HLEG)EOSC in practice - Silvana Muscella (chair EOSC HLEG)
EOSC in practice - Silvana Muscella (chair EOSC HLEG)
 
RDM@Edinburgh_interoperation_IDCC2015
RDM@Edinburgh_interoperation_IDCC2015RDM@Edinburgh_interoperation_IDCC2015
RDM@Edinburgh_interoperation_IDCC2015
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
Federated Architecture with Provenance and Access Control to realize Open Dig...
Federated Architecture with Provenance and Access Control to realize Open Dig...Federated Architecture with Provenance and Access Control to realize Open Dig...
Federated Architecture with Provenance and Access Control to realize Open Dig...
 
Visualizing Co-authorship Networks for Actionable Insights: Action Design Res...
Visualizing Co-authorship Networks for Actionable Insights: Action Design Res...Visualizing Co-authorship Networks for Actionable Insights: Action Design Res...
Visualizing Co-authorship Networks for Actionable Insights: Action Design Res...
 
COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015
 
Introducing SURF
Introducing SURF Introducing SURF
Introducing SURF
 
Spatineo Webinar: Shedding Light on INSPIRE Conformity
Spatineo Webinar: Shedding Light on INSPIRE ConformitySpatineo Webinar: Shedding Light on INSPIRE Conformity
Spatineo Webinar: Shedding Light on INSPIRE Conformity
 
A Methodology and Tool Support for Widget-based Web Application Development
A Methodology and Tool Support for Widget-based Web Application DevelopmentA Methodology and Tool Support for Widget-based Web Application Development
A Methodology and Tool Support for Widget-based Web Application Development
 
The IGeLU Linked Open Data Special Interest Working Group
The IGeLU Linked Open Data Special Interest Working GroupThe IGeLU Linked Open Data Special Interest Working Group
The IGeLU Linked Open Data Special Interest Working Group
 
ESDIN - OGC Web Services Shibboleth Interoperability Experiment (OSI)
ESDIN - OGC Web Services Shibboleth Interoperability Experiment (OSI)ESDIN - OGC Web Services Shibboleth Interoperability Experiment (OSI)
ESDIN - OGC Web Services Shibboleth Interoperability Experiment (OSI)
 

Semelhante a ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Processing and Analysis of Large-Scale Proteomics Data”

The Ascent of Open Science and the European Open Science Cloud
The Ascent of Open Science and the European Open Science CloudThe Ascent of Open Science and the European Open Science Cloud
The Ascent of Open Science and the European Open Science Cloud
Tiziana Ferrari
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
Carole Goble
 

Semelhante a ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Processing and Analysis of Large-Scale Proteomics Data” (20)

ELIXIR
ELIXIRELIXIR
ELIXIR
 
Technical activities in ELIXIR Europe
Technical activities in ELIXIR EuropeTechnical activities in ELIXIR Europe
Technical activities in ELIXIR Europe
 
eTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondoneTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, London
 
ELIXIR FAIR Activities - Examplars
ELIXIR FAIR Activities - ExamplarsELIXIR FAIR Activities - Examplars
ELIXIR FAIR Activities - Examplars
 
European Open Science Cloud: Concept, status and opportunities
European Open Science Cloud: Concept, status and opportunitiesEuropean Open Science Cloud: Concept, status and opportunities
European Open Science Cloud: Concept, status and opportunities
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair"
 
Globus in European Life Science
Globus in European Life ScienceGlobus in European Life Science
Globus in European Life Science
 
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hubCloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
 
OSFair2017 Workshop | The European Open Science Cloud Pilot
OSFair2017 Workshop | The European Open Science Cloud Pilot OSFair2017 Workshop | The European Open Science Cloud Pilot
OSFair2017 Workshop | The European Open Science Cloud Pilot
 
ELIXIR
ELIXIRELIXIR
ELIXIR
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?
 
The Ascent of Open Science and the European Open Science Cloud
The Ascent of Open Science and the European Open Science CloudThe Ascent of Open Science and the European Open Science Cloud
The Ascent of Open Science and the European Open Science Cloud
 
Archiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing ServicesArchiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing Services
 
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...
 
Gergely Sipos, Claudio Cacciari: Welcome and mapping the landscape: EOSC-hub ...
Gergely Sipos, Claudio Cacciari: Welcome and mapping the landscape: EOSC-hub ...Gergely Sipos, Claudio Cacciari: Welcome and mapping the landscape: EOSC-hub ...
Gergely Sipos, Claudio Cacciari: Welcome and mapping the landscape: EOSC-hub ...
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
 
ELIXIR-UK
ELIXIR-UKELIXIR-UK
ELIXIR-UK
 
Prompting an EOSC in Practice, Isabel Campos, CSIC & Member of the High Level...
Prompting an EOSC in Practice, Isabel Campos, CSIC & Member of the High Level...Prompting an EOSC in Practice, Isabel Campos, CSIC & Member of the High Level...
Prompting an EOSC in Practice, Isabel Campos, CSIC & Member of the High Level...
 
ELIXIR and data grand challenges in life sciences
ELIXIR and data grand challenges in life sciencesELIXIR and data grand challenges in life sciences
ELIXIR and data grand challenges in life sciences
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
 

Mais de Juan Antonio Vizcaino

Mais de Juan Antonio Vizcaino (20)

Reusing and integrating public proteomics data to improve our knowledge of th...
Reusing and integrating public proteomics data to improve our knowledge of th...Reusing and integrating public proteomics data to improve our knowledge of th...
Reusing and integrating public proteomics data to improve our knowledge of th...
 
Introduction to the PSI standard data formats
Introduction to the PSI standard data formatsIntroduction to the PSI standard data formats
Introduction to the PSI standard data formats
 
Reuse of public proteomics data
Reuse of public proteomics dataReuse of public proteomics data
Reuse of public proteomics data
 
PRIDE resources and ProteomeXchange
PRIDE resources and ProteomeXchangePRIDE resources and ProteomeXchange
PRIDE resources and ProteomeXchange
 
Proteomics repositories
Proteomics repositoriesProteomics repositories
Proteomics repositories
 
Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018Introduction to the Proteomics Bioinformatics Course 2018
Introduction to the Proteomics Bioinformatics Course 2018
 
PSI-Proteome Informatics update
PSI-Proteome Informatics updatePSI-Proteome Informatics update
PSI-Proteome Informatics update
 
ProteomeXchange update
ProteomeXchange updateProteomeXchange update
ProteomeXchange update
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
 
The ELIXIR Proteomics community
The ELIXIR Proteomics community The ELIXIR Proteomics community
The ELIXIR Proteomics community
 
The ELIXIR Proteomics Community
The ELIXIR Proteomics CommunityThe ELIXIR Proteomics Community
The ELIXIR Proteomics Community
 
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...A proteomics data “gold mine” at your disposal: Now that the data is there, w...
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
 
The ProteomeXchange Consoritum: 2017 update
The ProteomeXchange Consoritum: 2017 updateThe ProteomeXchange Consoritum: 2017 update
The ProteomeXchange Consoritum: 2017 update
 
Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...Public proteomics data: a (mostly unexploited) gold mine for computational re...
Public proteomics data: a (mostly unexploited) gold mine for computational re...
 
How to run and maintain a popular biological data repository?
How to run and maintain a popular biological data repository?How to run and maintain a popular biological data repository?
How to run and maintain a popular biological data repository?
 
Reuse of public proteomics data
Reuse of public proteomics dataReuse of public proteomics data
Reuse of public proteomics data
 
PRIDE and ProteomeXchange
PRIDE and ProteomeXchangePRIDE and ProteomeXchange
PRIDE and ProteomeXchange
 
Proteomics repositories
Proteomics repositoriesProteomics repositories
Proteomics repositories
 
Proteomics data standards
Proteomics data standardsProteomics data standards
Proteomics data standards
 
Introduction to the Proteomics Bioinformatics Course 2017
Introduction to the Proteomics Bioinformatics Course 2017Introduction to the Proteomics Bioinformatics Course 2017
Introduction to the Proteomics Bioinformatics Course 2017
 

Último

Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
Lokesh Kothari
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Sérgio Sacani
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
Areesha Ahmad
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 

Último (20)

Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptx
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 

ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Processing and Analysis of Large-Scale Proteomics Data”

  • 1. European Life Sciences Infrastructure for Biological Information www.elixir-europe.org ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Processing and Analysis of Large-Scale Proteomics Data” Juan AntonioVizcaíno Mathias Walzer European Bioinformatics Institute (EMBL-EBI) juan@ebi.ac.uk, walzer@ebi.ac.uk
  • 2. ELIXIR Webinar, 11 April 2018. Outline: • One slide intro to proteomics • The ELIXIR Proteomics Community • The implementation study • Plans for the near future
  • 3. One slide intro to mass spectrometry proteomics. Hein et al., Handbook of Systems Biology, 2012. Proteins ≈ most drug targets
  • 4. Outline: • One slide intro to proteomics • The ELIXIR Proteomics Community • The implementation study • Plans for the near future
  • 5. Overall objectives: • The goal of the ELIXIR proteomics community is to develop and maintain sustainable proteomics tools and data resources • An essential part of the development will also be the ‘FAIRification’ of the resources (i.e. making the resources FAIR) • Integrate proteomics bioinformatics activities in ELIXIR
  • 6. ELIXIR nodes supporting the new Community. 11 ELIXIR nodes supported the application: • Germany (co-lead) (O. Kohlbacher) • Belgium (co-lead) (L. Martens) • Czech Republic • Denmark • Ireland • France • Netherlands • Spain • Sweden • United Kingdom • EMBL-EBI (co-lead) (Juan A. Vizcaíno)
  • 7. White paper as the basis for this Community. Vizcaíno et al., F1000Research, 2017
  • 8. Highlighting already existing resources and initiatives. ELIXIR Platforms: • Tools: Services and connectors to drive access and exploitation • Data: Sustaining Europe’s life science data infrastructure • Interoperability: Integration of data and services • Compute: Access, exchange and storage • Training: Professional skills for managing and exploiting data
  • 9. Highlighting already existing resources and initiatives. ELIXIR Platforms: • Tools: Services and connectors to drive access and exploitation • Data: Sustaining Europe’s life science data infrastructure • Interoperability: Integration of data and services • Compute: Access, exchange and storage • Training: Professional skills for managing and exploiting data
  • 10. European leadership: the world-leading PRIDE database. • PRIDE stores mass spectrometry (MS)-based proteomics data: peptide and protein expression data (identification and quantification); post-translational modifications; mass spectra (raw data and peak lists); technical and biological metadata; any other related information • Full support for tandem MS approaches • Any type of data can be stored • Leading ProteomeXchange • From July 2017, an ELIXIR core resource. http://www.ebi.ac.uk/pride/archive. Martens et al., Proteomics, 2005; Vizcaíno et al., NAR, 2016
  • 11. ProteomeXchange: a global, distributed proteomics database. • Framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories. Repositories shown: PRIDE (MS/MS data), MassIVE (MS/MS data), jPOST (MS/MS data), iProX (MS/MS data), PASSEL (SRM data). Diagram labels: Raw, ID/Q, Meta; Mandatory data deposition. http://www.proteomexchange.org. Vizcaíno et al., Nat Biotechnol, 2014; Deutsch et al., NAR, 2017
  • 12. PRIDE data submissions and data growth. • > 2,400 datasets submitted in 2017 • September, November and December 2017 were the record months in terms of submitted datasets. Charts: datasets submitted per month; datasets submitted per year
  • 13. Stats: Data growth in EMBL-EBI resources. Chart series: sequence data, micro-array, metabolomics, proteomics
  • 14. Data re-use in proteomics is increasing. Data download volume for PRIDE Archive in 2017: 295 TB. Chart: downloads in TB per year, 2013 to 2017
  • 15. Outline: • One slide intro to proteomics • The ELIXIR Proteomics Community • The implementation study • Plans for the near future
  • 16. ELIXIR Platforms: • Tools: Services and connectors to drive access and exploitation • Data: Sustaining Europe’s life science data infrastructure • Interoperability: Integration of data and services • Compute: Access, exchange and storage • Training: Professional skills for managing and exploiting data
  • 17. ELIXIR Implementation Study (Feb 2017-June 2018). • Title: “Mining the proteome: Enabling automated processing and analysis of large-scale proteomics data”. • Development of open, reproducible, and robust analysis pipelines for DDA (Data Dependent Acquisition) approaches. • Deployment in the EMBL-EBI “Embassy Cloud” (and optionally later other clouds). • Connected to PRIDE, bringing analysis closer to the data. • Who is involved? EMBL-EBI (Vizcaíno & Newhouse); ELIXIR-DE (Kohlbacher, EKUT; Eisenacher, RUB)
  • 18. ELIXIR Implementation Study. Develop exemplary proteomics data analysis workflows and deploy them in the EMBL-EBI “Embassy Cloud”: (1) Standard identification workflow (2) Identification workflow for PTMs (3) Quantification (label-free/label-based approaches) (4) Quality Control (to aid data set interpretation/reanalysis evaluation) (5) Versions of quantification approaches (including PTMs) → Connected to public proteomics data from
  • 19. ELIXIR Implementation Study. Consolidating data access and provision of robust analysis pipelines. • Development of free-to-use, scalable, and user-friendly data analysis pipelines including cloud deployment • Cloud-based data analysis pipelines’ appeal: 1. Increasingly large datasets 2. Local struggle with the ‘compute task’
  • 20. Existing clouds for genomics… one example. Cloud workflow in genomics: Simplified workflow launcher • One workflow • For AWS • Enabling co-analysis with the larger PanCancer dataset. “Running a >30x whole genome alignment is [...] roughly 4 days and ~$10 on a single m4.2 xlarge instance.” (http://icgc.org/working-pancancer-data-aws)
  • 21. What’s a cloud environment? Infrastructure as a Service: • Compute power not necessarily local but remote • Still from compute centres, but on a larger scale. The ‘service’ is: • Customer gets infrastructure, but it’s virtualized • This abstraction yields better utilisation and scalability (but...) • Developer/Customer has to interface with these abstraction layers
  • 22. ELIXIR proteomics use-case (soon proteomics community). Diagram labels: DATA PRODUCERS; PROTEOMICS DATA ANALYSIS & QC; PROTEOMES (Proteoform centric, including PTMs and sequence variants). AREA 1: Reproducible open analysis pipelines: DDA, DIA, targeted proteomics, and others
  • 23. ELIXIR proteomics use-case (soon proteomics community). Diagram labels: DATA PRODUCERS; PROTEOMICS DATA ANALYSIS & QC; PROTEOMES (Proteoform centric, including PTMs and sequence variants); SYSTEMS BIOLOGY & SYSTEMS MEDICINE. AREA 1: Reproducible open analysis pipelines: DDA, DIA, targeted proteomics, and others. AREA 2: Multi-omics integration: Proteogenomics and Proteotranscriptomics; Proteometabolomics
  • 24. ELIXIR proteomics use-case (soon proteomics community). Diagram labels: DATA PRODUCERS; LIMS; PROTEOMICS DATA ANALYSIS & QC; PROTEOMES (Proteoform centric, including PTMs and sequence variants); Protein Knowledge Bases (UniProt, neXtProt); PRIDE; Others; SYSTEMS BIOLOGY & SYSTEMS MEDICINE. AREA 1: Reproducible open analysis pipelines: DDA, DIA, targeted proteomics, and others. AREA 2: Multi-omics integration: Proteogenomics and Proteotranscriptomics; Proteometabolomics
  • 25. Analysis pipeline construction. Thanks to
  • 26. Analysis pipeline construction. We opted for the framework. Features: • Tool modularisation • Solutions for data handover between tools with standardised (PSI) formats • Adapters for integrating third-party software (Search Engines, LuciPHOr, FIDO, percolator, etc.) • Integration into various workflow systems as a basis
  • 27. Analysis pipeline construction. Kubernetes & container advantages: • Software in containers → readily usable and well isolated modules • Resilient system, working in different infrastructure environments
  • 28. Summarising the benefits of a cloud-based pipeline. • The containerisation of workflow steps makes execution resource efficient and version aware • Compute infrastructure can be added dynamically; infrastructure is set up on demand (and released after use) • Bring the analysis to the data
  • 29. Workflow design and parameter selection
  • 30. Current status. • PRIDE data connection into the cloud is being optimised • The workflows are deployed into the EMBL-EBI “Embassy Cloud” Portal and fitted with a dashboard as a proof of concept. • Conceptually, these workflows can be deployed in different cloud infrastructures in the future so they can be used openly by the wider community.
  • 31. Outline: • One slide intro to proteomics • The ELIXIR Proteomics Community • The implementation study • Plans for the near future
  • 32. Just approved Implementation Study (2018-2019). • Follow-up of the implementation study just mentioned. • Title: “Extending open proteomics data analysis pipelines in the cloud: Additional tools and focus on scalability, supporting the dramatic growth of public proteomics data” • It will start in August 2018 (1 year): led by ELIXIR-Belgium (Martens); participation of EMBL-EBI (Vizcaíno, Newhouse), ELIXIR-Germany (Kohlbacher), ELIXIR-France (Bouyssie), ELIXIR-Spain (Sabidó) • It will include other tools and additional pipelines (Compomics tools, QCloud, PROFI tools, etc.).
  • 33. Just approved Implementation Study (2018-2019). • Assigned to the Community (10 ELIXIR nodes involved). It will start in June 2018 (1 year). • Title: “Crowd-sourcing the annotation of public proteomics datasets to improve data reusability”. • Apply software developed in the different nodes to improve automatic annotation pipelines linked to PRIDE (and QC assessment). • Improve re-usability of public data.
  • 34. Diagram labels: DATA PRODUCERS; LIMS; PROTEOMICS DATA ANALYSIS & QC; PROTEOMES (Proteoform centric, including PTMs and sequence variants); Protein Knowledge Bases (UniProt, neXtProt); DATA MANAGEMENT; PRIDE; Others; SYSTEMS BIOLOGY & SYSTEMS MEDICINE. AREA 1: Reproducible open analysis pipelines: DDA, DIA, targeted proteomics, and others. AREA 2: Multi-omics integration: Proteogenomics and Proteotranscriptomics; Proteometabolomics. AREA 3: Data management & Annotation: Metadata improvements; management of human identifiable data; data standards (e.g. for multi-omics approaches)
  • 35. Summary. • Proteomics bioinformatics activities in Europe are very prominent world-wide • Analysis infrastructure: work in progress • Plans for the future: data integration approaches with other ‘omics’ technologies (e.g. genomics, metabolomics, etc.); add pipelines for other popular experimental techniques; improve data management practices (metadata annotation, management of clinical data, …)
  • 36. Acknowledgements. Proteomics team: Yasset Perez-Riverol, Andrew Jarnuczak, Tobias Ternent. Phenomenal: Pablo Moreno. Embassy cloud: David Ocaña. Embassy cloud portal: Jose A. Dianes. ELIXIR-DE, the OpenMS team: Oliver Kohlbacher, Timo Sachsenberg, Julianus Pfeuffer, Martin Eisenacher. MDC: Chris Bielow. Sanger/ICR: Jyoti Choudhary, Hendrik Weisser. Thank you!
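
Slides 26 to 28 describe pipelines built from modular tools that hand data over in standardised (PSI) formats. The following is only a minimal Python sketch of that idea, not the study's implementation: the `Step` class, the step names and the placeholder `swap_extension` function are hypothetical, and mzML, mzIdentML and mzTab are used merely as examples of PSI formats. Each step declares the format it consumes and produces, so a mismatched handover is caught before anything runs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Step:
    """One modular tool (hypothetical): consumes one format, produces another."""
    name: str
    consumes: str
    produces: str
    run: Callable[[str], str]


def validate(steps: List[Step]) -> None:
    """Check that each step's output format matches the next step's input."""
    for upstream, downstream in zip(steps, steps[1:]):
        if upstream.produces != downstream.consumes:
            raise ValueError(
                f"{upstream.name} produces {upstream.produces}, "
                f"but {downstream.name} expects {downstream.consumes}"
            )


def run_pipeline(steps: List[Step], input_path: str) -> str:
    """Validate the handovers, then pass the file path through every step."""
    validate(steps)
    path = input_path
    for step in steps:
        path = step.run(path)
    return path


def swap_extension(new_ext: str) -> Callable[[str], str]:
    """Placeholder 'tool' that only renames the path; no real analysis is done."""
    def _run(path: str) -> str:
        stem = path.rsplit(".", 1)[0]
        return f"{stem}.{new_ext}"
    return _run


# Hypothetical identification workflow: search, rescoring, report.
identification_workflow = [
    Step("search", "mzML", "mzIdentML", swap_extension("mzid")),
    Step("rescoring", "mzIdentML", "mzIdentML", swap_extension("mzid")),
    Step("report", "mzIdentML", "mzTab", swap_extension("mzTab")),
]

result = run_pipeline(identification_workflow, "sample.mzML")
print(result)
```

In a containerised deployment, each `run` callable would stand for launching one isolated tool image; declaring formats per step is what lets such modules be swapped without touching the rest of the chain.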