Three challenges for metabolomics study databases. Kees van Bochove, June 2011, Metabolomics Society Meeting
Metabolomics database: If you search for ‘metabolomics database’ you get 400K+ results, most of them recent. The vast majority of these databases are compound-centric; few have real study data. Of the metabolomics study databases, most are GC-MS, many are NMR, and almost none are LC-MS.
Outline: (1) Storage of study metadata: how to represent the biological context of samples in the database. (2) Representation of data preprocessing, identification and quantification: how to represent the assumed identities for targeted and untargeted analyses, and how to represent quantification and internal standard samples. (3) Connection to other ‘omics’ data.
Data Support Platform: website
DSP: Open Source strategy. We are not the only consortium storing data! Reach sustainability by working together with active open source projects like dbNP and Galaxy. Everyone can start their own database using the same open source technology; in fact, we use this strategy internally.
Challenge 1: Study metadata. Without a proper and comprehensive description of the biological context of the sample, a metabolomics results database is useless. Especially for mammalian studies, study designs are often complex, involving multiple factors, timepoints, samples etc. NMC strategy: partner with database initiatives from neighboring projects such as NuGO (nutrigenomics), NBIC (bioinformatics) and NTC (toxicogenomics) in the dbNP initiative, http://dbnp.org
Data levels in DSP, from study down to measured data: Linea study, code 06-E6P, inclusion criteria...; female human, 46 years old, BMI 26.4; 5 ml blood was taken at 4w after start of study; blood sample; metabolomics LC-MS lipidomics assay; { LPC17:0: RT 1,416 Area 5469406, … }
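One possible way to model these nested data levels is sketched below in Python. The class and field names (Study, Subject, SamplingEvent, Sample, Assay, Feature) are illustrative assumptions, not the actual DSP schema, and reading ‘RT 1,416’ as a decimal comma (1.416) is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Feature:
    name: str
    retention_time: float  # assumption: 'RT 1,416' uses a decimal comma
    area: int


@dataclass
class Assay:
    name: str
    features: List[Feature] = field(default_factory=list)


@dataclass
class Sample:
    material: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class SamplingEvent:
    description: str
    samples: List[Sample] = field(default_factory=list)


@dataclass
class Subject:
    species: str
    sex: str
    age_years: int
    bmi: float
    events: List[SamplingEvent] = field(default_factory=list)


@dataclass
class Study:
    title: str
    code: str
    subjects: List[Subject] = field(default_factory=list)


def flatten(study: Study) -> List[Dict[str, object]]:
    """Return one flat row per measured feature, carrying its full context."""
    rows = []
    for subject in study.subjects:
        for event in subject.events:
            for sample in event.samples:
                for assay in sample.assays:
                    for feature in assay.features:
                        rows.append({
                            "study_code": study.code,
                            "subject": f"{subject.sex} {subject.species}",
                            "event": event.description,
                            "sample": sample.material,
                            "assay": assay.name,
                            "feature": feature.name,
                            "retention_time": feature.retention_time,
                            "area": feature.area,
                        })
    return rows


# The example values are the ones shown on the slide.
feature = Feature(name="LPC17:0", retention_time=1.416, area=5469406)
assay = Assay(name="LC-MS lipidomics", features=[feature])
sample = Sample(material="blood", assays=[assay])
event = SamplingEvent(description="5 ml blood, 4 weeks after study start",
                      samples=[sample])
subject = Subject(species="human", sex="female", age_years=46, bmi=26.4,
                  events=[event])
study = Study(title="Linea study", code="06-E6P", subjects=[subject])

rows = flatten(study)
```

Flattening keeps the biological context attached to every measured value, which is the point of storing the levels together.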
Study wizard allows for complex designs
Example of a study timeline
Allow for flexible study description: ‘templates’ for metadata fields
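A template for metadata fields could be represented roughly as follows. This is a minimal sketch under assumptions: the template layout, the field names and the validate function are hypothetical and are not taken from the DSP code base.

```python
# Hypothetical template: field name -> (expected type, required?)
HUMAN_SUBJECT_TEMPLATE = {
    "species": (str, True),
    "age_years": (int, False),
    "bmi": (float, False),
}


def validate(record, template):
    """Return a list of problems found when checking a record against a template."""
    errors = []
    for name, (expected_type, required) in template.items():
        if name not in record:
            if required:
                errors.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], expected_type):
            errors.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    # Fields that the template does not define are reported as well.
    for name in record:
        if name not in template:
            errors.append(f"unknown field: {name}")
    return errors


ok = validate({"species": "human", "age_years": 46, "bmi": 26.4},
              HUMAN_SUBJECT_TEMPLATE)
bad = validate({"age_years": "46"}, HUMAN_SUBJECT_TEMPLATE)
```

Each study type can then carry its own template while sharing one validation routine.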
Excel importer
Challenge 2: representation of metabolomics data. Preprocessing, identification, quantification.
How to implement preprocessing? We chose not to in the end. We supplied an mzMatch pipeline at an earlier stage, but preprocessing is often too intertwined with the measurement SOP. A move from vendor-specific software to general frameworks like XCMS, mzMatch, mzMine etc. would be beneficial for comparability of data, but in practice requires a lot of effort and tuning.
How to implement metabolite identity? Consensus at standardization workshops: use the InChI key to identify structure. It is not always clear which structure(s) a peak represents, and with untargeted metabolomics we might have no clue. So we store ‘features’, which are specific to the measurement SOP and preprocessing SOP, and link those to metabolite identity records.
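The feature-to-identity link can be sketched as below. The class names, SOP labels and status strings are assumptions for illustration, and the InChI keys are explicit placeholders rather than real identifiers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MetaboliteIdentity:
    inchi_key: str  # structure identifier (placeholder values below)
    name: str


@dataclass
class Feature:
    label: str
    measurement_sop: str
    preprocessing_sop: str
    # Zero, one or several candidate structures per feature.
    candidates: List[MetaboliteIdentity] = field(default_factory=list)

    def identification_status(self) -> str:
        if not self.candidates:
            return "unidentified"
        if len(self.candidates) == 1:
            return "identified"
        return "ambiguous"


# Placeholder identities; these are NOT real InChI keys.
identity_a = MetaboliteIdentity("PLACEHOLDER-KEY-A", "candidate A")
identity_b = MetaboliteIdentity("PLACEHOLDER-KEY-B", "candidate B")

targeted = Feature("peak_001", "SOP-LCMS-1", "SOP-PREP-1", [identity_a])
ambiguous = Feature("peak_002", "SOP-LCMS-1", "SOP-PREP-1",
                    [identity_a, identity_b])
unknown = Feature("peak_003", "SOP-LCMS-1", "SOP-PREP-1")

statuses = [f.identification_status() for f in (targeted, ambiguous, unknown)]
```

Keeping the feature separate from the identity record lets an untargeted peak be stored now and annotated later.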
How to implement quantification? At the moment, we store only peak area or intensity, and any Internal Standard and Quality Control sample data is stored along with the biological sample data. We expect that preprocessing / quality control is done before data import. Working now on adding more levels of quantification, i.e. concentration.
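As an illustration of what a further level of quantification might build on, the sketch below stores peak areas for biological and QC samples side by side and derives an area ratio against an internal standard. The ratio step is a common convention assumed here, not something the slides describe, and all sample names, feature names and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical stored records: peak areas only, all sample types together.
measurements = [
    {"sample": "S1", "sample_type": "biological", "feature": "analyte_A", "area": 2000.0},
    {"sample": "S1", "sample_type": "biological", "feature": "int_std", "area": 1000.0},
    {"sample": "QC1", "sample_type": "quality_control", "feature": "analyte_A", "area": 1500.0},
    {"sample": "QC1", "sample_type": "quality_control", "feature": "int_std", "area": 1000.0},
    {"sample": "S2", "sample_type": "biological", "feature": "analyte_A", "area": 900.0},
]


def area_ratios(records, internal_standard):
    """Divide each feature's area by the internal standard area of the same sample."""
    by_sample = defaultdict(dict)
    for record in records:
        by_sample[record["sample"]][record["feature"]] = record["area"]

    ratios = {}
    for sample, areas in by_sample.items():
        standard_area = areas.get(internal_standard)
        if not standard_area:
            # No (or zero) internal standard signal: no ratio can be derived.
            continue
        ratios[sample] = {
            feature: area / standard_area
            for feature, area in areas.items()
            if feature != internal_standard
        }
    return ratios


ratios = area_ratios(measurements, "int_std")
```

Samples without an internal standard signal are skipped rather than guessed at.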
Imported metabolomics dataset
And again – Excel import!
Challenge 3: embedding of data. Metabolomics is often not the only analysis performed on samples. It is important to cross-link to other environmental and genetic data. Thanks to our partners (NuGO, NBIC etc.) there are also modules for next generation sequencing, transcriptomics, and clinical chemistry data. All this data is cross-queryable.
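Cross-querying across modules can be pictured as joining each module's data on a shared sample identifier. The sketch below assumes that design; the module names mirror the slide, while the function names, sample identifiers, field names and values are hypothetical.

```python
def merge_by_sample(modules):
    """Combine per-module tables into one row per sample, prefixing keys by module."""
    merged = {}
    for module_name, table in modules.items():
        for sample_id, values in table.items():
            row = merged.setdefault(sample_id, {})
            for key, value in values.items():
                row[f"{module_name}.{key}"] = value
    return merged


def select_samples(merged, predicate):
    """Return the sorted sample ids whose merged row satisfies the predicate."""
    return sorted(s for s, row in merged.items() if predicate(row))


# Hypothetical module contents keyed by sample id.
modules = {
    "metabolomics": {
        "S1": {"feature_1_area": 5000.0},
        "S2": {"feature_1_area": 1200.0},
    },
    "transcriptomics": {
        "S1": {"gene_x_expression": 8.1},
        "S2": {"gene_x_expression": 3.4},
    },
    "clinical_chemistry": {
        "S1": {"glucose_mmol_l": 5.2},
    },
}

merged = merge_by_sample(modules)
hits = select_samples(
    merged,
    lambda row: row.get("metabolomics.feature_1_area", 0) > 2000
    and row.get("transcriptomics.gene_x_expression", 0) > 5,
)
```

A shared sample identifier is the only thing the modules need to agree on for this kind of query.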
Transcriptomics module
Next Generation Sequencing module
Query composer
Query results on sample level
Next focus. We have several tools developed within NMC, such as a spectral tree analysis tool. Reach sustainability by merging those tools into one analytical platform. Use an existing bioinformatics open source project: Galaxy. Re-use existing projects from collaborators: MetaboAnalyst from the Human Metabolome Project, Alberta, Canada (David Wishart).
Galaxy (toolbox / visualization)
Distributed deployment of NMC DSP. Study owners host study metadata at their own institution. Metabolomics labs host metabolomics modules. Data access is governed by study owners. Diagram nodes: TNO studies, DSM studies, TNO clinical chemistry, PRI studies, shared processing & evaluation toolbox, WUR transcriptomics, DCL metabolomics, PRI metabolomics etc.
Conclusion. Many compound databases, few databases with actual study data. Very hard to represent LC-MS measurements in a meaningful way. Storing study design and sample metadata is key to analysis. Many benefits of open collaboration, as opposed to closed-source in-house solutions. Test it: http://test.nmcdsp.org, login with username ‘nmc’ and password ‘noordwijkerhout’. Suggestions/remarks to kees@thehyve.nl
Acknowledgements: Tjeerd Abma, Adem Bilican, Jildau Bouwman, Christine Chichester, Sudeshna Das, Marjan van Erk, Chris Evelo, Prasad Gajula, Roeland van Ham, Thomas Hankemeier, Margriet Hendriks, Guido Hooiveld, Robert Horlings, Peter Horvatovich, Rob Hooft, Machiel Jansen, Jim Kaput, Kostas Karasavvas, Bart Keijser, Matthew Lange, Scott Marshall, Barend Mons, Ben van Ommen, Linette Pellis, Janneke van der Ploeg, Marijana Radonjic, Theo Reijmers, Erik Roos, Marco Roos, Frans Paul Ruzius, Jahn Saito, Susanna Sansone, Siemen Sikkema, Rob Stierum, Eugene van Someren, Morris Swertz, Chris Taylor, Michael van Vliet, Jeroen Wesbeek, Katy Wolstencroft, Suzan Wopereis, Gooitzen Zwanenburg
Metabolomics Society meeting 2011 - presentation by Kees
