Case Study
Life Sciences Data
Centre for Integrative Systems
Biology and Bioinformatics
www.imperial.ac.uk/bioinfsupport
Sarah Butcher
s.butcher@imperial.ac.uk
Bio-data Characteristics
 Lack of structure, rapid growth but not huge volume,
high heterogeneity
 Multiple file formats, widely differing sizes, acquisition rates
 Considerable manual data collection
 Multiple format changes over data lifetime including
production of (evolving) exchange formats
 Huge range of analysis methods, algorithms and
software in use with wide ranging computational profiles
 Association with multiple metadata standards and
ontologies, some of which are still evolving
 Increasing reference or link to patient data with associated
security requirements
So What Are These Data Anyway?
 Raw data files (sometimes)
 Analysed data files (generally)
 Results (multiple formats, often quantitative)
 Mathematical models (sometimes)
 New hypotheses (hard to encapsulate without context)
 Standard operating procedures (occasionally)
 Software, tools and interfaces (sometimes)
 ‘dark data’ – miscellaneous additional or interim
datasets that aren’t directly tied to a publication – might
be re-useable if shared and of suitable quality
It’s Complicated ….
Primary database – DNA or protein sequence
Secondary - (derived information e.g. protein domains)
Protein structure or other (e.g. crystal coordinates)
Funders
and…. and….
Many Data Areas
Genome sequencing
imaging
RNA profiles
Protein profiles
Metabolic profiles
Protein interaction studies
Large-scale field studies
Improved understanding
of complex biological systems
GWAS
Challenges in primary analyses (smaller)
AND in meaningful integration (huge)
Data Formats
Even for one experimental type there are many file formats:
they may be human readable or require specific software,
proprietary or open source….. and Excel spreadsheets
Data Stages - RNA-Seq Experiment
Total volume per experiment at each data stage: 24 TB, 240 GB, 32 GB, 260 MB
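The scale of the reduction between stages can be made concrete with a little arithmetic. The sketch below assumes that the four figures are successive stages of a single experiment, in the order listed, and that decimal units apply (1 TB = 1000 GB). The slide text does not name the stages, so they are only numbered here.

```python
# Volumes as listed on the slide, converted to megabytes.
# Assumption: decimal units (1 GB = 1000 MB, 1 TB = 1000 GB).
UNIT_MB = {"MB": 1, "GB": 1000, "TB": 1000 * 1000}

# Assumption: the values are consecutive stages, in slide order.
stage_volumes = [(24, "TB"), (240, "GB"), (32, "GB"), (260, "MB")]


def to_mb(value, unit):
    """Convert a (value, unit) pair to megabytes."""
    return value * UNIT_MB[unit]


def reduction_factors(volumes):
    """Return the shrink factor between each pair of consecutive stages."""
    sizes = [to_mb(value, unit) for value, unit in volumes]
    return [before / after for before, after in zip(sizes, sizes[1:])]


factors = reduction_factors(stage_volumes)
for i, factor in enumerate(factors, start=1):
    print(f"stage {i} -> stage {i + 1}: about {factor:.1f}x smaller")
```

Under these assumptions the last stage is roughly 92,000 times smaller than the first, which is one reason raw data and results pose very different storage and sharing problems.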
Data Lifecycle Management
 Grant-writing stage - Data sharing/management Plan
DMPOnline https://dmponline.dcc.ac.uk/
 Pre-publication - data collection, analysis, storage,
some data sharing – data type-specific repositories, project-
based web-sites, metadata annotation, SOP generation
 Publication - publications, submission of data to public
repositories, project-based sharing – journal supplementary
materials, public data-repositories, project-specific web-sites,
institutional research data catalogs
 Post-publication – updates to active data, data
cleansing, archiving of significant, non-changing data
Data Management/Sharing Plan
 Often the last thing to be written during grant application
 Funding to support plan not always remembered
 Welcome part of grant-planning as encourages focus
 But one size does not fit all
 May not be looked at again until final project phase…..
 Non-generic type-specific EXAMPLES help
 Place for non-traditional ‘data’ often not clear -
software, SOPs, models
 Researchers struggle to provide meaningful detail upfront
– data volumes, formats, which standards, which
repositories – specialist knowledge required
Public Repositories
 NAR online Molecular Biology Database Collection
http://www.oxfordjournals.org/nar/database/c/
currently 1552 databases
 Limited by data domain or origin or both
 One project may require data submission to >1
 May cross-reference data-sets across databases
 Each has its own format and metadata requirements
 Some are manually curated, many are not
 Data submission may be a requirement for journal
publication
Example -
 http://www.ebi.ac.uk/ena/home since 1980
 Genes, genomes (assembled sequences), raw DNA
sequence, annotations
 3 reporting standards of its own, 5 community-based
minimum reporting standards
 Has own XML-based submission system, regularly
extended
 Large datasets can take weeks to prepare/validate
and generate hundreds of thousands of lines of XML and TB
of data
 Stable accession numbers
(Example)
Bio-Data Standards
 30+ minimum reporting guidelines for diverse areas of
biological and biomedical data
 Few cross experimental types – confusion, fragmentation
 Differing levels of use and maturity
 ‘Minimum’ can still be huge – ‘just enough’ movement
 Multiple standard formats for reporting e.g. MAGE-ML
 Not always easy to find associated tools to help use
http://mibbi.sourceforge.net/portal.shtml
ISA-TAB
 Some do cross boundaries
 ISA-TAB framework –
investigation, study, assay
hierarchy
 Acts as a framework for
associating complex data
from a large investigation
 Uptake increasing but still not
very widely used
Xperimentr – Tomlinson et al (2013) BMC Bioinformatics 10.1186/1471-2105-14-8
http://isatab.sourceforge.net/format.html
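The investigation, study, assay hierarchy can be pictured as three nested record types. The sketch below is only an illustration of that nesting: it is not the ISA-TAB file format and does not use any ISA software, and every class name, field name and file name in it is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """One measurement type applied within a study; points at data files."""
    measurement: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """A unit of research within the investigation."""
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """Top-level project context that groups related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_data_files(self) -> List[str]:
        """Collect every data file across all studies and assays."""
        return [
            name
            for study in self.studies
            for assay in study.assays
            for name in assay.data_files
        ]


# Hypothetical content, for illustration only.
investigation = Investigation(
    title="Example investigation",
    studies=[
        Study(
            title="Example study",
            assays=[
                Assay("transcription profiling", ["rnaseq_counts.tsv"]),
                Assay("metabolite profiling", ["metabolites.csv"]),
            ],
        )
    ],
)
print(investigation.all_data_files())
```

Keeping the hierarchy explicit is what lets data files from different experimental types be traced back to the same study and investigation.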
Specialised Local Repositories
Chernobyl Tissue Bank
IC Tissue Bank
MRIdb
OMERO
Research Outcomes
 Research Councils moving away from grant final report
towards reporting research outputs – including
publications, datasets, collaborations, impact indications
 Tie to funding information – grant codes, project title
 Ideally, continually updated over project lifetime and
beyond
 Bring together stable accession numbers for data stored
in public repositories, access to ‘locally stored dark
data’, publications, summaries, SOPs, software and
tools etc.
 Do need to be findable, stable, searchable,
maintained…..
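One way to picture such a research-output record is as a single structure that ties the funding information to everything the project produced. The sketch below is a minimal illustration under that reading; the class, its fields, and the grant code, repository name and accession number used in the example are all hypothetical placeholders, not a funder's or institution's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResearchOutputRecord:
    """Hypothetical record linking a grant to its research outputs."""
    grant_code: str
    project_title: str
    # Maps repository name -> stable accession numbers held there.
    accessions: Dict[str, List[str]] = field(default_factory=dict)
    publications: List[str] = field(default_factory=list)
    local_datasets: List[str] = field(default_factory=list)
    software: List[str] = field(default_factory=list)

    def add_accession(self, repository: str, accession: str) -> None:
        """Register an accession number under the repository that issued it."""
        self.accessions.setdefault(repository, []).append(accession)

    def matches(self, term: str) -> bool:
        """Case-insensitive substring search across all text fields."""
        needle = term.lower()
        haystack = [self.grant_code, self.project_title]
        haystack += self.publications + self.local_datasets + self.software
        for numbers in self.accessions.values():
            haystack += numbers
        return any(needle in text.lower() for text in haystack)


# Placeholder values, for illustration only.
record = ResearchOutputRecord(
    grant_code="GRANT-0000",
    project_title="Example systems biology project",
)
record.add_accession("example-repository", "ACC000001")
record.local_datasets.append("interim_measurements.csv")

print(record.matches("acc000001"))
print(record.matches("imaging"))
```

Because outputs are appended as they appear, a record of this kind can be updated over the project lifetime rather than assembled once for a final report.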
What to Keep, What to Share ?
 A generic problem area and can be contentious
 Keep all data required to inform a ‘result’
 ‘Worth’ may not be immediately apparent to originator
 Incidental datasets not directly linked to a publication are
still getting lost
 ‘Orphan’ datasets with no standard repository - ditto
 Raw data with full metadata are required for re-purposing
BUT are often huge and require specialist
knowledge and tools to manipulate
 Submission to repositories still requires EFFORT
 Sometimes practical hurdles too great
Challenges
 Integrative science approaches repeatedly show that
complete metadata are vital for optimal data reuse
BUT
 Still a complex time-consuming (often resented) task
 Data fragmentation across multiple sites a major
barrier to uptake (can’t find it… can’t use it…)
 Practical aspects – research plans change, but data
management plans static, cost of storage & curation,
difficulty of funding post-project requirements, sheer
volume of datasets and complexity of data submission
 Intersection with emergent Institutional general policy
Resources
 Ten Simple Rules for the Care and Feeding of Scientific Data.
Goodman et al (2014) PLOS Computational Biology 10 (4) e1003542
 SEEK - http://www.seek4science.org/ (example open source data
repository system)
 www.blugen.org (example project-specific web-site)
 http://www3.imperial.ac.uk/bioinfsupport/projects/systemsbiology/lola
(single project research output example)
 OMERO http://www.openmicroscopy.org/site/products/omero
(example bio-image data management/sharing system)
 MRIdb: Medical Image management for Biobank Research.
Woodbridge et al (2013) J Digit. Imaging 26: 886-890.
 http://www.biomedbridges.eu/news/principles-data-management-and-sharing-european-research-infrastructures
 Infrastructure Systems Biology Europe
http://project.isbe.eu/preparatory-phase/wps/wp2/
 Elixir http://www.elixir-europe.org
 Biomedbridges http://www.biomedbridges.eu/
 Research Data Alliance https://rd-alliance.org/
 Software Sustainability Institute
http://www.software.ac.uk/
 Biosharing http://biosharing.org/

Case Study Life Sciences Data: Centre for Integrative Systems Biology and Bioinformatics

  • 1. Case Study: Life Sciences Data. Centre for Integrative Systems Biology and Bioinformatics. Sarah Butcher, s.butcher@imperial.ac.uk, www.imperial.ac.uk/bioinfsupport
  • 2. Bio-data Characteristics
      - Lack of structure, rapid growth but not huge volume, high heterogeneity
      - Multiple file formats, widely differing sizes, acquisition rates
      - Considerable manual data collection
      - Multiple format changes over data lifetime, including production of (evolving) exchange formats
      - Huge range of analysis methods, algorithms and software in use, with wide-ranging computational profiles
      - Association with multiple metadata standards and ontologies, some of which are still evolving
      - Increasing reference or link to patient data, with associated security requirements
  • 3. So What Are These Data Anyway?
      - Raw data files (sometimes)
      - Analysed data files (generally)
      - Results (multiple formats, often quantitative)
      - Mathematical models (sometimes)
      - New hypotheses (hard to encapsulate without context)
      - Standard operating procedures (occasionally)
      - Software, tools and interfaces (sometimes)
      - ‘Dark data’: miscellaneous additional or interim datasets that aren’t directly tied to a publication; might be reusable if shared and of suitable quality
  • 4. It’s Complicated ...
      - Primary database: DNA or protein sequence
      - Secondary: derived information (e.g. protein domains)
      - Protein structure or other (e.g. crystal coordinates)
  • 6. Many Data Areas
      - Genome sequencing, imaging, RNA profiles, protein profiles, metabolic profiles, protein interaction studies, large-scale field studies, GWAS
      - Improved understanding of complex biological system
      - Challenges in primary analyses (smaller) AND in meaningful integration (huge)
  • 7. Data Formats: even for one experimental type there are many file formats. They may be human-readable or require specific software, proprietary or open source ... and Excel spreadsheets
  • 8. Data Stages - RNA-Seq Experiment [figure: total volume per experiment across data stages: 24 TB, 240 GB, 32 GB, 260 MB]
  • 9. Data Lifecycle Management
      - Grant-writing stage: data sharing/management plan (DMPOnline, https://dmponline.dcc.ac.uk/)
      - Pre-publication: data collection, analysis, storage, some data sharing (data type-specific repositories, project-based web-sites, metadata annotation, SOP generation)
      - Publication: publications, submission of data to public repositories, project-based sharing (journal supplementary materials, public data repositories, project-specific web-sites, institutional research data catalogs)
      - Post-publication: updates to active data, data cleansing, archiving of significant, non-changing data
  • 10. Data Management/Sharing Plan
      - Often the last thing to be written during grant application
      - Funding to support plan not always remembered
      - Welcome part of grant-planning as encourages focus
      - But one size does not fit all
      - May not be looked at again until final project phase ...
      - Non-generic, type-specific EXAMPLES help
      - Place for non-traditional ‘data’ often not clear: software, SOPs, models
      - Researchers struggle to provide meaningful detail upfront (data volumes, formats, which standards, which repositories); specialist knowledge required
  • 11. Public Repositories
      - NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/ (currently 1552 databases)
      - Limited by data domain or origin or both
      - One project may require data submission to >1
      - May cross-reference data-sets across databases
      - Each has its own format and metadata requirements
      - Some are manually curated, many are not
      - Data submission may be a requirement for journal publication
  • 12. Example - http://www.ebi.ac.uk/ena/home
      - Since 1980
      - Genes, genomes (assembled sequences), raw DNA sequence, annotations
      - 3 reporting standards of its own, 5 community-based minimum reporting standards
      - Has its own XML-based submission system, regularly extended
      - Large datasets can take weeks to prepare/validate and generate hundreds of thousands of lines of XML and TB of data
      - Stable accession numbers (Example)
  • 13. Bio-Data Standards
      - 30+ minimum reporting guidelines for diverse areas of biological and biomedical data
      - Few cross experimental types: confusion, fragmentation
      - Differing levels of use and maturity
      - ‘Minimum’ can still be huge (‘just enough’ movement)
      - Multiple standard formats for reporting, e.g. MAGE-ML
      - Not always easy to find associated tools to help use
      - http://mibbi.sourceforge.net/portal.shtml
  • 14. ISA-TAB
      - Some do cross boundaries
      - ISA-TAB framework: investigation, study, assay hierarchy
      - Acts as a framework for associating complex data from a large investigation
      - Uptake increasing but still not very widely used
      - Xperimentr: Tomlinson et al (2013) BMC Bioinformatics 10.1186/1471-2105-14-8
      - http://isatab.sourceforge.net/format.html
  • 15. Specialised Local Repositories
      - Chernobyl Tissue Bank
      - IC Tissue Bank
      - MRIdb
      - OMERO
  • 16. Research Outcomes
      - Research Councils moving away from grant final report towards reporting research outputs, including publications, datasets, collaborations, impact indications
      - Tie to funding information: grant codes, project title
      - Ideally, continually updated over project lifetime and beyond
      - Bring together stable accession numbers for data stored in public repositories, access to ‘locally stored dark data’, publications, summaries, SOPs, software and tools etc.
      - Do need to be findable, stable, searchable, maintained ...
  • 17. What to Keep, What to Share?
      - A generic problem area, and can be contentious
      - Keep all data required to inform a ‘result’
      - ‘Worth’ may not be immediately apparent to originator
      - Incidental datasets not directly linked to a publication are still getting lost
      - ‘Orphan’ datasets with no standard repository: ditto
      - Raw data with full metadata are required for re-purposing BUT are often huge and require specialist knowledge and tools to manipulate
      - Submission to repositories still requires EFFORT
      - Sometimes practical hurdles too great
  • 18. Challenges
      - Integrative science approaches repeatedly show that complete metadata are vital for optimal data reuse BUT
      - Still a complex, time-consuming (often resented) task
      - Data fragmentation across multiple sites a major barrier to uptake (can’t find it ... can’t use it ...)
      - Practical aspects: research plans change but data management plans stay static; cost of storage and curation; difficulty of funding post-project requirements; sheer volume of datasets and complexity of data submission
      - Intersection with emergent general institutional policy
  • 19. Resources
      - Ten Simple Rules for the Care and Feeding of Scientific Data. Goodman et al (2014) PLOS Computational Biology 10 (4) e1003542
      - SEEK, http://www.seek4science.org/ (example open source data repository system)
      - www.blugen.org (example project-specific web-site)
      - http://www3.imperial.ac.uk/bioinfsupport/projects/systemsbiology/lola (single project research output example)
      - OMERO, http://www.openmicroscopy.org/site/products/omero (example bio-image data management/sharing system)
      - MRIdb: Medical Image management for Biobank Research. Woodbridge et al (2013) J Digit. Imaging 26: 886-890
      - http://www.biomedbridges.eu/news/principles-data-management-and-sharing-european-research-infrastructures
  • 20.
      - Infrastructure Systems Biology Europe, http://project.isbe.eu/preparatory-phase/wps/wp2/
      - Elixir, http://www.elixir-europe.org
      - Biomedbridges, http://www.biomedbridges.eu/
      - Research Data Alliance, https://rd-alliance.org/
      - Software Sustainability Institute, http://www.software.ac.uk/
      - Biosharing, http://biosharing.org/
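
As a rough illustration of the investigation, study, assay hierarchy that slide 14 attributes to ISA-TAB, the sketch below models the three levels as nested Python dataclasses. It is not the ISA-TAB file format or any ISA tooling API: the class names, fields, and example file names are all hypothetical, chosen only to show how data files from several assay types can be associated under one investigation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    # One measurement type applied within a study (hypothetical fields).
    measurement: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    # A study groups the assays that share subjects and design.
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    # Top level: ties several studies to one overall project.
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_data_files(self) -> List[str]:
        """Walk investigation -> study -> assay and collect every data file."""
        return [f for s in self.studies for a in s.assays for f in a.data_files]


# Hypothetical example content, not taken from the slides.
inv = Investigation(
    title="Example investigation",
    studies=[
        Study(
            title="Example study",
            assays=[
                Assay("transcription profiling", ["reads_1.fastq", "reads_2.fastq"]),
                Assay("metabolite profiling", ["spectra_1.mzML"]),
            ],
        )
    ],
)
print(inv.all_data_files())
```

Keeping every file reachable from a single top-level object is the point of the hierarchy: heterogeneous datasets stay associated with the investigation that produced them rather than being scattered across sites.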