The ISA Commons: experiences from the field

                    Susanna-Assunta Sansone, PhD

                      Principal Investigator, Team Leader,
                   University of Oxford e-Research Centre,
                                     Oxford, UK

                      http://uk.linkedin.com/in/sasansone
                                    #biosharing

                             DataCite Summer Meeting
  DIGITAL RESEARCH DATA IN PRACTICE: solutions for improving discovery, access and use
                            June 14, 2012 Copenhagen
•  Reproducible research
    •  annotated research data and methods offer new
       discovery opportunities and prevent unnecessary
       repetition of work;
    •  improved data sharing underpins science of the future;
    •  but ... shared data have little or no value if they are
       not interpretable and, consequently, reusable



                                                Image from datacite.org
Reproducibility



                  Ioannidis et al., Repeatability of published microarray
                  gene expression analyses. Nature Genetics 41(2),
                  149-55 (2009) doi:10.1038/ng.295
Across studies and groups

NO to ‘data blobs’

YES to verifiable, complete
and structured information



                             Image from datacite.org
Structured description of datasets




•  Capture all salient features of the experimental workflow
•  Make annotation explicit and discoverable
•  Structure the descriptions for consistency, tracking
    •  independent variables
    •  dependent variables
   using
    •  cross-references and resolvable identifiers
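
The points above can be illustrated with a minimal sketch. It is not the ISA format itself: the class names, field names and the example.org identifier are hypothetical and chosen only for illustration. Each sample records what was varied (independent variables), what was measured (dependent variables), and a resolvable identifier for every annotation, so that missing identifiers can be found automatically.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    """A human-readable label paired with a resolvable identifier."""
    label: str
    identifier: str  # hypothetical: a URI that resolves to a term definition


@dataclass
class SampleDescription:
    """Structured description of one sample (illustrative, not ISA-Tab)."""
    name: str
    # Independent variables: what the experimenter deliberately varied.
    independent: Dict[str, Annotation] = field(default_factory=dict)
    # Dependent variables: what was measured, mapped to the data file holding it.
    dependent: Dict[str, str] = field(default_factory=dict)


def missing_identifiers(sample: SampleDescription) -> List[str]:
    """Return the independent variables whose annotation has no resolvable URI."""
    return sorted(
        name
        for name, annotation in sample.independent.items()
        if not annotation.identifier.startswith(("http://", "https://"))
    )


# Hypothetical sample: one annotation is explicit, the other is only free text.
sample = SampleDescription(
    name="sample-1",
    independent={
        "dose": Annotation("high", "http://example.org/terms/high-dose"),
        "compound": Annotation("compound A", ""),
    },
    dependent={"gene expression": "sample-1.expression.txt"},
)

unresolved = missing_identifiers(sample)
print(unresolved)
```

A check of this kind is one way to tell a verifiable, structured description from a 'data blob': the free-text annotation is reported, the one with a resolvable identifier is not.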
Not too much, not too little, just ‘right’




•  We must strike a balance between
    •  depth and breadth of information; and
    •  sufficient information required to reuse the data
Example of experiments by InnoMed PredTox, an FP6 public-private consortium

The International Conference on Systems Biology (ICSB), 22-28 August, 2008 | Susanna-Assunta Sansone | www.ebi.ac.uk/net-project
Different communities, different norms and standards, e.g. to:
•  allow data to flow from one system to another
•  use the same word and refer to the same ‘thing’
•  report the same core, essential information

Challenges: lack of coordination, fragmentation and uneven coverage
Growing number of reporting standards
•  + 130 (estimated), e.g. MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, CML, DICOM, GELML, SBRML, MITAB, MzML, ISA-Tab, SEDML…
•  + 303 (source: BioPortal), e.g. AAO, CHEBI, OBI, VO, PATO, ENVO, MOD, TEDDY, XAO, BTO, DO, PRO, IDO…
•  + 150 (source: MIBBI, EQUATOR), e.g. MIAME, MIAPA, MIRIAM, MIQAS, MIX, REMARK, MIGEN, MIAPE, MIQE, CIMR, CONSORT, MIASE, MISFISHIE…
A catalogue to map the landscape of standards and the systems implementing them:
Over 400 bio-standards (public and in curation)

Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
Bioscience is not one domain

•  Bioscience is interdisciplinary and integrative in character
    •  need to deal with new and existing datasets
    •  deal with a variety of data types

Source of the figure: EBI website
Is it possible to achieve a common, structured
representation of diverse bioscience experiments that:
•  transcends individual bioscience domains, but also
•  follows the appropriate community norms and
   standards?
A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
•  environmental health
•  environmental genomics
•  metabolomics
•  metagenomics
•  nanotechnology
•  proteomics
•  stem cell discovery
•  systems biology
•  transcriptomics
•  toxicogenomics
It is also used by communities working to build a library of cellular signatures.

      We aim to achieve a common
representation of experimental content that
transcends individual bioscience domains

                                                 Sansone et al., Towards interoperable
                                                 bioscience data. Nature Genetics 44,
                                                 121-126 (2012) doi:10.1038/ng.1054
Among the public groups/resources and internal projects:
•  Stem Cell Commons
•  Nanotechnology Informatics Working Group
Metadata tracking framework, designed to support the use of several standards (checklists, terminologies) and conversions to a growing number of other metadata formats used by public repositories, e.g.
•  MAGE-Tab
•  Pride-xml
•  SRA-xml
•  SOFT

Currently finalizing conversion to RDF, in collaboration with the W3C HCLSIG, to explore the growing Linked Data universe.
Empowering researchers to use standards

To mint DOIs

TOWARDS INTEROPERABLE BIOSCIENCE DATA (Feb 2012) doi:10.1038/ng.1054

Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W.

www.biosharing.org
www.isacommons.org

Development timeline, 2007 to 2012 (items in each track listed in chronological order)

Community involvement and uptake
•  1st, 2nd and 3rd ISA-Tab workshops
•  User workshops/visits start
•  Other tools implement ISA-Tab
•  1st public instance: Harvard Stem Cell Discovery Engine
•  Growing number of systems start to adopt ISA-Tab

Core developments
•  Strawman ISA-Tab spec
•  Final ISA-Tab spec
•  ISA software v1
•  Database instance at EBI
•  Conversions to Pride-XML/SRA-XML/MAGE-Tab and more
•  RDF format starts
•  Links to analysis tools start

Publications
•  Workshop reports
•  Omics data sharing (Science)
•  ISA-Tab and ISA software suite (Bioinformatics)
•  Stem Cell Discovery Engine (NAR)
•  ISA Commons (Nature Genetics)

Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

  • 1. The ISA Commons: experiences from the bioscience field. Susanna-Assunta Sansone, PhD, Principal Investigator, Team Leader, University of Oxford e-Research Centre, Oxford, UK. http://uk.linkedin.com/in/sasansone #biosharing. DataCite Summer Meeting, DIGITAL RESEARCH DATA IN PRACTICE: solutions for improving discovery, access and use, June 14, 2012, Copenhagen
  • 2. Reproducible research: • annotated research data and methods offer new discovery opportunities and prevent unnecessary repetition of work; • improved data sharing underpins the science of the future; • but shared data have little or no value if they are not interpretable and, consequently, not reusable. (Image from datacite.org)
  • 3. Reproducibility. Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
  • 4. Reproducibility. Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
  • 5. Reproducibility. Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
  • 7. Across studies and groups
  • 9. NO to ‘data blobs’; YES to verifiable, complete and structured information. (Image from datacite.org)
  • 10. Structured description of datasets: • capture all salient features of the experimental workflow; • make annotation explicit and discoverable; • structure the descriptions for consistency, tracking independent variables and dependent variables, using cross references and resolvable identifiers
  • 11. Not too much, not too little, just ‘right’. We must strike a balance between: • depth and breadth of information; and • sufficient information required to reuse the data
  • 12. Example of experiments by InnoMed PredTox, an FP6 public-private consortium. (Slide footer: The International Conference on Systems Biology (ICSB), 22-28 August 2008; Susanna-Assunta Sansone; www.ebi.ac.uk/net-project)
  • 13. Different communities, different norms and standards, e.g.: • report the same core, essential information; • use the same word and refer to the same ‘thing’; • allow data to flow from one system to another. Challenges: lack of coordination, fragmentation and uneven coverage
  • 14. Growing number of reporting standards. Counts shown on the slide: +303, +150, +130 (sources cited: MIBBI, EQUATOR; BioPortal; one figure is an estimate). Examples: MAGE-Tab, AAO, MIAME, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • 15. A catalogue to map the landscape of standards and the systems implementing them: over 400 bio-standards (public and in curation). Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
  • 16. A catalogue to map the landscape of standards and the systems implementing them: over 400 bio-standards (public and in curation). Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
  • 17. Bioscience is not one domain. Bioscience is interdisciplinary and integrative in character: • need to deal with new and existing datasets; • need to deal with a variety of data types. (Source of the figure: EBI website)
  • 18. Is it possible to achieve a common, structured representation of diverse bioscience experiments that: •  transcends individual bioscience domains, but also •  follows the appropriate community norms and standards?
  • 19. A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including: • environmental health • environmental genomics • metabolomics • metagenomics • nanotechnology • proteomics • stem cell discovery • systems biology • transcriptomics • toxicogenomics; also used by communities working to build a library of cellular signatures. We aim to achieve a common representation of experimental content that transcends individual bioscience domains. Sansone et al., Towards interoperable bioscience data. Nature Genetics 44, 121-126 (2012) doi:10.1038/ng.1054
  • 20. A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including: • environmental health • environmental genomics • metabolomics • metagenomics • nanotechnology • proteomics • stem cell discovery • systems biology • transcriptomics • toxicogenomics; also used by communities working to build a library of cellular signatures. Some of the public groups/resources and some of the internal projects: Stem Cell Commons, Nanotechnology Informatics Working Group
  • 21. A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including: • environmental health • environmental genomics • metabolomics • metagenomics • nanotechnology • proteomics • stem cell discovery • systems biology • transcriptomics • toxicogenomics; also used by communities working to build a library of cellular signatures. Some of the public groups/resources and some of the internal projects: Stem Cell Commons, Nanotechnology Informatics Working Group
  • 22. Metadata tracking framework, designed to support: • the use of several standards (checklists, terminologies); • conversions to (a growing number of) other metadata formats used by public repositories, e.g. MAGE-Tab, Pride-xml, SRA-xml, SOFT. Currently finalizing conversion to RDF to explore the growing Linked Data universe (in collaboration with the W3C HCLSIG)
  • 23. Empowering researchers to use standards; to mint DOIs
  • 24. TOWARDS INTEROPERABLE BIOSCIENCE DATA, doi:10.1038/ng.1054 (Feb 2012). Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W. www.biosharing.org, www.isacommons.org. Development timeline, 2007 to 2012. Community involvement and uptake: 1st, 2nd and 3rd ISA-Tab workshops; user workshops/visits start; 1st public instance (Harvard Stem Cell Discovery Engine); other tools implement ISA-Tab; a growing number of systems start to adopt ISA-Tab. Core developments: strawman ISA-Tab spec; final ISA-Tab spec; ISA software v1; database instance at EBI; conversions to Pride-XML/SRA-XML/MAGE-Tab and more; links to analysis tools start; RDF format starts. Publications: Omics data sharing (Science); workshop reports; ISA software suite (Bioinformatics); Stem Cell Discovery Engine (NAR); ISA-Tab and ISA Commons (Nature Genetics)