Data management and curation:
         the other side of bioinformatics

          Susanna-Assunta Sansone, PhD
          Principal Investigator and Team Leader,
     University of Oxford e-Research Centre, Oxford, UK

             http://uk.linkedin.com/in/sasansone


http://www.slideshare.net/SusannaSansone/B4OS-2012

              Bioinformatics for Omics Sciences (B4OS),
                     CNR Naples, 25-27 Sep 2012
Oxford e-Research Centre



             Providing research
             computing, high-
             performance
             computing
                      Integrating with
                      national and
                      international
                      infrastructure

             Supporting leading
             edge facilities through
             education and training
Oxford e-Research Centre


          Collaborating with European and wider
          international groups in, e.g.:
               •  energy,
               •  radio astronomy,
               •  biological data federation,
               •  life sciences simulation,
               •  biodiversity,
               •  computational chemistry,
               •  neuroscience,
               •  digital humanities tools,
               •  digital music analysis

          Research in
             •  computation,
             •  data infrastructure and analysis,
             •  visualisation
My team’s activities and groups we work with
data management, biocuration, development of software,
 databases and community-driven standards and ontologies




  env, agro, tox/pharma, health
http://www.flickr.com/photos/12308429@N03/4957994485/   CC BY
Today:
“The buzz around reproducible bioscience data -
the policies, the communities and the standards”


                   Thursday:
   “The reality from the buzz: how to deliver
         reproducible bioscience data”
Preserve
    institutional /
      corporate
       memory
Harmonize collection across sites
    Find matching studies
     Data dissemination
  Long-term data stewardship
Utilize
public data

Identify suitable data
       Retrieve
Curate and harmonize
     Re-analyze


Address
reproducibility /
     reuse
 of public data



                    Ioannidis et al., Repeatability of published microarray
                    gene expression analyses. Nature Genetics 41(2),
                    149-55 (2009) doi:10.1038/ng.295
Growing, worldwide movement for reproducible research




     Shared, annotated research data and methods offer new discovery
         opportunities and prevent unnecessary repetition of work.
             Improved data sharing underpins science of the future
                                 “Publicly-funded research data are a public good,
                                    produced in the public interest”
                                 “Publicly-funded research data should be openly available
                                    to the maximum extent possible”
      The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone
                                                          www.ebi.ac.uk/net-project
http://www.flickr.com/photos/notbrucelee/8016189356/   CC BY
Reproducible & Reusable Bioscience Research
    reasoning, visualization, analysis, browsing, integration, exchange, retrieval
    Community Standards | Well-annotated & Structured Data | Software Tools
Today’s bioscience research
  Figure labels: experimental and computational data; publications




§  Is interdisciplinary and integrative in character
    •  need to deal with new and existing datasets
    •  deal with a variety of data types
§  ‘How the organism works’ is the focus
    •    Twenty years ago data was the center
                                               Source of the figure: EBI website
Example from the toxicogenomics domain

    Study looking at the effect of a compound inducing liver damage
    by characterizing/measuring:
    -  the metabolic profile by MS and NMR
    -  protein expression in liver by MS
    -  gene expression by DNA microarray
    -  conducting genetic and phenotypic analysis
    Information contributing to the construction and validation of
    systems biology models

    Example of experiments by InnoMed PredTox, an FP6 public-private consortium
Structured description of datasets




                       §  Capture all salient features
                           of the experimental workflow
                       §  Make annotation explicit and
                           discoverable
                       §  Structure the descriptions
                           for consistency, tracking
                            §  independent variables
                            §  dependent variables
                            using
                            §  cross reference and
                                resolvable identifiers
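
One way to picture such a structured description is as a small typed record. The sketch below is only an illustration, not a format used in the talk: the class names, the workflow steps and the example.org identifiers are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variable:
    """One annotated variable of a study."""
    name: str
    role: str      # "independent" or "dependent"
    term_uri: str  # resolvable identifier; the example.org URIs are placeholders


@dataclass
class DatasetDescription:
    """Hypothetical structured description of one dataset."""
    title: str
    workflow_steps: List[str] = field(default_factory=list)
    variables: List[Variable] = field(default_factory=list)

    def names_by_role(self, role: str) -> List[str]:
        # Track independent and dependent variables separately.
        return [v.name for v in self.variables if v.role == role]

    def unresolvable(self) -> List[str]:
        # Names of variables whose identifier is not an http(s) URI.
        return [
            v.name
            for v in self.variables
            if not v.term_uri.startswith(("http://", "https://"))
        ]


# Illustrative values only; steps and URIs are invented for the sketch.
study = DatasetDescription(
    title="Effect of a compound inducing liver damage",
    workflow_steps=["dosing", "sample collection", "profiling by MS"],
    variables=[
        Variable("dose", "independent", "http://example.org/terms/dose"),
        Variable("metabolic profile", "dependent", "metabolic-profile"),
    ],
)

print(study.names_by_role("independent"))
print(study.unresolvable())
```

Keeping the role and the identifier as explicit fields is what makes the annotation discoverable: a simple query can list variables by role, or flag those that still lack a resolvable identifier.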
Not too much, not too little, just ‘right’




                          §  We must strike a balance
                              between
                               •  depth and breadth of
                                  information; and
                               •  sufficient information
                                  required to reuse the data
Information intensive experiments


                     To make the experiments
                     comprehensible and reusable,
                     underpinning future
                     investigations, we need
                     common ways to report and
                     share the experimental details
                     and the associated data.

                     Consistent reporting will have a
                     positive and long-lasting impact
                     on the value of collective
                     scientific outputs.
Common ways to report and share

§ The challenges we face
  •    Large in volume: lots of data types and metadata!
  •    Lots of free text descriptions: hard to mine, subject to mistakes!
  •    Babel of terminologies: lack of definitions, hard to map!
  •    Heterogeneous file formats: software lock-in!
§ Need for reporting standards
  •  Minimal reporting descriptors
       - Report the same ‘core essentials’
  •  Controlled vocabularies or ontologies
       - Use the same word and mean the same thing
  •  Common exchange formats
       - Make tools interoperable, allow data exchange and integration
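
The three kinds of reporting standard listed above can be sketched together in a few lines. This is a minimal illustration under stated assumptions: the required fields, the vocabulary and the tab-delimited layout are hypothetical and do not reproduce any real checklist, ontology or exchange format.

```python
import csv
import io

# Hypothetical 'core essentials' every record must report.
REQUIRED_FIELDS = ["sample_id", "organism", "assay_type"]

# Hypothetical controlled vocabulary: free-text variants -> one preferred term.
ORGANISM_VOCAB = {
    "rat": "Rattus norvegicus",
    "rattus norvegicus": "Rattus norvegicus",
    "mouse": "Mus musculus",
    "mus musculus": "Mus musculus",
}


def missing_fields(record):
    """Minimal reporting check: which core fields are absent or empty?"""
    return [f for f in REQUIRED_FIELDS if not record.get(f, "").strip()]


def normalize_organism(record):
    """Controlled vocabulary: use the same word to mean the same thing."""
    normalized = dict(record)
    key = record.get("organism", "").strip().lower()
    if key in ORGANISM_VOCAB:
        normalized["organism"] = ORGANISM_VOCAB[key]
    return normalized


def to_tsv(records):
    """Common exchange format: one tab-delimited table with a fixed header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=REQUIRED_FIELDS,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = [
    {"sample_id": "S1", "organism": "Rat", "assay_type": "DNA microarray"},
    {"sample_id": "S2", "organism": "mouse", "assay_type": ""},
]

complete = [normalize_organism(r) for r in records if not missing_fields(r)]
print(to_tsv(complete))
```

Record S2 is held back because its assay type is empty, while "Rat" in S1 is mapped to the preferred term before export.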
Reporting standards – the benefits

§  Describe and communicate the information to others, in an
    unambiguous manner
§  To unlock the value in the data
   •  Compare, query and evaluate data
       - Facilitate scientific validation of the findings
   •  Understand variability within/between different technologies and
      protocols
       -  Facilitate technical validation
       -  Enable optimization of the experimental designs
       -  Identify critical checkpoints and develop quality metrics
§  To define submission and/or publication requirements
   •  Journals
   •  Databases
§  To ensure data integrity, reproducibility and (re)use
Escalating number of standardization efforts in bioscience, e.g.:
  •  Genome annotation: www.geneontology.org
  •  Genomics Standards Consortium (GSC): gensc.org
  •  Functional Genomics Data Society (FGED): www.fged.org
  •  Enzymology data standards: www.strenda.org
  •  HUPO Proteomics Standards Initiative (PSI): http://www.psidev.info
  •  Systems modelling standards: www.sbml.org
  •  Cheminformatics: www.ebi.ac.uk/chebi
  •  Pathways: www.biopax.org
  •  Metabolomics Standards Initiative (MSI): http://www.metabolomicssociety.org
Different communities, different norms and standards, e.g.:

    •  allow data to flow from one system to another
    •  use the same word and refer to the same ‘thing’
    •  report the same core, essential information

Challenges: lack of coordination, fragmentation and uneven coverage
Is this ‘general mobilization’ good or bad?






§  Differences in structures and processes:
     •  organization types (open, closed to members, society, WG…)
    •  standards development (how to design, develop, evaluate, maintain…)
    •  adoption, uptake, outreach (link to journals, funders, commercial sector…)
    •  funds (sponsors, memberships, grants, volunteering…)


§  Fragmentation of the standards is a major issue
     •  Being focused on particular communities’ interests, be their individual
        technologies or biological/biomedical disciplines, leads to duplication of effort,
        and more seriously, the development of (largely arbitrarily) different standards
     •  This severely hinders the interoperability of databases and tools and ultimately
        the integration of datasets
Growing number of reporting standards




    MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, CML, DICOM, GELML, SBRML, MITAB, MzML, ISA-Tab, SEDML…
    AAO, CHEBI, OBI, VO, PATO, ENVO, MOD, TEDDY, XAO, BTO, DO, PRO, IDO…
    MIAME, MIAPA, MIRIAM, MIQAS, MIX, REMARK, MIGEN, MIAPE, MIQE, CIMR, CONSORT, MIASE, MISFISHIE…
    + 130 (estimated); + 303 (source: BioPortal); + 150 (source: MIBBI, EQUATOR)
    Databases, annotation, curation tools
But how much do we know about these standards?




    •  Which tools and databases implement which standards?
    •  I use high-throughput sequencing technologies; which ones are applicable to me?
    •  What are the criteria to evaluate their status and value?
    •  How can I get involved to propose extensions or modifications?
    •  Which ones are mature enough for me to use or recommend?
    •  I work on plants; are these just for biomedical applications?

§  A bewildering array of standards is available, but
   •  these are hard to find, at different levels of maturity; in
      some areas duplications or gaps in coverage also exist

§  Standards are just a ‘means to an end’, therefore
   •  we want to make them discoverable and accessible,
      maximizing their use to assist the virtuous data cycle,
      from generation to standardization through publication to
      subsequent sharing and reuse
A catalogue to map the landscape of standards and the systems implementing them:
Over 400 bio-standards (public and in curation)
Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
•    A coherent, curated and searchable catalogue of data sharing resources
•    Bioscience standards and associated data-sharing policies, publications, tools and databases
•    Assessment criteria for usability and popularity of standards
•    Relationships among standards
•    Encouragement for communication & interaction among groups
•    Promoting interoperability & informed decisions about standards
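
As a rough sketch of what a searchable catalogue entry might hold, the following models standards together with the systems implementing them and their relationships. It is not the catalogue's actual schema: the field names, the `search` helper and the database names (ExampleArrayDB, ExampleProteomeDB) are hypothetical, and the entries are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StandardEntry:
    """Hypothetical catalogue record for one standard."""
    name: str
    kind: str  # "reporting guideline", "terminology" or "format"
    domains: List[str]
    implemented_by: List[str] = field(default_factory=list)  # databases, tools
    related_to: List[str] = field(default_factory=list)      # other standards


# Illustrative entries; the database names are made up for the sketch.
CATALOGUE = [
    StandardEntry("MIAME", "reporting guideline", ["transcriptomics"],
                  implemented_by=["ExampleArrayDB"], related_to=["MAGE-Tab"]),
    StandardEntry("MAGE-Tab", "format", ["transcriptomics"],
                  implemented_by=["ExampleArrayDB"], related_to=["MIAME"]),
    StandardEntry("MIAPE", "reporting guideline", ["proteomics"],
                  implemented_by=["ExampleProteomeDB"]),
]


def search(catalogue, kind: Optional[str] = None, domain: Optional[str] = None):
    """Return the names of entries matching every filter that is given."""
    hits = []
    for entry in catalogue:
        if kind is not None and entry.kind != kind:
            continue
        if domain is not None and domain not in entry.domains:
            continue
        hits.append(entry.name)
    return hits


def implemented_standards(catalogue, system: str):
    """Which standards does a given database or tool implement?"""
    return [e.name for e in catalogue if system in e.implemented_by]


print(search(CATALOGUE, kind="reporting guideline"))
print(implemented_standards(CATALOGUE, "ExampleArrayDB"))
```

Storing the links in both directions (standard to system, standard to standard) is what lets one record answer questions such as "which databases implement which standards?".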
Example of multi-assays study – how many ‘standards’
                are applicable to this?
Smith et al, 2007








Taylor, Field, Sansone et al, 2008

List of databases, linked to standards a collaboration with                                                 Database Issue




Major challenge: define ‘relations’ among standards




 The relationship among popular standard formats for pathway information:
 BioPAX and PSI-MI are designed for data exchange to and from databases and
 pathway and network data integration. SBML and CellML are designed to
 support mathematical simulations of biological systems and SBGN represents
 pathway diagrams.

 CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.
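
The relations described in the caption can be written down as a small lookup, which hints at how 'relations' among standards might be made machine-readable. This is a sketch only: the purpose labels paraphrase the caption, and the function names are hypothetical.

```python
# Purpose -> pathway formats, as stated in the caption above.
PURPOSE_TO_FORMATS = {
    "data exchange and integration": ["BioPAX", "PSI-MI"],
    "mathematical simulation": ["SBML", "CellML"],
    "pathway diagrams": ["SBGN"],
}


def purposes_of(fmt):
    """List the purposes a given format is designed for."""
    return [p for p, formats in PURPOSE_TO_FORMATS.items() if fmt in formats]


def share_purpose(a, b):
    """True when two formats address at least one common purpose."""
    return bool(set(purposes_of(a)) & set(purposes_of(b)))


print(purposes_of("SBML"))
print(share_purpose("SBML", "CellML"))
print(share_purpose("BioPAX", "SBGN"))
```

Formats that share a purpose are candidates for mapping or converging; formats with different purposes are more likely to be complementary.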
This is not just a technical but also a social engineering challenge!
Ownership of open standards can be problematic in broad, grass-roots
collaborations; it requires improved models, to encourage maintenance of and
contributions to these efforts, supporting their evolution
The extensive ‘social engineering’ and community liaison needs to be managed
and funded; rewards and incentives need to be identified for all contributors
http://www.flickr.com/photos/idiolector/289490834/   CC BY
The cost of implementing a standards-supported data sharing vision is as large
as the number of stakeholders that must operate synchronously
1. Funders actively developing data policies




§  Several data preservation, management and sharing policies have
    emerged in response to increased funding for omics domains
§  Even if in general terms, standards are recognized as necessary ‘tools’ to
    unambiguously represent, describe and communicate research data
2. Similar trend in the regulatory arena




§  “… lack of standardized data affects CDER’s review processes by curtailing a
    reviewer’s ability to perform integral tasks such as rapid acquisition, storage,
    analysis......efficient management of a portfolio of standards projects will
    require coordinated efforts and clear roles for multiple participants within/outside
    FDA”
3. Publishers have become strong advocates




§  Continue to support the development of open standards and tools
     •  to support sharing of sufficiently well annotated datasets
     •  to enable comprehensible, reusable, reproducible research
….the rise of data-driven journals, e.g.:




                                        partnering with:
4. Similar trend in the commercial sector




§  R&D has invested heavily in procedures and tools that integrate external
    information with their own data to enhance the decision-making process
•  Now joining forces to streamline non-competitive elements of the life
   science workflow by the specification of common standards, business
   terms, relationships and processes
....their information landscape is evolving


     [Figure: Yesterday, Today, Tomorrow. The Big Life Science Company is
     increasingly surrounded by public content providers, proprietary content
     providers, CROs, academic groups, regulatory authorities, service
     providers and software vendors]

               Yesterday                     Today                         Tomorrow
Innovation     Innovation inside             Searching for Innovation      Heterogeneity of collaborations;
Model                                                                      part of the wider ecosystem
IT             Internal apps & data          Struggling with change,       Cloud, services
                                             security and trust
Data           Mostly inside                 In and out                    Distributed
Portfolio      Internally driven and owned   Partially shared              Shared portfolio

                                                                                      Credit to: Pistoia Alliance
Take home messages


u  Contribute to the reproducible research movement


u  Think about data management as a career path


u  Learn more about open community-standards


u  Get involved, e.g.:
                                          Open
                                          Bioinformatics
                                          Foundation
Data is not like a $ bill….




http://www.flickr.com/photos/jackofspades/4500411648/ CC BY
Your research and all (publicly funded) research should
make an … impact




      http://www.flickr.com/photos/equinoxefr/2620239993/                                                       CC BY
…..the biggest possible impact!




     http://www.flickr.com/photos/webhamster/2582189977/                                                        CC BY

Today:
“The buzz around reproducible bioscience data -
the policies, the communities and the standards”


                   Thursday:
   “The reality from the buzz: how to deliver
         reproducible bioscience data”
Is it possible to achieve a common, structured
representation of diverse bioscience experiments that:
•  follows the appropriate community standards and
•  delivers richly-annotated datasets?
Tim Berners-Lee’s 5-star deployment scheme for Linked Open Data
Increasing level of structure




Notes in Lab Books (information for humans)
Spreadsheets and Tables (the compromise)
Facts as RDF statements (information for machines)
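
To make the step from tables to statements concrete, the sketch below turns one tab-delimited row into N-Triples-style lines using only the standard library. It is an illustration under assumptions: the example.org namespace, the column names and the `rows_to_triples` helper are hypothetical, and a real conversion would reuse terms from community vocabularies rather than minting properties from column headers.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, for illustration only

TABLE = (
    "sample_id\torganism\tassay_type\n"
    "S1\tRattus norvegicus\tDNA microarray\n"
)


def escape_literal(value):
    """Escape backslashes and double quotes for a quoted literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_triples(tsv_text, id_column="sample_id"):
    """Emit one subject-predicate-object line per non-identifier cell."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    triples = []
    for row in reader:
        subject = f"<{BASE}sample/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column:
                continue
            predicate = f"<{BASE}property/{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


for line in rows_to_triples(TABLE):
    print(line)
```

Each cell becomes a standalone statement about an identified sample, which is what lets a machine combine facts from different tables without knowing their column layout.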

               TOWARDS INTEROPERABLE BIOSCIENCE DATA (Feb 2012)        doi:10.1038/ng.1054

               Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann
               S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B,
               Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S,
               Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland
               L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A,
               Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B,
               Wolstencroft K, Xenarios J, Hide W.

               www.biosharing.org                            www.isacommons.org
References
1. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K,
Ireland A, Mungall CJ; OBI Consortium, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA,
Scheuermann RH, Shah N, Whetzel PL, Lewis S: The OBO Foundry: coordinated evolution of
ontologies to support biomedical data integration. Nat Biotechnol 25(11):1251-1255 (2007)
2. Taylor CF*, Field D*, Sansone SA*, Aerts J, Apweiler R, Ashburner M, Ball CA, Binz PA,
Bogue M, Booth T, Brazma A, Brinkman RR, Michael Clark A, Deutsch EW, Fiehn O, Fostel J,
Ghazal P, Gibson F, Gray T, Grimes G, Hancock JM, Hardy NW, Hermjakob H, Julian RK Jr,
Kane M, Kettner C, Kinsinger C, Kolker E, Kuiper M, Le Novère N, et al.: Promoting coherent
minimum reporting guidelines for biological and biomedical investigations: the MIBBI project.
Nat Biotechnol 26(8):889-896 (2008)
3. Field D*, Sansone SA*, Collis A, Booth T, Dukes P, Gregurick SK, Kennedy K, Kolar P,
Kolker E, Maxon M, Millard S, Mugabushaka AM, Perrin N, Remacle JE, Remington K,
Rocca-Serra P, Taylor CF, Thorley M, Tiwari B, Wilbanks J: Megascience. 'Omics data sharing.
Science 326(5950):234-236 (2009)
4. Harland L, Larminie C, Sansone SA, Popa S, Marshall MS, Braxenthaler M, Cantor M,
Filsell W, Forster MJ, Huang E, Matern A, Musen M, Saric J, Slater T, Wilson J, Lynch N, Wise
J, Dix I: Empowering industrial research with shared biomedical vocabularies. Drug Discov
Today 16(21-22):940-947 (2011)
5. Sansone SA and Rocca-Serra P: On the evolving portfolio of community-standards and data
sharing policies: turning challenges into new opportunities. GigaScience 1:10 (2012)


B4OS-2012

  • 1. Data management and curation: the other side of bioinformatics Susanna-Assunta Sansone, PhD Principal Investigator and Team Leader, University of Oxford e-Research Centre, Oxford, UK http://uk.linkedin.com/in/sasansone http://www.slideshare.net/SusannaSansone/B4OS-2012 Bioinformatics for Omics Sciences (B4OS), CNR Naples, 25-17 Sep 2012
  • 4. Oxford e-Research Centre Providing research computing, high- performance computing Integrating with national and international infrastructure Supporting leading edge facilities through education and training
  • 5. Oxford e-Research Centre Collaborating with European and wider international groups in, e.g.: •  energy, •  radio astronomy, •  biological data federation, •  life sciences simulation, •  biodiversity, •  computational chemistry, •  neuroscience, •  digital humanities tools, •  digital music analysis Research in •  computation, •  data infrastructure and analysis, •  visualisation
  • 6. My team’s activities and groups we work with data management, biocuration, development of software, databases and community-driven standards and ontology env   agro   tox/pharma   health  
  • 8. Today: “The buzz around reproducible bioscience data - the policies, the communities and the standards” Thursday: “The reality from the buzz: how to deliver reproducible bioscience data”
  • 9. Preserve institutional / corporate memory Harmonize collection across sites Find matching studies Data dissemination Long-term data stewardship
  • 10. Utilize public data Identify suitable data Retrieve Curate and harmonize Re-analyze
  • 11. Address reproducibility / reuse of public data
  • 12. Address reproducibility / reuse of public data
  • 13. Address reproducibility / reuse of public data Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
  • 14. Address reproducibility / reuse of public data
  • 15. Address reproducibility / reuse of public data
  • 16. Address reproducibility / reuse of public data
  • 17. Growing, worldwide movement for reproducible research Shared, annotated research data and methods offer new discovery opportunities and prevent unnecessary repetition of work. Improved data sharing underpins science of the future “Publicly-funded research data are a public good, produced in the public interest” “Publicly-funded research data should be openly available to the maximum extent possible” The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
  • 19. Reproducible & Reusable Bioscience Research
  • 20. reasoning visualization analysis browsing integration exchange retrieval Well-annotated & Structured Data Reproducible & Reusable Bioscience Research
  • 21. reasoning visualization analysis browsing integration exchange retrieval Community Software Standards Tools Well-annotated & Structured Data Reproducible & Reusable Bioscience Research
  • 22. Today’s bioscience research Publications Experimental and computational data §  Is interdisciplinary and integrative in character •  need to deal with new and existing datasets •  deal with a variety of data types §  ‘How the organism works’ is the focus •  Twenty years ago data was the center Source of the figure: EBI website
  • 23. Example from the toxicogenomics domain Study looking at the effect of a compound inducing liver damage by characterizing/measuring - the metabolic profile by MS and NMR - protein expression in liver by MS - gene expression by DNA microarray - conducting genetic and phenotypic analysis Information contributing to the construction and validation of systems biology models
  • 24. Example of experiments by InnoMed PredTox, an FP6 public-private consortium
  • 25. Structured description of datasets §  Capture all salient features of the experimental workflow §  Make annotation explicit and discoverable §  Structure the descriptions for consistency, tracking §  independent variables §  dependent variables using §  cross reference and resolvable identifiers
  • 26. Not too much, not too little, just ‘right’ §  We must strike a balance between •  depth and breadth of information; and •  sufficient information required to reuse the data
  • 28. Information intensive experiments To make the experiments comprehensible and reusable, underpinning future investigations, we need common ways to report and share the experimental details and the associated data. Consistent reporting will have a positive and long-lasting impact on the value of collective scientific outputs.
  • 29. Common ways to report and share § The challenges we face •  Large in volume: lots of data types and metadata! •  Lots of free text descriptions: hard to mine, subject to mistakes! •  Babel of terminologies: lack of definitions, hard to map! •  Heterogeneous file formats: software lock-in! § Need for reporting standards •  Minimal reporting descriptors - Report the same ‘core essentials’ •  Controlled vocabularies or ontology - Use the same word and mean the same thing •  Common exchange formats - Make tools interoperable, allow data exchange and integration
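Two of the reporting-standard ingredients named above, minimal reporting descriptors and controlled vocabularies, can be sketched as a simple check in Python. The required fields and the allowed terms below are hypothetical and chosen only for illustration; they are not taken from any actual checklist or ontology.

```python
# Hypothetical 'core essentials' checklist and controlled vocabulary.
REQUIRED_FIELDS = {"organism", "organ", "assay_type"}
ALLOWED_ASSAY_TYPES = {"DNA microarray", "mass spectrometry", "NMR spectroscopy"}


def check_report(report):
    """Return a list of problems found in one experiment description."""
    problems = []
    # Minimal reporting descriptors: every core field must be present.
    for field in sorted(REQUIRED_FIELDS - set(report)):
        problems.append(f"missing required field: {field}")
    # Controlled vocabulary: free-text variants are rejected.
    assay = report.get("assay_type")
    if assay is not None and assay not in ALLOWED_ASSAY_TYPES:
        problems.append(f"assay_type not in controlled vocabulary: {assay}")
    return problems


incomplete = {"organism": "Rattus norvegicus", "assay_type": "microarray"}
complete = {
    "organism": "Rattus norvegicus",
    "organ": "liver",
    "assay_type": "DNA microarray",
}

for problem in check_report(incomplete):
    print(problem)
```

The free-text value "microarray" is flagged even though a human would understand it, which is the point of agreeing on one term for one thing.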
  • 30. Reporting standards – the benefits §  Describe and communicate the information to others, in an unambiguous manner §  To unlock the value in the data •  Compare, query and evaluate data - Facilitate scientific validation of the findings •  Understand variability within/between different technologies and protocols -  Facilitate technical validation -  Enable optimization of the experimental designs -  Identify critical checkpoints and develop quality metrics §  To define submission and/or publication requirements •  Journals •  Databases §  To ensure data integrity, reproducibility and (re)use
  • 31. Escalating number of standardization efforts in bioscience, e.g.: Genomics Standards Consortium (GSC) gensc.org; Genome annotation www.geneontology.org; Functional Genomics Data Society (FGED) www.fged.org; Enzymology data standards www.strenda.org; HUPO-Proteomics Standards Initiative (PSI) http://www.psidev.info; Systems modelling standards www.sbml.org; Cheminformatics www.ebi.ac.uk/chebi; Pathways www.biopax.org; Metabolomics Standards Initiative (MSI) http://www.metabolomicssociety.org
  • 32. Different community, different norms and standards, e.g.: report the same core, essential information; use the same word and refer to the same ‘thing’; allow data to flow from one system to another
  • 33. Different community, different norms and standards, e.g.: report the same core, essential information; use the same word and refer to the same ‘thing’; allow data to flow from one system to another
  • 34. Different community, different norms and standards, e.g.: report the same core, essential information; use the same word and refer to the same ‘thing’; allow data to flow from one system to another Challenges: lack of coordination, fragmentation and uneven coverage
  • 35. Is this ‘general mobilization’ good or bad? report the same core, essential information; use the same word and refer to the same ‘thing’; allow data to flow from one system to another §  Difference in structures and processes: •  organization types (open, closed to members, society, WG…) •  standards development (how to design, develop, evaluate, maintain…) •  adoption, uptake, outreach (link to journals, funders, commercial sector…) •  funds (sponsors, memberships, grants, volunteering…)
  • 36. Is this ‘general mobilization’ good or bad? report the same core, essential information; use the same word and refer to the same ‘thing’; allow data to flow from one system to another §  Fragmentation of the standards is a major issue •  Being focused on particular communities’ interests, be they individual technologies or biological/biomedical disciplines, leads to duplication of effort and, more seriously, the development of (largely arbitrarily) different standards •  This severely hinders the interoperability of databases and tools and ultimately the integration of datasets
  • 37. Growing number of reporting standards MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML, DO, PRO, IDO, MIASE, MISFISHIE, …
  • 38. Growing number of reporting standards + 303 + 150 + 130 Source: MIBBI, Source: BioPortal EQUATOR Estimated Databases, annotation, curation tools MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML, DO, PRO, IDO, MIASE, MISFISHIE, …
  • 39. But how much do we know about these standards MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML, DO, PRO, IDO, MIASE, MISFISHIE, …
  • 40. But how much do we know about these standards “Which tools and databases implement which standards?” “I use high throughput sequencing technologies, which ones are applicable to me?” “How can I get involved to propose extensions or modifications?” “What are the criteria to evaluate their status and value?” “Which ones are mature enough for me to use or recommend?” “I work on plants, are these just for biomedical applications?”
  • 41. But how much do we know about these standards §  A bewildering array of standards is available, but •  these are hard to find, at different levels of maturity; in some areas duplications or gaps in coverage also exist §  Standards are just a ‘means to an end’, therefore •  we want to make them discoverable and accessible, maximizing their use to assist the virtuous data cycle, from generation to standardization through publication to subsequent sharing and reuse
  • 42. A catalogue to map the landscape of standards and the systems implementing them: Over 400 bio-standards (public and in curation) Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
  • 43. •  A coherent, curated and searchable catalogue of data sharing resources •  Bioscience standards and associated data-sharing policies, publications, tools and databases •  Assessment criteria for usability and popularity of standards •  Relationships among standards •  Encouragement for communication & interaction among groups •  Promoting interoperability & informed decisions about standards
  • 44. Example of multi-assays study – how many ‘standards’ are applicable to this?
  • 45. Example of multi-assays study – how many ‘standards’ are applicable to this?
  • 46. Example of multi-assays study – how many ‘standards’ are applicable to this?
  • 47. Example of multi-assays study – how many ‘standards’ are applicable to this?
  • 48. Smith et al, 2007
  • 49. Smith et al, 2007; Taylor, Field, Sansone et al, 2008
  • 50. List of databases, linked to standards: a collaboration with Database Issue
  • 51. List of databases, linked to standards: a collaboration with Database Issue
  • 52. List of databases, linked to standards: a collaboration with Database Issue
  • 53. Major challenge: define ‘relations’ among standards. The relationship among popular standard formats for pathway information: BioPAX and PSI-MI are designed for data exchange to and from databases and pathway and network data integration. SBML and CellML are designed to support mathematical simulations of biological systems and SBGN represents pathway diagrams. CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.
  • 54.
  • 55. This is not just a technical but also a social engineering challenge!
  • 56. Ownership of open standards can be problematic in broad, grass-root collaborations; it requires improved models, to encourage maintenance of and contributions to these efforts, supporting their evolutions
  • 57. The extensive ‘social engineering’ and community liaison needs to be managed and funded; rewards and incentives need to be identified for all contributors
  • 59.
  • 60. The cost of implementing a standards-supported data sharing vision is as large as the number of stakeholders that must operate synchronously
  • 61. 1. Funders actively developing data policies §  Several data preservation, management and sharing policies have emerged in response to increased funding for omics domains §  Even if in general terms, standards are recognized as necessary ‘tools’ to unambiguously represent, describe and communicate research data
  • 62.
  • 63. 2. Similar trend in the regulatory arena §  “… lack of standardized data affects CDER’s review processes by curtailing a reviewer’s ability to perform integral tasks such as rapid acquisition, storage, analysis......efficient management of a portfolio of standards projects will require coordinated efforts and clear roles for multiple participants within/outside FDA”
  • 64.
  • 65. 3. Publishers have become strong advocates §  Continue to support the development of open standards and tools •  to support sharing of sufficiently well annotated datasets •  to enable comprehensible, reusable, reproducible research
  • 66. ….the rise of data-driven journals, e.g.: partnering with:
  • 67.
  • 68. The rise of data-driven journals, e.g.: partnering with:
  • 69. 4. Similar trend in the commercial sector §  R&D has invested heavily in procedures and tools that integrate external information with their own data to enhance the decision-making process •  Now joining forces to streamline non-competitive elements of the life science workflow by the specification of common standards, business terms, relationships and processes
  • 70. ....their information landscape is evolving. Diagram labels: proprietary content provider, public content provider, big life science company, CRO, academic group, regulatory authorities, service provider, software vendor. Yesterday / Today / Tomorrow: Innovation model: innovation inside / searching for innovation / heterogeneity of collaborations, part of the wider ecosystem. IT: internal apps & data / struggling with change, security and trust / cloud, services. Data: mostly inside / in and out / distributed. Portfolio: internally driven and owned / partially shared / shared portfolio. Credit to: Pistoia Alliance
  • 71. Take home messages •  Contribute to the reproducible research movement •  Think about data management as a career path •  Learn more about open community-standards •  Get involved, e.g.: Open Bioinformatics Foundation
  • 72. Data is not like a $ bill…. http://www.flickr.com/photos/jackofspades/4500411648/ CC BY
  • 73. Your research and all (publicly funded) research should make an … impact http://www.flickr.com/photos/equinoxefr/2620239993/ CC BY
  • 74. …..the biggest possible impact! http://www.flickr.com/photos/webhamster/2582189977/ CC BY
  • 75. Today: “The buzz around reproducible bioscience data - the policies, the communities and the standards” Thursday: “The reality from the buzz: how to deliver reproducible bioscience data”
  • 76. Is it possible to achieve a common, structured representation of diverse bioscience experiments that: •  follows the appropriate community standards and •  delivers richly-annotated datasets?
  • 77. Tim Berners-Lee’s 5-star deployment scheme for Linked Open Data
  • 78. Increasing level of structure: Notes in Lab Books (information for humans); Spreadsheets and Tables (the compromise); Facts as RDF statements (information for machines). TOWARDS INTEROPERABLE BIOSCIENCE DATA, doi:10.1038/ng.1054, Feb 2012. Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W. www.biosharing.org www.isacommons.org
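The step from "Spreadsheets and Tables" to "Facts as RDF statements" on the slide above can be illustrated with a minimal sketch. It is not the ISA tooling or any method from the talk: the table content, the column names and the `http://example.org/` namespace are hypothetical placeholders, and the output is only N-Triples-like (literals are not escaped).

```python
import csv
import io

# Hypothetical spreadsheet content; column names and values are illustrative only.
TABLE = """sample_id,organism,assay
S1,Arabidopsis thaliana,transcription profiling
S2,Homo sapiens,metabolite profiling
"""

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary


def rows_to_triples(text, key="sample_id"):
    """Turn each non-key table cell into a (subject, predicate, object) statement."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + "sample/" + row[key]
        for column, value in row.items():
            if column != key:
                triples.append((subject, BASE + "property/" + column, value))
    return triples


def to_ntriples(triples):
    """Serialise the statements in an N-Triples-like form (no literal escaping)."""
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in triples)


triples = rows_to_triples(TABLE)
print(to_ntriples(triples))
```

Each row becomes one subject and each remaining column one predicate, so the same facts a human reads across a table row are stated one at a time for a machine.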
  • 79. References 1. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ; OBI Consortium, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol 25(11):1251-1255 (2007) 2. Taylor CF*, Field D*, Sansone SA*, Aerts J, Apweiler R, Ashburner M, Ball CA, Binz PA, Bogue M, Booth T, Brazma A, Brinkman RR, Michael Clark A, Deutsch EW, Fiehn O, Fostel J, Ghazal P, Gibson F, Gray T, Grimes G, Hancock JM, Hardy NW, Hermjakob H, Julian RK Jr, Kane M, Kettner C, Kinsinger C, Kolker E, Kuiper M, Le Novère N, et al.: Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nat Biotechnol 26(8):889-896 (2008) 3. Field D*, Sansone SA*, Collis A, Booth T, Dukes P, Gregurick SK, Kennedy K, Kolar P, Kolker E, Maxon M, Millard S, Mugabushaka AM, Perrin N, Remacle JE, Remington K, Rocca-Serra P, Taylor CF, Thorley M, Tiwari B, Wilbanks J: Megascience. 'Omics data sharing. Science 326(5950):234-236 (2009) 4. Harland L, Larminie C, Sansone SA, Popa S, Marshall MS, Braxenthaler M, Cantor M, Filsell W, Forster MJ, Huang E, Matern A, Musen M, Saric J, Slater T, Wilson J, Lynch N, Wise J, Dix I: Empowering industrial research with shared biomedical vocabularies. Drug Discov Today 16(21-22):940-947 (2011) 5. Sansone SA and Rocca-Serra P: On the evolving portfolio of community-standards and data sharing policies: turning challenges into new opportunities. GigaScience 1:10 (2012)