
Metadata challenges of reproducible research and re-usable data - BioSharing, ISA and STATO


Increased access to the data generated is fuelling increased consumption and accelerating the cycle of discovery. But the successful integration and re-use of heterogeneous data from multiple providers and scientific domains is a major challenge within academia and industry, often due to incomplete description of the study details or metadata about the study. Using the BioSharing, ISA Commons and the STATistics Ontology (STATO) projects as exemplar community efforts, in this breakout session we will discuss the evolving portfolio of community-based standards and methods for structuring and curating datasets, from experimental descriptions to the results of analysis.

http://www.methodsinecologyandevolution.org/view/0/events.html#Data_workshop


  1. Metadata challenges of reproducible research and re-usable data: BioSharing, ISA and STATO examples. Alejandra González-Beltrán, PhD, Oxford e-Research Centre, University of Oxford. alejandra.gonzalezbeltran@oerc.ox.ac.uk, @alegonbel. OpenData & Reproducibility workshop: the Good Scientist in the Open Science era, 21st April 2015, British Ecological Society, UK.
  2. Reproducible & Reusable Bioscience Research. Well-annotated & Structured Data.
  3. Reproducible & Reusable Bioscience Research. Well-annotated & Structured Data: reasoning, analysis, exchange, integration, visualization, browsing, retrieval. Community Standards. Software Tools.
  5. A community mobilization to develop standards. Structural and operational differences, e.g.:
      • organization types (open, closed to members, society, WG etc.)
      • standards development (how to formulate, conduct and maintain)
      • adoption, uptake, outreach (link to journals, funders and commercial sector)
      • funds (sponsors, memberships, grants, volunteering)
      Labels on the slide: de jure, de facto; grass-roots groups, standard organizations; Nanotechnology Working Group
  6. Types of reporting standards (Nanotechnology Working Group):
      • minimum information reporting requirements, or checklists, to report the same core, essential information
      • controlled vocabularies, taxonomies, thesauri, ontologies etc. to use the same word and refer to the same ‘thing’
      • conceptual model or conceptual schema, from which an exchange format is derived, to allow data to flow from one system to another
  7. A web-based, curated and searchable registry ensuring that standards and databases are registered, informative and discoverable; also monitoring the development and evolution of standards, their use in databases and the adoption of both in data policies. Launched Jan 2011.
  8. Researchers, developers and curators lack support and guidance on how best to navigate and select content standards, understand their maturity, or find databases that implement them. Funders, journals and librarians do not have enough information to make informed decisions on which content standards or databases to recommend in policies, fund or implement. Goal: assist stakeholders to make informed decisions.
  9. Search, filter, submit, claim, view and more. Core functionalities:
      • search and filtering, e.g. by funder
      • submission forms to add new records
      • “claim” functionality for existing records
      • person’s profile (as maintainer of records) associated with the ORCID profile (for credit, as incentive)
      • visualization and views of content
      Slide footer: The International Conference on Systems Biology (ICSB), 22-28 August, 2008; Susanna-Assunta Sansone; www.ebi.ac.uk/net-project
  10. Curated crowdsourcing approach
  11. Formats & Database Fragmentation
  12. The Investigation/Study/Assay (ISA) infrastructure: generic format for experimental description and data exchange; open source software tools; community engagement.
  13. Investigation, study, assay(s) and data:
      • investigation: high level concept to link related studies
      • study: the central unit, containing information on the subject under study, its characteristics and any treatments applied; a study has associated assays
      • assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
      • data: external files in native or other formats; pointers to data file names/location
      • environmental health • environmental genomics • metabolomics • metagenomics • nanotechnology • proteomics • stem cell discovery • systems biology • transcriptomics • toxicogenomics • communities working to build a library of cellular signatures
  15. The experimental plan: experimental design, sample characteristic(s), experimental variable(s). InnoMed PredTox Project: 2-week systemic rat study using male Wistar rats (N=15 per dose group); 14 proprietary drug candidates from participating companies and 2 reference toxic compounds.
  16. The experimental plan: experimental design, sample characteristic(s), experimental variable(s), technology(s), measurement(s), protocol(s), data file(s), …
  17. http://dx.doi.org/10.5524/100063 (investigation, study)
  18. http://www.nature.com/search?journal=sdata&q=ecology http://www.nature.com/articles/sdata201513 http://www.nature.com/articles/sdata20158
  20. http://isa-tools.github.io/stato/
      • General-purpose statistics ontology (formal logic-based representation)
      • Coverage for processes (e.g. statistical tests and their conditions of application) and information needed by or resulting from statistical methods (e.g. probability distributions, variables, spread and variation metrics)
      • STATO also benefits from: (i) extensive documentation with the provision of textual and formal definitions; (ii) associated R code snippets using the dedicated R-command metadata tag, aiming to facilitate teaching and learning while relying on the popular R language; (iii) documentation of query examples, highlighting how the ontology can be harnessed by reviewers, tutors and students alike.
      Developed in collaboration with Dr Burke, Senior Statistician, Nuffield Department of Population Health, University of Oxford
  21. Reproducible & Reusable Bioscience Research. Well-annotated & Structured Data: reasoning, analysis, exchange, integration, visualization, browsing, retrieval. Community Standards. Software Tools.
  22. funders
  23. Questions? You can email us: isatools@googlegroups.com. View our blog: http://isatools.wordpress.com. Follow us on Twitter: @isatools. View our websites. View our Git repo & contribute: http://github.com/ISA-tools. Thanks for your attention!
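
The three kinds of reporting standard listed on slide 6 (minimum-information checklists, controlled vocabularies and exchange formats) can be illustrated with a small Python sketch. Everything in it is hypothetical: the field names, the vocabulary terms and the helpers `validate_record` and `to_exchange_format` are invented for illustration and do not correspond to any real checklist, ontology or format registered in BioSharing.

```python
import json

# Hypothetical minimum-information checklist: fields every record must report.
REQUIRED_FIELDS = {"organism", "sample_type", "protocol"}

# Hypothetical controlled vocabulary: the allowed terms for some fields.
CONTROLLED_TERMS = {
    "organism": {"Rattus norvegicus", "Homo sapiens"},
    "sample_type": {"liver", "blood", "urine"},
}


def validate_record(record):
    """Return a list of problems found in a metadata record (empty if none)."""
    problems = []
    # Checklist: report every required field that is absent.
    for name in sorted(REQUIRED_FIELDS - set(record)):
        problems.append(f"missing required field: {name}")
    # Vocabulary: report values that are not one of the agreed terms.
    for name, allowed in CONTROLLED_TERMS.items():
        value = record.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name}: '{value}' is not a controlled term")
    return problems


def to_exchange_format(record):
    """Serialise a record as JSON with sorted keys (a stand-in exchange format)."""
    return json.dumps(record, sort_keys=True)


good = {"organism": "Rattus norvegicus", "sample_type": "liver", "protocol": "P1"}
bad = {"organism": "rat", "sample_type": "liver"}

print(validate_record(good))
print(validate_record(bad))
print(to_exchange_format(good))
```

The checklist says what must be reported, the vocabulary says which words may be used, and the serialisation step says how a record travels between systems. Keeping the three concerns in separate functions mirrors the way the slide separates them.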
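
The investigation, study and assay hierarchy described on slide 13 can be pictured as nested records. The following is a minimal Python sketch of that hierarchy only; it is not the ISA format itself and not the API of the ISA software tools. The class names, fields, identifier, measurement and technology labels, and file names are all assumptions made for illustration, with sample values loosely following the rat study on slide 15.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Assay:
    """A test performed on the subject or on material taken from it."""
    measurement: str
    technology: str
    data_files: List[str] = field(default_factory=list)  # pointers to external files


@dataclass
class Study:
    """The central unit: the subject, its characteristics and treatments."""
    identifier: str
    sample_characteristics: Dict[str, str] = field(default_factory=dict)
    experimental_variables: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High-level concept that links related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_data_files(self) -> List[str]:
        """Collect every data file pointer across all studies and assays."""
        return [f for s in self.studies for a in s.assays for f in a.data_files]


# All values below are hypothetical placeholders.
study = Study(
    identifier="study-1",
    sample_characteristics={"organism": "rat", "strain": "Wistar", "sex": "male"},
    experimental_variables=["compound", "dose"],
    assays=[
        Assay("metabolite profiling", "mass spectrometry", ["a1.raw"]),
        Assay("transcription profiling", "DNA microarray", ["a2.cel", "a3.cel"]),
    ],
)
investigation = Investigation("example investigation", [study])

print(investigation.all_data_files())
```

Data files are held as names rather than contents, matching the slide's description of data as external files referenced by pointers to their names or locations.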
