Increased access to the data generated is fuelling increased consumption and accelerating the cycle of discovery. But the successful integration and re-use of heterogeneous data from multiple providers and scientific domains is a major challenge within academia and industry, often due to incomplete description of the study details or metadata about the study. Using the BioSharing, ISA Commons and the STATistics Ontology (STATO) projects as exemplar community efforts, in this breakout session we will discuss the evolving portfolio of community-based standards and methods for structuring and curating datasets, from experimental descriptions to the results of analysis.
http://www.methodsinecologyandevolution.org/view/0/events.html#Data_workshop
Metadata challenges of reproducible research and re-usable data - BioSharing, ISA and STATO
1. Metadata challenges of reproducible
research and re-usable data
BioSharing, ISA and STATO examples
Alejandra González-Beltrán, PhD
Oxford e-Research Centre, University of Oxford
alejandra.gonzalezbeltran@oerc.ox.ac.uk @alegonbel
OpenData & Reproducibility workshop: the Good Scientist in the Open Science era
21st April 2015 British Ecological Society, UK
3. Reproducible & Reusable Bioscience Research
Well-annotated & Structured Data: reasoning, analysis, exchange, integration, visualization, browsing, retrieval
Community Standards, Software Tools
5. A community mobilization to develop standards, e.g.:
Structural and operational differences
• organization types (open, closed to members, society, WG etc.)
• standards development (how to formulate, conduct and maintain)
• adoption, uptake, outreach (link to journals, funders and commercial sector)
• funds (sponsors, memberships, grants, volunteering)
de jure (standard organizations), de facto (grass-roots groups)
Nanotechnology Working Group
6. Types of reporting standards
• Including minimum information reporting requirements, or checklists, to report the same core, essential information
• Including controlled vocabularies, taxonomies, thesauri, ontologies etc. to use the same word and refer to the same ‘thing’
• Including conceptual model, conceptual schema from which an exchange format is derived to allow data to flow from one system to another
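The three types of standard listed above can be made concrete with a small sketch. The Python below is illustrative only: the checklist fields, the vocabulary terms and the record layout are hypothetical and are not taken from any published standard.

```python
import json

# Hypothetical minimum-information checklist: fields every report must carry.
REQUIRED_FIELDS = {"organism", "sample_type", "protocol"}

# Hypothetical controlled vocabulary: synonyms mapped to one preferred term.
ORGANISM_TERMS = {
    "rat": "Rattus norvegicus",
    "rattus norvegicus": "Rattus norvegicus",
    "mouse": "Mus musculus",
    "mus musculus": "Mus musculus",
}


def missing_fields(record):
    """Checklist: return the required fields that are absent or empty."""
    return sorted(f for f in REQUIRED_FIELDS if not record.get(f))


def normalise_organism(record):
    """Terminology: replace a free-text organism name with the preferred term."""
    key = record.get("organism", "").strip().lower()
    if key not in ORGANISM_TERMS:
        raise ValueError(f"organism not in vocabulary: {record.get('organism')!r}")
    return {**record, "organism": ORGANISM_TERMS[key]}


def to_exchange_format(record):
    """Format: serialise the record so that another system can read it."""
    return json.dumps(record, sort_keys=True)


report = {"organism": "Rat", "sample_type": "liver", "protocol": ""}
print(missing_fields(report))  # the empty protocol field is reported as missing
report["protocol"] = "RNA extraction"
print(to_exchange_format(normalise_organism(report)))
```

Each function stands in for one type of standard: the checklist decides what must be reported, the vocabulary decides which word is used, and the format decides how the record travels between systems.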
7. A web-based, curated and searchable registry ensuring that standards
and databases are registered, informative and discoverable; also
monitoring the development and evolution of standards, their use in
databases and the adoption of both in data policies.
Launched Jan 2011
8. Researchers, developers and curators lack support and guidance on how best to navigate and select content standards, understand their maturity, or find databases that implement them.
Funders, journals and librarians do not have enough information to make informed decisions on which content standards or databases to recommend in policies, fund or implement.
Goal: assist stakeholders to make informed decisions
9. Search, filter, submit, claim, view and more
Core functionalities:
• search and filtering, e.g. by funder
• submission forms to add new records
• “claim” functionality for existing records
• person’s profile (as maintainer of records) associated with the ORCID profile (for credit, as incentive)
• visualization and views of content
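As a rough illustration of the search, filter and claim functions, the sketch below models registry records in plain Python. It is not the registry's actual data model or API: the record fields, the funder names and the ORCID placeholder are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryRecord:
    """One registry entry; all field names here are illustrative."""
    name: str
    kind: str                       # "standard" or "database"
    funders: list = field(default_factory=list)
    maintainer_orcid: str = ""      # empty until someone claims the record


def search(records, text="", funder=None):
    """Return records whose name contains `text`, optionally filtered by funder."""
    text = text.lower()
    return [
        r for r in records
        if text in r.name.lower() and (funder is None or funder in r.funders)
    ]


def claim(record, orcid):
    """Attach a maintainer's ORCID iD to a record that nobody has claimed yet."""
    if record.maintainer_orcid:
        raise ValueError(f"{record.name} is already claimed")
    record.maintainer_orcid = orcid
    return record


records = [
    RegistryRecord("Example checklist", "standard", funders=["Funder A"]),
    RegistryRecord("Example database", "database", funders=["Funder B"]),
]
hits = search(records, text="example", funder="Funder A")
print([r.name for r in hits])
claim(hits[0], "0000-0000-0000-0000")  # placeholder, not a real ORCID iD
```

Linking a claimed record to a persistent personal identifier is what lets maintainers receive credit for their curation work.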
15. investigation / study / assay(s) / data
investigation: high level concept to link related studies
study: the central unit, containing information on the subject under study, its characteristics and any treatments applied. A study has associated assays
assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
data: external files in native or other formats; pointers to data file names/location
• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• communities working to build a library of cellular signatures
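The investigation, study and assay hierarchy can be sketched as nested records. The Python below uses plain dataclasses and is not the API of any ISA software; the class fields, identifiers and file names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    """A test on material from the subject that produces measurements."""
    measurement: str
    technology: str
    data_files: list = field(default_factory=list)   # pointers to external files


@dataclass
class Study:
    """The central unit: the subject, its characteristics and treatments."""
    identifier: str
    characteristics: dict = field(default_factory=dict)
    treatments: list = field(default_factory=list)
    assays: list = field(default_factory=list)


@dataclass
class Investigation:
    """High-level concept that links related studies."""
    title: str
    studies: list = field(default_factory=list)


def data_file_pointers(investigation):
    """Walk investigation -> study -> assay and collect every data file pointer."""
    return [
        path
        for study in investigation.studies
        for assay in study.assays
        for path in assay.data_files
    ]


assay = Assay("transcription profiling", "DNA microarray", ["raw/sample01.txt"])
study = Study("S1", {"organism": "Rattus norvegicus"}, ["compound A"], [assay])
investigation = Investigation("Example investigation", [study])
print(data_file_pointers(investigation))
```

The data files themselves stay in their native formats; the metadata only records where they are.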
17. The experimental plan
• experimental design
• sample characteristic(s)
• experimental variable(s)
InnoMed PredTox Project: 2-week systemic rat study using male Wistar rats (N=15 per dose group); 14 proprietary drug candidates from participating companies and 2 reference toxic compounds
18. The experimental plan
• experimental design
• sample characteristic(s)
• experimental variable(s)
• technology(s)
• measurement(s)
• protocol(s)
• data file(s)
• …
24. http://isa-tools.github.io/stato/
• General-purpose statistics ontology (formal logic-based representation)
• Coverage for processes (e.g. statistical tests and their conditions of application) and information needed or resulting from statistical methods (e.g. probability distributions, variables, spread and variation metrics)
• STATO also benefits from: (i) extensive documentation with the provision of textual and formal definitions; (ii) associated R code snippets using the dedicated R-command metadata tag, aiming at facilitating teaching and learning while relying on the popular R language; (iii) documented query examples, highlighting how the ontology can be harnessed by reviewers, tutors and students alike.
Developed in collaboration with Dr Burke, Senior Statistician, Nuffield Department of Population Health, University of Oxford
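To show what annotating an analysis result might look like, here is a minimal Python sketch that computes a two-sample test statistic and reports it alongside the method, its conditions of application and the inputs. The ontology identifier is a placeholder, the dictionary layout is an assumption, and the sample values are made up for illustration; none of it is STATO's own format.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


def annotated_result(sample_a, sample_b):
    """Bundle the statistic with metadata describing how it was obtained."""
    return {
        "method_label": "Welch t-test",
        "method_term": "ONTOLOGY:PLACEHOLDER",  # hypothetical; look up the real term
        "conditions_of_application": [
            "independent samples",
            "approximately normally distributed data",
        ],
        "inputs": {"n_a": len(sample_a), "n_b": len(sample_b)},
        "statistic": {"label": "t-statistic", "value": welch_t(sample_a, sample_b)},
    }


control = [1.0, 2.0, 3.0]   # made-up values, for illustration only
treated = [2.0, 3.0, 4.0]
print(annotated_result(control, treated))
```

Keeping the method, its assumptions and the inputs next to the number is what allows a reviewer or a later user to judge whether the test was applied appropriately.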
30. Questions?
You can email us...
isatools@googlegroups.com
View our blog
http://isatools.wordpress.com
Follow us on Twitter
@isatools
View our websites
View our Git repo & contribute
http://github.com/ISA-tools
Thanks for your attention!