Presentation at BOSC2012 by P Rocca-Serra - The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe
1. The open source ISA metadata tracking
framework: from data curation and management at
the source, to the linked data universe
BOSC, Long Beach, July 13-14, 2012
Philippe Rocca-Serra (Ph. D)
ISA Team
twitter: @isatools.org
philippe.rocca-serra@oerc.ox.ac.uk
http://www.isa-tools.org
Friday, 13 July 2012
5. MAIN THEME:
It is all about structuring experimental information to make it
available to computer and software agents to enable mining.
But let’s proceed gradually…
• Notes in Lab Books (information for humans)
• Spreadsheets and Tables (the compromise)
• Facts as RDF statements (information for machines)
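As a rough illustration of the step from a spreadsheet row to RDF statements, the sketch below turns one row into subject/predicate/object facts and prints them in an N-Triples-like form. The base URI, the column names and the helper names (`row_to_triples`, `to_ntriples`) are assumptions made for this example; they are not part of the ISA tools.

```python
# Hypothetical base URI; a real conversion would use agreed, resolvable identifiers.
BASE = "http://example.org/"


def escape_literal(value):
    """Escape backslashes and double quotes so the value is a valid quoted literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def row_to_triples(row_id, row):
    """Turn one spreadsheet row (column name -> cell value) into a list of facts."""
    subject = f"<{BASE}sample/{row_id}>"
    triples = []
    for column, value in row.items():
        predicate = f"<{BASE}property/{column.replace(' ', '_')}>"
        triples.append((subject, predicate, f'"{escape_literal(value)}"'))
    return triples


def to_ntriples(triples):
    """Serialise the facts one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# One row as a human would type it into a table (illustrative values).
row = {"organism": "H. Sapiens", "age": "35 Years"}
triples = row_to_triples("H1", row)
print(to_ntriples(triples))
```

Each cell becomes one statement about the row's subject, which is what makes the information addressable by software agents rather than only readable by people.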
6. Observations
• Experiments are expensive and often publicly funded, yet
many never see the light of day.
• Spreadsheets are the most common vehicle for tracking so-called
‘omics’ (functional genomics) experimental metadata
• Technology-centric repositories form de facto silos
• Conversions are required to allow for deposition to public
databases.
• Submitting common information to a series of
repositories is inefficient
8. Many ontologies, Many Formats, Many
Requirements…
Grr…Where are the
tools!?!
Credits:
http://liverpoolsolfed.wordpress.com/resources/image-bank/demonstration/
10. Why ISA format and Tools?
– Supporting data provenance tracking
– Node/Edge underlying concept
– Tabular as a compromise: a presentation layer inspired by object
models (FuGE, MAGE-OM)
– A generic representation, applied to:
• microarray based experiments (MAGE)
• sequencing based experiments (SRA)
• flow cytometry based experiments (FuGE-Flow Cyt)
• mass spectrometry and NMR spectroscopy experiments
11. Why ISA format and Tools?
investigation: high level concept to link related studies
study: the central unit, containing information on the subject under study, its characteristics and any treatments applied. A study has associated assays.
assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
assay(s): pointers to data file names/location
data: external files in native or other formats
[Figure: an investigation linking studies, their assay(s) and data; the example rows below appear both as a table and as a graph in which repeated values are merged]
H1  H. Sapiens  35 Years  H1.sample1  Labeling  H1.sample1.labeled  h1-s1.cel
H1  H. Sapiens  35 Years  H1.sample2  h1-s2.cel
H2  H. Sapiens  33 Years  H2.sample1  Labeling  H2.sample1.labeled  h2-s1.cel
ISA metadata specifications:
•workflow and process orientated
•compatible with checklist enforcement
•compatible with external vocabulary resources
•compatible by design with existing schemas (MAGE-Tab, Pride-xml, SRA-xml)
Currently finalizing conversion to RDF to explore the growing Linked Data universe (in collaboration with the W3C HCLSIG and the Toxbank Consortium)
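The example rows on this slide can be read as a node/edge graph, which is the concept underlying the tabular layout. The sketch below is only an illustration: the split of each row into source, sample, protocol, labeled extract and data file is an assumption (the slide shows no column headers), and `rows_to_edges` and `children` are hypothetical helper names.

```python
from collections import defaultdict

# Rows taken from the slide; None marks a cell that is empty there.
# Assumed column order: source, sample, protocol, labeled extract, data file.
rows = [
    ("H1", "H1.sample1", "Labeling", "H1.sample1.labeled", "h1-s1.cel"),
    ("H1", "H1.sample2", None, None, "h1-s2.cel"),
    ("H2", "H2.sample1", "Labeling", "H2.sample1.labeled", "h2-s1.cel"),
]


def rows_to_edges(rows):
    """Return a set of (parent, child, protocol) edges; repeated nodes merge naturally."""
    edges = set()
    for source, sample, protocol, extract, data_file in rows:
        edges.add((source, sample, None))
        if extract is not None:
            edges.add((sample, extract, protocol))
            edges.add((extract, data_file, None))
        else:
            # No labeling step recorded: the data file hangs directly off the sample.
            edges.add((sample, data_file, None))
    return edges


def children(edges):
    """Index the edges as parent -> sorted list of children."""
    graph = defaultdict(list)
    for parent, child, _protocol in sorted(edges, key=lambda e: e[:2]):
        graph[parent].append(child)
    return dict(graph)


edges = rows_to_edges(rows)
graph = children(edges)
for parent in sorted(graph):
    print(parent, "->", ", ".join(graph[parent]))
```

Because the edges are kept in a set, the source H1 that appears in two rows becomes a single node with two samples, mirroring the merged cells in the graph view of the table.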
16. ISA syntax and Table definition
• Material Transformations:
– Inputs and Outputs of Protocols are Material Nodes (Source Name, Sample Name, Extract Name, Labeled Extract Name).
[Diagram: Material Node → Protocol REF → Material Node, plus a Data File Node]
Material Node (input and output): Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
Protocol REF: Parameter Value[…], Performer (operator effect), Date (day effect)
Data File Node: Comment[…]
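One way to picture this table definition is to walk a header row and attach each qualifier column to the node column that precedes it. The following is a minimal sketch under stated assumptions: the sample header, the bracketed labels (`organism`, `volume`, `dose`, `checksum`), the rule that data file columns end in "Data File", and the function `group_header` are all illustrative and not taken from the ISA parser.

```python
import re

# Material node column names as listed on the slide.
MATERIAL_NODES = {"Source Name", "Sample Name", "Extract Name", "Labeled Extract Name"}

# Qualifier columns of the form Kind[label], e.g. Characteristics[organism].
QUALIFIER = re.compile(r"^(Characteristics|Factor Value|Parameter Value|Comment)\s*\[(.+)\]$")


def group_header(columns):
    """Group header columns into nodes, each with the qualifiers that follow it."""
    groups = []
    for col in columns:
        if col in MATERIAL_NODES:
            groups.append({"kind": "material", "name": col, "attributes": []})
        elif col == "Protocol REF":
            groups.append({"kind": "protocol", "name": col, "attributes": []})
        elif col.endswith("Data File"):  # assumption made for this sketch
            groups.append({"kind": "data file", "name": col, "attributes": []})
        else:
            if not groups:
                raise ValueError(f"qualifier column {col!r} has no preceding node")
            match = QUALIFIER.match(col)
            label = (match.group(1), match.group(2)) if match else (col, None)
            groups[-1]["attributes"].append(label)
    return groups


# Hypothetical header row, for illustration only.
header = [
    "Source Name", "Characteristics[organism]",
    "Protocol REF", "Parameter Value[volume]", "Performer", "Date",
    "Sample Name", "Factor Value[dose]",
    "Raw Data File", "Comment[checksum]",
]
groups = group_header(header)
for group in groups:
    print(group["kind"], group["name"], group["attributes"])
```

Reading the header left to right this way recovers the input node, the protocol applied to it, the output node and the resulting data file, each with its own annotations.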
19. How do ISA tools access Ontology servers?
20. The ISAcreator...
isacreator
Developed to be a user-friendly way to
enter standards-compliant metadata: it
has lots of features...
But these are just some of them... we
also have a data entry wizard and an
import utility...
21. Select and Annotate in ISAcreator
23. Plugins in ISAcreator
In ISAcreator, we use the Apache Felix implementation of the OSGi framework... it’s really good.
•Plugins can be developed for 3 different purposes:
• Search (adds extra search space for ontology tool)
• Custom cell editors (for spreadsheet)
• Extra general functionality (which appears in a plugin menu)
•2 Examples of ISA plugins:
• Access to local metadata stores: Novartis Plugin to Ontology Widget
• Annotation of findings: Metabolite Identification Plugin (Metabolights Repository contribution to ISA project).
24. Plugins Example 1 - Novartis Metastore Search
Search function on the Novartis
Metastore: integrates metastore search
results into the Ontology
search tool.
So, with the Novartis plugin in your
Plugin directory, you’ll be able to
search the Novartis metastore
directly within ISAcreator, and it will
handle all the tasks involved with
recording term source, etc.
25. Plugins Example 2 - Metabolite Identification plugin
Credits: Kenneth Haug: Metabolights
26. Potential Issues and known hurdles
• The problem of conflicting versions
– especially acute when working with big consortia
– distributed, decentralized groups of users
• Lack of version control and history
• Absence of collaborative features
– Looking for new solutions while retaining the features!
• OntoMaton: Bringing Google Docs, NCBO BioPortal and
ISA-TAB together!
30. OntoMaton
• Public release: http://goo.gl/2OKFV
• Can be used in any Google Spreadsheet document
• Application:
• Annotating data records
• Supporting ontology development (see OBI
Quick Term Templates)
31. ISA2RDF work in progress
• Use case on W3C HCLS scientific discourse list
– Deciding on the granularity of representation
– Building on previous experience
– Evaluating alternative representations
• Participation in the BioHackathon 2011
– http://blogs.openaccesscentral.com/blogs/bmcblog/entry/biohackathon_2011_number_1
– Discussing best practices
• PURL URIs and identifiers.org as identifiers
• Open PHACTS guidelines (http://www.nanopub.org/guidelines/OpenPHACTS_Nanopublication_Guidlines_v1.8.1.pdf)
32. Preparing for Linked Open Data
✴ ISA2RDF (Toxbank collaboration): a contribution to an
ecosystem of software tools supporting the ISA syntax
✴ Reliance on internet-resolvable identifiers
✴ W3C bio/life science Note on Gene Expression RDF
(PMID: 22449719)
✴ TODO:
✴ Specify comparator groups + analysis methods and
resulting measurements and statistical measures
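To make "internet-resolvable identifiers" concrete, here is a small sketch that builds an identifiers.org-style URI from a namespace and a local identifier, using the PMID quoted on this slide. The URI pattern and the function name `identifiers_org_uri` are assumptions for illustration; the resolver's own documentation is the reference for the exact form.

```python
def identifiers_org_uri(namespace, local_id):
    """Build a URI of the assumed form http://identifiers.org/<namespace>/<local id>."""
    if not namespace or not local_id:
        raise ValueError("both a namespace and a local identifier are required")
    return f"http://identifiers.org/{namespace}/{local_id}"


# The PubMed identifier cited on the slide, expressed as a resolvable URI.
uri = identifiers_org_uri("pubmed", "22449719")
print(uri)
```

Using one resolvable URI per record, rather than a bare accession string, is what lets separately produced RDF graphs point at the same resource.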
35. ISA2RDF: work in progress
jeliazkova.nina
[toxbank project]
37. ISA2OWL
• OWLAPI
• ISA Parser (in-memory BII object store objects)
• Mapping ISA syntax into target Ontological Space
• Decoupling Mapping from Conversion Engine
• Avoids being tied to a single semantic framework
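The idea of decoupling the mapping from the conversion engine can be sketched as follows: the mapping is plain data, and the engine is a function that receives it as an argument. This is not the ISA2OWL implementation; the ontology IRIs, the record layout and the names `convert`, `MAPPING_A` and `MAPPING_B` are hypothetical.

```python
def convert(records, mapping):
    """Generic engine: types each record using whatever mapping it is handed."""
    statements = []
    for record in records:
        isa_type = record["type"]
        if isa_type not in mapping:
            raise KeyError(f"no mapping for ISA element {isa_type!r}")
        statements.append((record["id"], "rdf:type", mapping[isa_type]))
    return statements


# Two hypothetical mappings of the same ISA elements onto different ontologies.
MAPPING_A = {
    "Study": "http://example.org/ontoA#Study",
    "Assay": "http://example.org/ontoA#Assay",
}
MAPPING_B = {
    "Study": "http://example.org/ontoB#Investigation_Unit",
    "Assay": "http://example.org/ontoB#Measurement_Process",
}

records = [
    {"id": "study1", "type": "Study"},
    {"id": "assay1", "type": "Assay"},
]

# The engine code is unchanged; only the mapping passed in differs.
statements_a = convert(records, MAPPING_A)
statements_b = convert(records, MAPPING_B)
print(statements_a)
print(statements_b)
```

Keeping the mapping outside the engine means a new target framework needs only a new mapping table, and a missing entry surfaces as an explicit error instead of a silent omission.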
40. ISA2OWL: mapping issues
• Stability over time
• Keeping track of resource versions
• Gaps in coverage
• Use of local extensions
• Direct requests/contributions
41. ISA2OWL: development
• Include graph metadata (graph provenance to aid
indexing)
• Extend semantic validation of ISA archive
• Augment annotation by suggesting additions
• Facilitate curation work
• Create new mappings to other frameworks
(OPML model, SIO)
42. Publication...
ISA software suite: supporting standards-compliant
experimental annotation and enabling curation at the
community level
Philippe Rocca-Serra; Marco Brandizi; Eamonn Maguire; Nataliya Sklyar; Chris Taylor; Kimberly Begley; Dawn Field; Stephen Harris;
Winston Hide; Oliver Hofmann; Steffen Neumann; Peter Sterk; Weida Tong; Susanna-Assunta Sansone
Bioinformatics 2010 26: 2354-2356
43. Acknowledgements
Groups and individuals participating in:
MIBBI http://mibbi.org
ISA-Tab format http://isatab.sf.net
OBO Foundry http://obofoundry.org
OBI: http://obi-ontology.org/page/Main_Page
ISA Infrastructure Team:
Alejandra Gonzalez-Beltran (Oxford)
Eamonn Maguire (Oxford)
Philippe Rocca-Serra (Oxford)
collaborators at:
Cambridge University
EuNuGO
Harvard School for Public Health
FDA's NCTR
Leibniz Plant Institute
NERC's NEBC
SIDR, INIST
Metabolights, EMBL-EBI
Funders:
EU Carcinogenomics Project
UK BBSRC
44. Groups and individuals participating in:
Winston Hide: HSPH
Oliver Hofmann: HSPH
Shannan Ho Sui: HSPH
Brad Chapman: HSPH
Christoph Steinbeck: Metabolights
Kenneth Haug: Metabolights
Paula de Matos: Metabolights
Magali Roux: INIST
Florian Mazur: INIST
Alain Zasadzinski: INIST
Marie Christine Jacquemot: INIST
Nina Jeliazkova: ToxBank
And many more, who will have to forgive us!