The ISA Infrastructure for the biosciences: from data curation at source to the linked data cloud

Alejandra Gonzalez-Beltran
University of Oxford e-Research Centre, UK
Alejandra.GonzalezBeltran@oerc.ox.ac.uk
  




Conference on Semantics in Healthcare and Life Sciences (CSHALS)	

Boston, USA Feb 27- Mar 1 2013
Outline
• The ISA infrastructure: a metadata tracking framework in the biosciences, comprising the ISA-TAB format, a set of open-source software tools and the user community
• The ISA-TAB syntax and its implicit semantics
• The component of the infrastructure that maps the ISA-TAB syntax to ontologies
    • A couple of mappings, the architecture and the conversion
  
Contextual information (metadata):
  • Sample characteristics
  • Technology and measurement types
  • Instrument parameters
  • …
  
Need for a generic representation, applied to:
  • microarray-based experiments (MAGE)
  • sequencing-based experiments (SRA)
  • flow cytometry-based experiments (FuGE-Flow Cyt)
  • mass spectrometry and NMR spectroscopy experiments (MetaboLights and PRIDE)
  
ISA infrastructure
ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level
Rocca-Serra et al., 2010, Bioinformatics
  




  • Assist in the annotation and management of experimental metadata at source, supporting data provenance tracking
  • Deal with high-throughput studies using one or a combination of omics and other technologies
  • Empower users to adopt community-defined checklists and ontologies
  • Facilitate data sharing, re-use, comparison and reproducibility of experiments, and submission to international public repositories
  
Towards interoperable bioscience data
Sansone et al., 2012, Nature Genetics

A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains.
  
ISA-TAB syntax (and its implicit semantics)
  
Sample Name | Material Type | Hybridization Assay Name | Array Design REF | Array Data File | Protocol REF | Derived Array Data File
sample1 | genomic DNA | assay1 | A-AFFY-107 | assay1.cel | data normalization | assay1.txt
sample2 | genomic DNA | assay2 | A-AFFY-107 | assay2.cel | data normalization | assay2.txt
sample3 | genomic DNA | assay3 | A-AFFY-107 | assay3.cel | data normalization | assay3.txt
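Because an ISA-TAB assay table is plain tab-delimited text, the example above can be read with nothing more than a CSV reader. The sketch below is a minimal illustration under that assumption: it rebuilds the three example rows in memory (a real workflow would read the assay file from disk) and recovers which raw data file produced which derived file. The function names are hypothetical, not part of any ISA tool.

```python
import csv
import io

# The example assay table above, rebuilt as tab-delimited text (ISA-TAB files are
# tab-separated). In practice this would be read from the assay file on disk.
HEADERS = [
    "Sample Name", "Material Type", "Hybridization Assay Name",
    "Array Design REF", "Array Data File", "Protocol REF",
    "Derived Array Data File",
]
ROWS = [
    [f"sample{i}", "genomic DNA", f"assay{i}", "A-AFFY-107",
     f"assay{i}.cel", "data normalization", f"assay{i}.txt"]
    for i in (1, 2, 3)
]
ASSAY_TABLE = "\n".join("\t".join(cells) for cells in [HEADERS, *ROWS]) + "\n"


def read_assay_table(text):
    """Parse tab-delimited ISA-TAB assay text into one dict per row.

    Caveat: csv.DictReader keys rows by header, so repeated headers
    (e.g. several Protocol REF columns) keep only the last value."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


def raw_to_derived(rows):
    """Map each raw data file to the derived data file produced from it."""
    return {row["Array Data File"]: row["Derived Array Data File"] for row in rows}


rows = read_assay_table(ASSAY_TABLE)
print(raw_to_derived(rows))
# {'assay1.cel': 'assay1.txt', 'assay2.cel': 'assay2.txt', 'assay3.cel': 'assay3.txt'}
```

The dict-per-row view is convenient for simple lookups, but the caveat in the docstring matters for real ISA-TAB files, where column names such as Protocol REF or Term Source REF routinely repeat; a positional reading of each row avoids that loss.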
  




Material transformations...
(Figure: material and data nodes linked by processes)
• Material node: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
• Process: Protocol, Parameter Value[…], Performer (operator effect), Date (day effect)
• Data file node: Raw Data File, Derived Data File
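The implicit semantics in the figure come from column order: reading a row left to right, node columns (materials, data files) are linked by the process columns between them, and every other column qualifies the closest preceding node or process. The sketch below illustrates that reading for one row of the example table. Which headers count as material, data or process columns is an assumption made for this sketch (a small subset, hypothetical function names), not the full ISA-TAB rule set.

```python
# Hypothetical header classification for this sketch; the ISA-TAB specification
# defines more node and process columns than are listed here.
MATERIAL_HEADERS = {"Source Name", "Sample Name", "Extract Name", "Labeled Extract Name"}
DATA_HEADERS = {"Raw Data File", "Array Data File", "Derived Data File", "Derived Array Data File"}


def is_process(header):
    """Protocol REF and '... Assay Name' columns are treated as processes."""
    return header == "Protocol REF" or header.endswith("Assay Name")


def row_to_graph(headers, values):
    """Read one ISA-TAB row left to right as a chain of node -> process -> node.

    Returns (edges, attributes): edges are (from_node, process_label, to_node);
    attributes maps each node or process to the columns that qualify it."""
    edges, attributes = [], {}
    last_node, last_item, pending = None, None, []
    for header, value in zip(headers, values):
        if header in MATERIAL_HEADERS or header in DATA_HEADERS:
            node = ("material" if header in MATERIAL_HEADERS else "data", value)
            if last_node is not None:
                # Processes seen since the previous node connect the two nodes.
                edges.append((last_node, " + ".join(pending) or None, node))
            last_node, last_item, pending = node, node, []
        elif is_process(header):
            pending.append(value)
            last_item = ("process", value)
        else:
            # Characteristics[...], Material Type, Array Design REF, Parameter
            # Value[...], etc. qualify the closest preceding node or process.
            attributes.setdefault(last_item, {})[header] = value
    return edges, attributes


headers = ["Sample Name", "Material Type", "Hybridization Assay Name", "Array Design REF",
           "Array Data File", "Protocol REF", "Derived Array Data File"]
values = ["sample1", "genomic DNA", "assay1", "A-AFFY-107",
          "assay1.cel", "data normalization", "assay1.txt"]
edges, attributes = row_to_graph(headers, values)
for edge in edges:
    print(edge)
# (('material', 'sample1'), 'assay1', ('data', 'assay1.cel'))
# (('data', 'assay1.cel'), 'data normalization', ('data', 'assay1.txt'))
```

In this reading, Material Type ends up on the sample node and Array Design REF on the hybridization process, which matches the split between node attributes and process attributes shown in the figure.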
  


Tagging: from free text to ontology-based annotation
  • single intervention representation, free-text annotation

Source Name | Characteristics[organism] | Factor Value[perturbation agent] | Factor Value[dose] | Factor Value[duration]
individual1 | human | aspirin | high dose | 12 weeks

  • single intervention, ontology-based annotation

Source Name | Characteristics[organism (obi:0100026)] | Term Source REF | Term Accession Number | Factor Value[chemical compound (CHEBI_37577)] | Term Source REF | Term Accession Number
individual1 | Homo sapiens | NCBITax | 9606 | aspirin | CHEBI | 1231354

Factor Value[dose (OBI_0000984)] | Term Source REF | Term Accession Number | Factor Value[time (PATO_0000165)] | Unit | Term Source REF | Term Accession Number
low dose | LNC | LP30872-3 | 12 | week | UO | 0000034
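Once a value carries a Term Source REF and a Term Accession Number, it can be turned into a resolvable identifier, which is what makes the annotation usable outside the spreadsheet. The sketch below assumes a hand-written table from the Term Source REF values in the example to IRI prefixes; that table is hypothetical, while the http://purl.obolibrary.org/obo/PREFIX_number pattern is the OBO Foundry PURL convention. The IRIs are built mechanically; the sketch does not check that each accession is the intended term.

```python
from typing import Optional

# Hypothetical Term Source REF -> IRI prefix table for the sources in the example.
# The http://purl.obolibrary.org/obo/<PREFIX>_<number> pattern is the OBO Foundry
# PURL convention; LNC (LOINC) is not an OBO ontology and is deliberately absent.
IRI_PREFIXES = {
    "NCBITax": "http://purl.obolibrary.org/obo/NCBITaxon_",
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "UO": "http://purl.obolibrary.org/obo/UO_",
}


def term_iri(term_source_ref: str, accession: str) -> Optional[str]:
    """Turn a (Term Source REF, Term Accession Number) pair into an IRI, if known."""
    prefix = IRI_PREFIXES.get(term_source_ref.strip())
    if prefix is None:
        return None  # unknown source: keep the annotation as plain text
    return prefix + accession.strip()


# (value, Term Source REF, Term Accession Number) triples from the annotated row.
annotations = [
    ("Homo sapiens", "NCBITax", "9606"),
    ("aspirin", "CHEBI", "1231354"),
    ("low dose", "LNC", "LP30872-3"),
    ("week", "UO", "0000034"),
]
for value, source, accession in annotations:
    print(value, "->", term_iri(source, accession))
```

Returning None for an unknown source keeps the free-text value instead of minting a guessed identifier, which seems the safer default for sources outside the OBO pattern such as LOINC.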
  
ToxBank effort, developed by Nina Jeliazkova
Kohonen et al. The ToxBank Data Warehouse
A research cluster of 7 EU FP7 Health systems toxicology and toxicogenomics projects
Health Care & Life Sciences Interest Group
  
• Make the semantics of ISA-TAB explicit, including materials, data entities, processes and their relationships
• Provide incentives for the provision of ontology-based annotations in ISA-TAB datasets, and exploit those annotations
• Augment ISA syntax with new elements (e.g. groups), facilitating the understanding and querying of experimental design
• Facilitate data integration and knowledge discovery/reasoning
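One reading of the "groups" element is that a study group is a distinct combination of factor values, which ISA-TAB leaves implicit across rows. The sketch below derives such groups from Factor Value[...] columns under that assumption; the rows, sample names and factor values are invented for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def factor_groups(rows):
    """Group sample names by their combination of Factor Value[...] columns.

    Each distinct combination of factor values is one group (a 'study group')."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(sorted((h, v) for h, v in row.items() if h.startswith("Factor Value[")))
        groups[key].append(row["Sample Name"])
    return dict(groups)


# Hypothetical rows: the sample names and factor values are invented for illustration.
rows = [
    {"Sample Name": "s1", "Factor Value[genotype]": "wild type"},
    {"Sample Name": "s2", "Factor Value[genotype]": "knockout"},
    {"Sample Name": "s3", "Factor Value[genotype]": "wild type"},
]
for key, samples in factor_groups(rows).items():
    print(dict(key), samples)
# {'Factor Value[genotype]': 'wild type'} ['s1', 's3']
# {'Factor Value[genotype]': 'knockout'} ['s2']
```

Making the groups explicit turns questions such as "which samples received the same treatment?" into a lookup rather than a scan of every row.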
  
Architecture
(Figure components: ISA-TAB parser, graph, isa2owl mapping, configuration file parser, analysis)
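The architecture separates the parsed graph from a configuration that decides how graph elements become ontology terms. The sketch below imitates that separation in miniature: a configuration dictionary (standing in for the configuration file) maps node kinds to classes and names the input/output properties, and a small function turns material/process/data edges into RDF triples serialised as N-Triples. All example.org IRIs are placeholders, not the terms used by the actual ISA-OBI or ISA-SIO mappings, and the function names are hypothetical; only rdf:type is a real vocabulary term.

```python
from urllib.parse import quote

# Hypothetical configuration standing in for the configuration file: it says which
# class each node kind maps to and which properties link processes to inputs and
# outputs. The example.org IRIs are placeholders, not the real mapping terms.
CONFIG = {
    "classes": {
        "material": "http://example.org/onto/Material",
        "data": "http://example.org/onto/DataItem",
        "process": "http://example.org/onto/Process",
    },
    "has_input": "http://example.org/onto/has_input",
    "has_output": "http://example.org/onto/has_output",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/dataset/"  # hypothetical base IRI for dataset entities


def node_iri(kind, name):
    return BASE + kind + "/" + quote(name)


def edges_to_triples(edges, config):
    """Map (from_node, process_label, to_node) edges to RDF triples."""
    triples = set()
    for (src_kind, src), process, (dst_kind, dst) in edges:
        s, o = node_iri(src_kind, src), node_iri(dst_kind, dst)
        # One process instance per input, so identical protocol names in
        # different rows do not collapse into a single process.
        p = node_iri("process", process or "unnamed-process") + "/" + quote(src)
        triples |= {
            (s, RDF_TYPE, config["classes"][src_kind]),
            (o, RDF_TYPE, config["classes"][dst_kind]),
            (p, RDF_TYPE, config["classes"]["process"]),
            (p, config["has_input"], s),
            (p, config["has_output"], o),
        }
    return triples


def to_ntriples(triples):
    """Serialise IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


EDGES = [
    (("material", "sample1"), "assay1", ("data", "assay1.cel")),
    (("data", "assay1.cel"), "data normalization", ("data", "assay1.txt")),
]
print(to_ntriples(edges_to_triples(EDGES, CONFIG)))
```

Keeping the class and property choices in configuration means the same graph can be emitted against different target vocabularies by swapping the configuration, which is the motivation for having more than one mapping.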

OntoMaton: a BioPortal-powered ontology widget for Google Spreadsheets
Maguire et al., 2013, Bioinformatics
• Ontology search and automated tagging on Google Spreadsheets (relying on NCBO BioPortal services)
• Collaborative annotation; support for distributed users
• Version control & history
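Automated tagging amounts to replacing a free-text cell with a term label plus its Term Source REF and Term Accession Number. OntoMaton obtains candidate terms from NCBO BioPortal; the sketch below swaps that search for a small hypothetical local vocabulary (filled from the annotated example table) so that the tagging step itself is visible without calling any web service. Names such as `tag` and `tag_column` are illustrative only.

```python
from typing import NamedTuple, Optional


class Term(NamedTuple):
    label: str
    source_ref: str  # Term Source REF
    accession: str   # Term Accession Number


# Hypothetical local vocabulary, filled from the annotated example table.
# OntoMaton searches ontologies through NCBO BioPortal; this sketch calls no service.
VOCABULARY = {
    "human": Term("Homo sapiens", "NCBITax", "9606"),
    "homo sapiens": Term("Homo sapiens", "NCBITax", "9606"),
    "aspirin": Term("aspirin", "CHEBI", "1231354"),
    "week": Term("week", "UO", "0000034"),
}


def tag(cell: str) -> Optional[Term]:
    """Case-insensitive exact-match lookup of a free-text cell."""
    return VOCABULARY.get(cell.strip().lower())


def tag_column(cells):
    """Expand free-text cells into (value, Term Source REF, Term Accession Number).

    Cells with no match are kept as-is with empty source and accession."""
    out = []
    for cell in cells:
        term = tag(cell)
        out.append(tuple(term) if term else (cell, "", ""))
    return out


print(tag_column(["human", "aspirin", "high dose"]))
# [('Homo sapiens', 'NCBITax', '9606'), ('aspirin', 'CHEBI', '1231354'), ('high dose', '', '')]
```

Unmatched cells such as "high dose" stay as free text, leaving a curator to pick a term, which mirrors the semi-automated workflow the slide describes.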
  
Vocabularies
Chemical domain, Biomolecular domain, Information domain, Experimental domain
  




Source Name | Characteristics[organism (obi:0100026)] | Term Source REF | Term Accession Number | Factor Value[chemical compound (CHEBI_37577)] | Term Source REF | Term Accession Number
individual1 | Homo sapiens | NCBITax | 9606 | aspirin | CHEBI | 1231354
  
Open Biological and Biomedical Ontologies (OBO) Foundry: BFO, ChEBI, GO, IAO, OBI
  
ISA-OBI mapping

ISA-SIO mapping
  
faahKO dataset
Available in Bioconductor (with ISA-TAB metadata)
Global metabolite profiling
Data subset: LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice
  
• support for different conversion modes (different levels of granularity)
• querying for ISA-TAB datasets across multiple experiment types
• reasoning exploiting ontology annotations
    • semantic validation of ISA-TAB datasets
• augmented annotation over native ISA syntax
    • identification of gaps in ontological representations
    • feedback of findings to community ontologies
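Querying across experiment types works once datasets from different assays share identifiers for the same entities, for example NCBI Taxonomy IRIs for organisms. The sketch below stands in for a SPARQL engine with a tiny triple-pattern matcher over an in-memory set and runs a two-pattern join ("which datasets contain sources of this organism?"). The triples are hypothetical, using example.org placeholders; NCBITaxon_9606 (Homo sapiens) and NCBITaxon_10090 (Mus musculus) follow the OBO PURL pattern.

```python
# Hypothetical triples from two converted datasets; the example.org IRIs are
# placeholders. NCBITaxon_9606 (Homo sapiens) and NCBITaxon_10090 (Mus musculus)
# follow the OBO PURL pattern.
EX = "http://example.org/"
OBO = "http://purl.obolibrary.org/obo/"
TRIPLES = {
    (EX + "d1/source/individual1", EX + "organism", OBO + "NCBITaxon_9606"),
    (EX + "d1/source/individual1", EX + "in_dataset", EX + "d1"),
    (EX + "d2/source/mouse1", EX + "organism", OBO + "NCBITaxon_10090"),
    (EX + "d2/source/mouse1", EX + "in_dataset", EX + "d2"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None is a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def datasets_with_organism(triples, taxon_iri):
    """Join two patterns: ?x organism taxon . ?x in_dataset ?d"""
    subjects = {s for s, _, _ in match(triples, p=EX + "organism", o=taxon_iri)}
    return sorted(d for s in subjects for _, _, d in match(triples, s=s, p=EX + "in_dataset"))


print(datasets_with_organism(TRIPLES, OBO + "NCBITaxon_9606"))
# ['http://example.org/d1']
```

The join only succeeds because both datasets use the same organism IRI; had one said "human" and the other "Homo sapiens" as plain text, the query would have to guess, which is the argument for ontology-based tagging at source.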
  
Increasing level of structure for experimental metadata
Notes in lab books → Spreadsheets & Tables (ISA-TAB metadata) → Facts as RDF statements
  
@isatools @biosharing	

isa-tools.org      isacommons.org    biosharing.org

Mais conteúdo relacionado

Mais procurados

BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...Alejandra Gonzalez-Beltran
 
Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3Philippe Rocca-Serra
 
From peer-reviewed to peer-reproduced: a role for research objects in scholar...
From peer-reviewed to peer-reproduced: a role for research objects in scholar...From peer-reviewed to peer-reproduced: a role for research objects in scholar...
From peer-reviewed to peer-reproduced: a role for research objects in scholar...Alejandra Gonzalez-Beltran
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Carole Goble
 
Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceRaul Palma
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use CasesCarole Goble
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...Carole Goble
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research ObjectsCarole Goble
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks
Results Vary: The Pragmatics of Reproducibility and Research Object FrameworksResults Vary: The Pragmatics of Reproducibility and Research Object Frameworks
Results Vary: The Pragmatics of Reproducibility and Research Object FrameworksCarole Goble
 
RARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsRARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsCarole Goble
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...Carole Goble
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...Carole Goble
 
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...Carole Goble
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the partsCarole Goble
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Carole Goble
 

Mais procurados (20)

BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...
 
Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3
 
4A2B2C-2013
4A2B2C-20134A2B2C-2013
4A2B2C-2013
 
From peer-reviewed to peer-reproduced: a role for research objects in scholar...
From peer-reviewed to peer-reproduced: a role for research objects in scholar...From peer-reviewed to peer-reproduced: a role for research objects in scholar...
From peer-reviewed to peer-reproduced: a role for research objects in scholar...
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
 
Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth Science
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
 
ROHub
ROHubROHub
ROHub
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 
DCC Keynote 2007
DCC Keynote 2007DCC Keynote 2007
DCC Keynote 2007
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks
Results Vary: The Pragmatics of Reproducibility and Research Object FrameworksResults Vary: The Pragmatics of Reproducibility and Research Object Frameworks
Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks
 
RARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research ObjectsRARE and FAIR Science: Reproducibility and Research Objects
RARE and FAIR Science: Reproducibility and Research Objects
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
 
FAIRer Research
FAIRer ResearchFAIRer Research
FAIRer Research
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...
 
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how
 

Semelhante a ISA INFRASTRUCTURE

Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...Jian Qin
 
Tim Malthus_Towards standards for the exchange of field spectral datasets
Tim Malthus_Towards standards for the exchange of field spectral datasetsTim Malthus_Towards standards for the exchange of field spectral datasets
Tim Malthus_Towards standards for the exchange of field spectral datasetsTERN Australia
 
Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech...
Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech...Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech...
Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech...Stuart Wrigley
 
Building a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceBuilding a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceRobert H. McDonald
 
Collaboration and Sharing
Collaboration and SharingCollaboration and Sharing
Collaboration and SharingJisc
 
DIR workshop ontology stream data access
DIR workshop ontology stream data accessDIR workshop ontology stream data access
DIR workshop ontology stream data accessJean-Paul Calbimonte
 
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...Mark Matienzo
 
Data-intensive profile for the VAMDC
Data-intensive profile for the VAMDCData-intensive profile for the VAMDC
Data-intensive profile for the VAMDCAstroAtom
 
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...SEAD
 
EdgarDB overview
EdgarDB overviewEdgarDB overview
EdgarDB overviewMark Khoury
 
Using and Developing with Open Source Digital Forensics Software in Digital A...
Using and Developing with Open Source Digital Forensics Software in Digital A...Using and Developing with Open Source Digital Forensics Software in Digital A...
Using and Developing with Open Source Digital Forensics Software in Digital A...Mark Matienzo
 
Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Towards a Simple, Standards-Compliant, and Generic Phylogenetic DatabaseTowards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Towards a Simple, Standards-Compliant, and Generic Phylogenetic DatabaseHilmar Lapp
 
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...Syed Ahmad Chan Bukhari, PhD
 
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...Ahmad C. Bukhari
 
100615 htap network_brussels
100615 htap network_brussels100615 htap network_brussels
100615 htap network_brusselsRudolf Husar
 
KeepIt Course 4: Putting storage, format management and preservation planning...
KeepIt Course 4: Putting storage, format management and preservation planning...KeepIt Course 4: Putting storage, format management and preservation planning...
KeepIt Course 4: Putting storage, format management and preservation planning...JISC KeepIt project
 
Centralizing sequence analysis
Centralizing sequence analysisCentralizing sequence analysis
Centralizing sequence analysisDenis C. Bauer
 

Semelhante a ISA INFRASTRUCTURE (20)

Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...
 
iRODS
iRODSiRODS
iRODS
 
Tim Malthus_Towards standards for the exchange of field spectral datasets
Tim Malthus_Towards standards for the exchange of field spectral datasetsTim Malthus_Towards standards for the exchange of field spectral datasets
Tim Malthus_Towards standards for the exchange of field spectral datasets
 
Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech...
Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech...
 
Building a Data Discovery Network for Sustainability Science
 
Collaboration and Sharing
 
2013-01-17 Research Object
 
DIR workshop ontology stream data access
 
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...
 
Data-intensive profile for the VAMDC
 
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...
 
EdgarDB overview
 
Using and Developing with Open Source Digital Forensics Software in Digital A...
 
Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database
 
March 2013 Introduction
 
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
 
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...
 
100615 htap network_brussels
 
KeepIt Course 4: Putting storage, format management and preservation planning...
 
Centralizing sequence analysis
 

More from Alejandra Gonzalez-Beltran (13)

The Software Sustainability Institute Fellowship
 
CMSO Minimal reporting requirements
 
The DATS model: datasets descriptions for data discovery in DataMed
 
Datasets with bioschemas
 
Data publication: Discover, Explore, Visualise
 
ISA commons - overview and latest developments
 
Metadata for Interoperable Bioscience
 
Metadata challenges research and re-usable data - BioSharing, ISA and STATO
 
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...
 
Brazil-UK Frontiers of Engineering - Big data in healthcare session
 
COPO kick-off meeting
 
BCU 2013
 
SELENfest 2012
 

ISA INFRASTRUCTURE

  • 1. The ISA Infrastructure for the biosciences: from data curation at source to the linked data cloud. Alejandra Gonzalez-Beltran, University of Oxford e-Research Centre, UK (Alejandra.GonzalezBeltran@oerc.ox.ac.uk). Conference on Semantics in Healthcare and Life Sciences (CSHALS), Boston, USA, Feb 27 - Mar 1 2013
  • 2. Outline
    • The ISA infrastructure: a metadata tracking framework in the biosciences, comprising the ISA-Tab format, a set of open source software tools and the user community
    • The ISA-Tab syntax and its implicit semantics
    • The [image] component of the infrastructure
        • [image] for mapping the syntax to ontologies
        • A couple of mappings, architecture, conversion
  • 3.
  • 4. Contextual information (metadata):
    • Sample characteristics
    • Technology and measurement types
    • Instrument parameters
    • …
  • 5. Need for a generic representation, applied to:
    • microarray-based experiments (MAGE)
    • sequencing-based experiments (SRA)
    • flow cytometry-based experiments (FuGE-Flow Cyt)
    • mass spectrometry and NMR spectroscopy experiments (MetaboLights and PRIDE)
  • 6. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level (Rocca-Serra et al., 2010, Bioinformatics)
    • Assist in the annotation and management of experimental metadata at source, supporting data provenance tracking
    • Deal with high-throughput studies using one or a combination of omics and other technologies
    • Empower users to take up community-defined checklists and ontologies
    • Facilitate data sharing, re-use, comparison and reproducibility of experiments, as well as submission to international public repositories
  • 7. Towards interoperable bioscience data (Sansone et al., 2012, Nature Genetics): a growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains.
  • 8. ISA-Tab syntax (and its implicit semantics)
  • 9.
  • 10. Example ISA-Tab assay table:
        Sample Name | Material Type | Hybridization Assay Name | Array Design REF | Array Data File | Protocol REF | Derived Array Data File
        sample1 | genomic DNA | assay1 | A-AFFY-107 | assay1.cel | data normalization | assay1.txt
        sample2 | genomic DNA | assay2 | A-AFFY-107 | assay2.cel | data normalization | assay2.txt
        sample3 | genomic DNA | assay3 | A-AFFY-107 | assay3.cel | data normalization | assay3.txt
    [Figure: material transformations. Material nodes and data file nodes (raw and derived) are connected by protocol processes; annotations shown include Characteristics[…], Material Type, Factor Value[…] (independent variables), Comment[…], Parameter Value[…], Performer (operator effect) and Date (day effect).]
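In the assay table above, node columns (samples, assays, data files) are linked left to right, and a Protocol REF column names the process between two nodes. The following is a minimal sketch of reading that table into a process graph. It assumes a simplified tab-separated layout and a hypothetical `NODE_COLUMNS` set; it is not the ISA tools' own parser, which handles many more column types.

```python
import csv
import io

# Hypothetical subset of ISA-Tab headers treated as graph nodes; all other
# columns are treated here as attributes and ignored, except Protocol REF.
NODE_COLUMNS = {
    "Sample Name",
    "Hybridization Assay Name",
    "Array Data File",
    "Derived Array Data File",
}

ASSAY_TABLE = (
    "Sample Name\tMaterial Type\tHybridization Assay Name\tArray Design REF\t"
    "Array Data File\tProtocol REF\tDerived Array Data File\n"
    "sample1\tgenomic DNA\tassay1\tA-AFFY-107\tassay1.cel\tdata normalization\tassay1.txt\n"
    "sample2\tgenomic DNA\tassay2\tA-AFFY-107\tassay2.cel\tdata normalization\tassay2.txt\n"
)


def row_to_edges(header, row):
    """Return (source node, protocol or None, target node) edges for one row."""
    edges = []
    previous_node = None
    pending_protocol = None
    for column, value in zip(header, row):
        if column == "Protocol REF":
            # Remember the protocol until the next node column closes the edge.
            pending_protocol = value
        elif column in NODE_COLUMNS:
            if previous_node is not None:
                edges.append((previous_node, pending_protocol, value))
            previous_node = value
            pending_protocol = None
    return edges


def table_to_edges(text):
    """Parse a tab-separated assay table and collect the edges of every row."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    return [edge for row in reader for edge in row_to_edges(header, row)]


for edge in table_to_edges(ASSAY_TABLE):
    print(edge)
```

Each row yields a chain such as `sample1 -> assay1 -> assay1.cel -> (data normalization) -> assay1.txt`. Because this table subset has no protocol between the sample and the hybridization, the first two edges carry `None` as the protocol.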
  • 11. Tagging: from free text to ontology-based annotation
    • Single intervention representation, free-text annotation:
        Source Name | Characteristics[organism] | Factor Value[perturbation agent] | Factor Value[dose] | Factor Value[duration]
        individual1 | human | aspirin | high dose | 12 weeks
    • Single intervention, ontology-based annotation:
        Source Name | Characteristics[organism (obi:0100026)] | Term Source REF | Term Accession Number | Factor Value[chemical compound (CHEBI_37577)] | Term Source REF | Term Accession Number
        individual1 | Homo sapiens | NCBITax | 9606 | aspirin | CHEBI | 1231354

        Factor Value[dose (OBI_0000984)] | Term Source REF | Term Accession Number | Factor Value[time (PATO_0000165)] | Unit | Term Source REF | Term Accession Number
        low dose | LNC | LP30872-3 | 12 | week | UO | 0000034
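In the ontology-based version, an annotated column is followed by a Term Source REF column and a Term Accession Number column. The sketch below pairs them so each value carries its ontology term. It rests on two simplifications: the headers drop the bracketed ontology IDs, and the pairing rule ("the two following columns are the term columns") is treated as the only pattern.

```python
# Header and row from the ontology-based example above (bracketed IDs dropped).
HEADER = [
    "Source Name",
    "Characteristics[organism]",
    "Term Source REF",
    "Term Accession Number",
    "Factor Value[chemical compound]",
    "Term Source REF",
    "Term Accession Number",
]
ROW = ["individual1", "Homo sapiens", "NCBITax", "9606", "aspirin", "CHEBI", "1231354"]


def ontology_annotations(header, row):
    """Map each annotated column to (value, term source, term accession)."""
    annotations = {}
    for i, column in enumerate(header):
        # An annotated column is immediately followed by its two term columns;
        # slicing past the end simply yields a shorter list, so no IndexError.
        if header[i + 1 : i + 3] == ["Term Source REF", "Term Accession Number"]:
            annotations[column] = (row[i], row[i + 1], row[i + 2])
    return annotations


print(ontology_annotations(HEADER, ROW))
```

With a `Factor Value[time] | Unit | Term Source REF | Term Accession Number` group, this rule attaches the term to the `Unit` column, which matches the example: `week` is the term annotated with UO.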
  • 12. ToxBank effort, developed by Nina Jeliazkova. Health Care & Life Sciences Interest Group. Kohonen et al. The ToxBank Data Warehouse: a research cluster of 7 EU FP7 Health systems toxicology and toxicogenomics projects.
  • 13.
    • Make the semantics of ISA-Tab explicit, including materials, data entities, processes and their relationships
    • Provide incentives for the provision of ontology-based annotations in ISA-Tab datasets; exploit those annotations
    • Augment ISA syntax with new elements (e.g. groups), facilitating the understanding and querying of experimental design
    • Facilitate data integration and knowledge discovery/reasoning
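Groups are implicit in ISA-Tab: sources that share the same combination of factor values receive the same treatment. The following is a minimal sketch of making them explicit. It assumes that reading of "group", which may differ from the definition isa2owl actually adopts, and the rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (source name, {factor name: factor value}).
ROWS = [
    ("individual1", {"compound": "aspirin", "dose": "low dose"}),
    ("individual2", {"compound": "aspirin", "dose": "high dose"}),
    ("individual3", {"compound": "aspirin", "dose": "low dose"}),
    ("individual4", {"compound": "none", "dose": "none"}),
]


def study_groups(rows):
    """Group sources by their combination of factor values (one group per treatment)."""
    groups = defaultdict(list)
    for source, factors in rows:
        # Sorting the items makes the key independent of dict insertion order.
        key = tuple(sorted(factors.items()))
        groups[key].append(source)
    return dict(groups)


for treatment, members in study_groups(ROWS).items():
    print(dict(treatment), members)
```

Once each group is an explicit entity, a query such as "which sources received low-dose aspirin" becomes a lookup on the group instead of a scan over factor-value columns.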
  • 14. [image] architecture
    [Figure: architecture diagram with the labels ISA-TAB parser, graph, isa2owl, mapping, analysis, parser and configuration file.]
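The architecture above has a configuration file drive the mapping from parsed ISA-Tab elements to ontology terms. The sketch below shows that idea for one protocol application, emitting N-Triples lines. It does not reproduce the isa2owl implementation or its mapping files. The configuration is an in-memory dict, and every `http://example.org/isa/` IRI is a hypothetical placeholder; the real mappings target ontologies such as OBI.

```python
# Hypothetical configuration with placeholder IRIs (not real ontology terms).
EX = "http://example.org/isa/"
MAPPING = {
    "Sample Name": EX + "Sample",
    "Array Data File": EX + "RawDataFile",
    "Derived Array Data File": EX + "DerivedDataFile",
    "Protocol REF": EX + "ProtocolApplication",
    "has_input": EX + "hasInput",
    "has_output": EX + "hasOutput",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(local_name):
    """Mint a placeholder IRI for an ISA-Tab value (spaces become underscores)."""
    return "<" + EX + local_name.replace(" ", "_") + ">"


def process_to_ntriples(process_id, inputs, outputs, mapping):
    """Emit N-Triples for one protocol application and its typed inputs/outputs."""
    p = iri(process_id)
    lines = [f"{p} <{RDF_TYPE}> <{mapping['Protocol REF']}> ."]
    for column, value in inputs:
        lines.append(f"{iri(value)} <{RDF_TYPE}> <{mapping[column]}> .")
        lines.append(f"{p} <{mapping['has_input']}> {iri(value)} .")
    for column, value in outputs:
        lines.append(f"{iri(value)} <{RDF_TYPE}> <{mapping[column]}> .")
        lines.append(f"{p} <{mapping['has_output']}> {iri(value)} .")
    return lines


triples = process_to_ntriples(
    "data normalization 1",
    inputs=[("Array Data File", "assay1.cel")],
    outputs=[("Derived Array Data File", "assay1.txt")],
    mapping=MAPPING,
)
print("\n".join(triples))
```

Keeping the column-to-term choices in configuration rather than in code means a different ontology can be targeted without touching the parser. The architecture slide suggests this motivation for its configuration file.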
  • 15. OntoMaton: a BioPortal-powered ontology widget for Google Spreadsheets (Maguire et al., 2013, Bioinformatics)
    • Ontology search and automated tagging (relying on NCBO BioPortal services) in Google Spreadsheets
    • Collaborative annotation; support for distributed users
    • Version control & history
  • 16.
  • 17. [Figure: vocabularies for the chemical, biomolecular, information and experimental domains, shown against the ontology-annotated example from slide 11 (Source Name individual1; Characteristics[organism] Homo sapiens, NCBITax 9606; Factor Value[chemical compound] aspirin, CHEBI 1231354).]
  • 18. [Figure: ontologies from the Open Biological and Biomedical Ontologies (OBO) Foundry (BFO, ChEBI, GO, IAO, OBI), shown against the same ontology-annotated example from slide 11.]
  • 21. faahKO dataset
    • Available in Bioconductor (with ISA-Tab metadata)
    • Global metabolite profiling
    • Data subset: LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice
  • 22.
  • 23.
    • Support different conversion modes (different levels of granularity)
    • Querying of ISA-Tab datasets across multiple experiment types
    • Reasoning that exploits ontology annotations
    • Semantic validation of ISA-Tab datasets
    • Augmented annotation over native ISA syntax
    • Identification of gaps in ontological representations
    • Feedback of findings to community ontologies
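The list above includes validating ISA-Tab datasets. The sketch below is a much simpler, purely syntactic check than OWL reasoning. It verifies that every term source an annotation uses is declared and that every annotation carries an accession number. The declared-source set and the last annotation row are hypothetical; the other rows repeat the slide 11 example.

```python
# Hypothetical declared term sources (as listed in an investigation file).
DECLARED_SOURCES = {"NCBITax", "CHEBI", "UO", "UBERON"}

# (column, value, term source, term accession); the last row is hypothetical.
ANNOTATIONS = [
    ("Characteristics[organism]", "Homo sapiens", "NCBITax", "9606"),
    ("Factor Value[chemical compound]", "aspirin", "CHEBI", "1231354"),
    ("Factor Value[dose]", "low dose", "LNC", "LP30872-3"),
    ("Unit", "week", "UO", "0000034"),
    ("Characteristics[organism part]", "spinal cord", "UBERON", ""),
]


def validate(annotations, declared_sources):
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for column, value, source, accession in annotations:
        if source not in declared_sources:
            problems.append(f"{column}={value!r}: undeclared term source {source!r}")
        if not accession:
            problems.append(f"{column}={value!r}: missing term accession number")
    return problems


for problem in validate(ANNOTATIONS, DECLARED_SOURCES):
    print(problem)
```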
  • 24. Increasing level of structure for experimental metadata: notes in lab books → spreadsheets & tables → facts as RDF statements (ISA-Tab metadata)
  • 25. @isatools @biosharing isa-tools.org isacommons.org biosharing.org