The Investigation/Study/Assay (ISA) metadata framework for reproducible and reusable bioscience research

Alejandra González-Beltrán, PhD
on behalf of the ISATeam

Oxford e-Research Centre, University of Oxford

Faculty of Technology, Environment and Engineering
Birmingham City University
12th March 2013

Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295

http://www.nature.com/news/2011/110111/full/469139a.html
http://www.economist.com/node/21528593
http://www.nytimes.com/2011/07/08/health/research/08genes.html

Contextual information (metadata):
  •  Sample characteristics
  •  Technology and measurement types
  •  Instrument parameters
  •  …

Need for a generic representation, applied to:
  •  microarray based experiments (MAGE)
  •  sequencing based experiments (SRA)
  •  flow cytometry based experiments (FuGE-Flow Cyt)
  •  mass spectrometry and NMR spectroscopy experiments (Metabolights and PRIDE)

Roadmap
reasoning, visualization, analysis, browsing, integration, exchange, retrieval
Community Standards; Software Tools
Well-annotated & Structured Data
Reproducible & Reusable Bioscience Research
User community

Bioscience is multi-domain…
health, env, agro, tox/pharma

§  Interdisciplinary and integrative in character
       •  need to deal with new and existing datasets
       •  deal with a variety of data types
Source of the figure: EBI website

Multiple communities, multiple norms and standards, e.g.:
  •  allow data to flow from one system to another
  •  use the same term to refer to the same ‘thing’
  •  report the same core, essential information

Challenges: lack of interaction and coordination, duplication of effort, fragmentation and uneven coverage… hinders interoperability

Growing number of bioscience reporting standards
  •  130+ (estimated): MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, CML, DICOM, GELML, SBRML, MITAB, MzML, ISA-Tab, SEDML…
  •  303+ (source: BioPortal): AAO, CHEBI, OBI, VO, PATO, ENVO, MOD, TEDDY, XAO, BTO, DO, PRO, IDO…
  •  150+ (source: MIBBI, EQUATOR): MIAME, MIAPA, GIATE, MIRIAM, MIQAS, MIX, REMARK, MIGEN, MIAPE, MIQE, CIMR, CONSORT, MIASE, MISFISHIE…
Databases, annotation, curation tools

But… what do we know about them and how they are related?
  •  Which tools and databases implement which standards?
  •  I use high throughput sequencing technologies, which ones are relevant to me?
  •  What are the criteria to evaluate their status and value?
  •  How can I get involved to propose extensions or modifications?
  •  Which ones are mature enough for me to use or recommend?
  •  Which formats support specific minimum information guidelines?
  •  I work on plants, are these standards just for biomedical applications?

A coherent, curated and searchable catalogue of data sharing resources
•  Bioscience standards and associated data-sharing policies, publications, tools and databases
•  Assessment criteria for usability and popularity of standards
•  Relationships among standards
•  Encouragement for communication & interaction among groups
•  Promoting interoperability & informed decisions about standards

ISA infrastructure
ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level
Rocca-Serra et al, 2010, Bioinformatics

    •  Assist in the annotation and management of experimental metadata at source, supporting data provenance tracking
    •  Deal with high-throughput studies using one or a combination of omics and other technologies
    •  Empower users to take up community-defined checklists and ontologies
    •  Facilitate data sharing, re-use, comparison and reproducibility of experiments, and submission to international public repositories

faahKO dataset
•  Available in Bioconductor
•  Subset of the original data on global metabolite profiling (Saghatelian et al., Biochemistry, 2004)
•  LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice

faahKO investigation
-  Define key entities (e.g. factors, protocols, parameters)
-  Grouping of studies
-  Relate studies and assays

faahKO study
-  Subjects studied: source(s), sampling methodology, characteristics
-  Treatments/manipulations performed to prepare the specimens
Term sources shown: NEWT UniProt Taxonomy Database; Mouse Genome Informatics; Mouse Adult Gross Anatomy

faahKO assay
-  measurement type, e.g. metabolite profiling
-  technology, e.g. mass spectrometry
Create template(s) to fit the type of experiments to be described

Create templates detailing the steps to be reported for different investigations, complying with community standards, e.g. configuring the value(s) allowed for each field to be
•  text (with/without regular expression testing),
•  ontology terms,
•  numbers etc.
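
The idea of configuring the allowed value(s) per field can be sketched as a small set of checker functions. This is only an illustration under stated assumptions: the function names (text_field, ontology_field, number_field, validate), the example template and the allowed term set are all hypothetical and do not reflect the configuration format of the ISA tools.

```python
import re
from typing import Callable, Dict, List, Optional, Set

Checker = Callable[[object], bool]


def text_field(pattern: Optional[str] = None) -> Checker:
    """Free text, optionally tested against a regular expression."""
    if pattern is None:
        return lambda value: isinstance(value, str) and value != ""
    compiled = re.compile(pattern)
    return lambda value: isinstance(value, str) and compiled.fullmatch(value) is not None


def ontology_field(allowed_terms: Set[str]) -> Checker:
    """Value must be one of a fixed set of ontology term labels."""
    return lambda value: value in allowed_terms


def number_field() -> Checker:
    """Value must parse as a number."""
    def check(value: object) -> bool:
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False
    return check


# Hypothetical template: one checker per field name.
template: Dict[str, Checker] = {
    "Sample Name": text_field(r"sample\d+"),
    "Characteristics[organism]": ontology_field({"Homo sapiens", "Mus musculus"}),
    "Factor Value[dose]": number_field(),
}


def validate(record: Dict[str, object], template: Dict[str, Checker]) -> List[str]:
    """Return the names of fields that are missing or break the template."""
    return [name for name, check in template.items()
            if name not in record or not check(record[name])]


good = {"Sample Name": "sample1",
        "Characteristics[organism]": "Mus musculus",
        "Factor Value[dose]": "12"}
bad = {"Sample Name": "s1",
       "Characteristics[organism]": "Mus musculus",
       "Factor Value[dose]": "high"}
print(validate(good, template))
print(validate(bad, template))
```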
  
	
  

	
  
	
  
	
  
Describe, curate your experiment using a desktop-based tool

Report and edit the description using this tool (also customized using the templates), with a spreadsheet-like look and feel, packed with functionalities such as
•  ontology search (access via …)
•  term-tagging features
•  import from spreadsheets etc…
OntoMaton: a Bioportal powered Ontology widget for Google Spreadsheets
Maguire et al, 2013, Bioinformatics
•  Ontology search and automated tagging (relying on NCBO Bioportal services) on Google Spreadsheets
•  Collaborative annotation; support for distributed users
•  Version control & history

•  R package available in BioConductor 2.11
   http://bioconductor.org/packages/release/bioc/html/Risa.html
•  ISAtab class
•  Read ISAtab files into ISAtab objects and write ISAtab files back to disk
•  Augment the metadata with the definition of factors/treatments/groups
•  Build xcmsSet (xcms package) objects from mass spectrometry assays
•  Augment the ISAtab dataset after analysis
•  Source & issues tracking: https://github.com/ISA-tools/Risa

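•  faahKO package v. 2.12 contains ISAtab files describing the experiment

    library(Risa)
    # read the ISAtab files shipped with the faahKO package
    faahkoISA <- readISAtab(find.package("faahKO"))
    # take the first assay file listed in the ISAtab object
    assay.filename <- faahkoISA["assay.filenames"][[1]]
    # build an xcmsSet object from the mass spectrometry assay
    xset <- processAssayXcmsSet(faahkoISA, assay.filename)
    …
    # record the derived data file in the assay metadata
    updateAssayMetadata(faahkoISA, assay.filename, "Derived Spectral Data File", "faahkoDSDF.txt")
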
•  MTBLS2 processing and analysis using Risa, xcms and CAMERA BioConductor packages

Metabolights: an open access general-purpose repository for metabolomics studies and associated meta-data
Haug et al, 2012, Nucleic Acids Research

The implicit semantics of the ISA syntax

Sample Name   Material Type   Hybridization Assay Name   Array Design REF   Array Data File   Protocol REF         Derived Array Data File
sample1       genomic DNA     assay1                     A-AFFY-107         assay1.cel        data normalization   assay1.txt
sample2       genomic DNA     assay2                     A-AFFY-107         assay2.cel        data normalization   assay2.txt
sample3       genomic DNA     assay3                     A-AFFY-107         assay3.cel        data normalization   assay3.txt

Material transformations...

Material Node: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
Process: Protocol, Parameter Value[…], Performer (operator effect), Date (day effect)
Data File Node: Raw Data File, Derived Data File
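
Reading a table row left to right as a chain of material, process and data nodes can be sketched as below. This is an illustration only: the NODE_COLUMNS role assignment and the row_to_edges helper are assumptions made for this example, attribute columns such as Material Type and Array Design REF are simply skipped, and none of this is the parser used by the ISA tools. The row is the first one from the hybridization table earlier in the deck.

```python
import csv
import io

# Assumed role of each node-bearing column (hypothetical for this sketch).
NODE_COLUMNS = {
    "Sample Name": "material",
    "Hybridization Assay Name": "process",
    "Array Data File": "data",
    "Protocol REF": "process",
    "Derived Array Data File": "data",
}

# Tab-delimited table with the header and first row from the slide.
ASSAY_TABLE = (
    "Sample Name\tMaterial Type\tHybridization Assay Name\tArray Design REF\t"
    "Array Data File\tProtocol REF\tDerived Array Data File\n"
    "sample1\tgenomic DNA\tassay1\tA-AFFY-107\tassay1.cel\t"
    "data normalization\tassay1.txt\n"
)


def row_to_edges(row):
    """Link consecutive node columns, left to right, into (source, target) edges."""
    nodes = [(NODE_COLUMNS[col], row[col]) for col in row if col in NODE_COLUMNS]
    return list(zip(nodes, nodes[1:]))


reader = csv.DictReader(io.StringIO(ASSAY_TABLE), delimiter="\t")
edges = [edge for row in reader for edge in row_to_edges(row)]

for source, target in edges:
    print(f"{source[1]} ({source[0]}) -> {target[1]} ({target[0]})")
```

The column order carries the meaning here: each node column is treated as the output of the one before it, which is the implicit semantics the slide refers to.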
Tagging: from free text to ontology-based

•  single intervention representation, free text annotation

Source Name | Characteristics[organism] | Factor Value[perturbation agent] | Factor Value[dose] | Factor Value[duration]
individual1 | human | aspirin | high dose | 12 weeks

•  single intervention, ontology-based annotation

Source Name | Characteristics[organism (obi:0100026)] | Term Source REF | Term Accession Number | Factor Value[chemical compound (CHEBI_37577)] | Term Source REF | Term Accession Number
individual1 | Homo sapiens | NCBITax | 9606 | aspirin | CHEBI | 1231354

Factor Value[dose (OBI_0000984)] | Term Source REF | Term Accession Number | Factor Value[time (PATO_0000165)] | Unit | Term Source REF | Term Accession Number
low dose | LNC | LP30872-3 | 12 | week | UO | 0000034
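
The step from a free-text cell to an ontology-based annotation amounts to expanding one column into three: the term label, its Term Source REF and its Term Accession Number. The sketch below assumes a tiny hard-coded lookup whose entries are copied from the slide tables; the TERM_LOOKUP table and the tag function are hypothetical, and a real tool would query an ontology service instead.

```python
# Entries copied from the slide tables: (label, term source, accession).
TERM_LOOKUP = {
    "human": ("Homo sapiens", "NCBITax", "9606"),
    "aspirin": ("aspirin", "CHEBI", "1231354"),
}


def tag(column, free_text):
    """Expand one free-text cell into value, Term Source REF and Term Accession Number.

    Unknown terms keep their free text and get empty source and accession cells.
    """
    label, source, accession = TERM_LOOKUP.get(
        free_text.lower(), (free_text, "", ""))
    return [
        (column, label),
        ("Term Source REF", source),
        ("Term Accession Number", accession),
    ]


# A row is kept as (header, value) pairs because the Term Source REF and
# Term Accession Number headers repeat after every annotated column.
free_text_row = [
    ("Characteristics[organism]", "human"),
    ("Factor Value[perturbation agent]", "aspirin"),
]

tagged_row = [cell
              for column, value in free_text_row
              for cell in tag(column, value)]

for header, value in tagged_row:
    print(f"{header}: {value}")
```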
  
ToxBank effort, developed by Nina Jeliazkova
Kohonen et al. The ToxBank Data Warehouse: a research cluster of 7 EU FP7 Health systems toxicology and toxicogenomics projects.
Health Care & Life Sciences Interest Group

•  Make the semantics of ISAtab explicit, including materials & data entities & processes & their relationships
•  Provide incentives for provision of ontology-based annotations in ISA-TAB datasets; exploit those annotations
•  Augment ISA syntax with new elements (e.g. groups), facilitating the understanding & querying of experimental design
•  Facilitate data integration & knowledge discovery/reasoning

architecture
Components: ISA-TAB parser; graph analysis; isa2owl mapping parser; configuration file
Implementation:
-  java-based
-  Using owlapi
  
vocabularies
Chemical domain; Biomolecular domain; Information domain; Experimental domain

Open Biological and Biomedical Ontologies (OBO) Foundry
BFO; ChEBI; GO; IAO; OBI

ISA-OBI mapping
ISA-SIO mapping

faahKO dataset
Available in Bioconductor (with ISA-TAB metadata)
Global metabolite profiling
Data subset: LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice

•  support different conversion modes (different levels of granularity)
•  querying for ISA-TAB datasets, across multiple experiment types
•  reasoning exploiting ontology annotations
    –  semantic validation of ISA-TAB datasets
•  augmented annotation over native ISA syntax
    –  identification of gaps in ontological representations
    –  feedback of findings to community ontologies

Increasing level of structure for experimental metadata
Notes in Lab books → Spreadsheets & Tables (ISAtab metadata) → Facts as RDF statements
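
To illustrate the last step, a spreadsheet row can be restated as RDF facts written in N-Triples. The sketch below is an assumption-laden example rather than the isa2owl conversion: the http://example.org/isa/ namespace and the row_to_ntriples helper are hypothetical, and typing the individual with the NCBI Taxonomy class for accession 9606 is a modeling choice made only for this illustration. The row values come from the ontology-based annotation table earlier in the deck.

```python
OBO = "http://purl.obolibrary.org/obo/"
EX = "http://example.org/isa/"  # hypothetical namespace for this sketch
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_ntriples(row):
    """Turn one annotated source row into a list of N-Triples statements."""
    subject = iri(EX + row["Source Name"])
    organism = iri(OBO + "NCBITaxon_" + row["Term Accession Number"])
    return [
        f"{subject} {iri(RDFS_LABEL)} {literal(row['Source Name'])} .",
        f"{subject} {iri(RDF_TYPE)} {organism} .",
    ]


row = {
    "Source Name": "individual1",
    "Characteristics[organism]": "Homo sapiens",
    "Term Source REF": "NCBITax",
    "Term Accession Number": "9606",
}

triples = row_to_ntriples(row)
for triple in triples:
    print(triple)
```

Once the metadata is in this form, each cell becomes a statement that can be queried or reasoned over independently of the table layout it came from.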
  
                                             	
  
Towards interoperable bioscience data
Sansone et al, 2012, Nature Genetics

A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains.

Implementation at Harvard
http://discovery.hsci.harvard.edu/

Implementation at the European Bioinformatics Institute
http://www.ebi.ac.uk/metabolights

reasoning, visualization, analysis, browsing, integration, exchange, retrieval
Reproducible & Reusable Bioscience Research

@isatools @biosharing
isa-tools.org
isacommons.org
biosharing.org

BCU 2013

  • 1. The Investigation/Study/Assay (ISA) metadata framework for reproducible and reusable bioscience research. Alejandra González-Beltrán, PhD, on behalf of the ISATeam. Oxford e-Research Centre, University of Oxford. Faculty of Technology, Environment and Engineering, Birmingham City University, 12th March 2013
  • 2. Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
  • 5. http://www.nature.com/news/2011/110111/full/469139a.html http://www.economist.com/node/21528593
  • 6. http://www.nature.com/news/2011/110111/full/469139a.html http://www.economist.com/node/21528593 http://www.nytimes.com/2011/07/08/health/research/08genes.html
  • 7. Contextual information (metadata): • Sample characteristics • Technology and measurement types • Instrument parameters • …
  • 8. Need for a generic representation, applied to: • microarray-based experiments (MAGE) • sequencing-based experiments (SRA) • flow cytometry-based experiments (FuGE-Flow Cyt) • mass spectrometry and NMR spectroscopy experiments (Metabolights and PRIDE)
  • 9. Roadmap: Reproducible & Reusable Bioscience Research
  • 10. Roadmap: reasoning, visualization, analysis, browsing, integration, exchange, retrieval. Well-annotated & Structured Data. Reproducible & Reusable Bioscience Research
  • 11. Roadmap: reasoning, visualization, analysis, browsing, integration, exchange, retrieval. Well-annotated & Structured Data. Reproducible & Reusable Bioscience Research. User community
  • 12. Roadmap: reasoning, visualization, analysis, browsing, integration, exchange, retrieval. Community Standards. Software Tools. Well-annotated & Structured Data. Reproducible & Reusable Bioscience Research. User community
  • 13. Roadmap: reasoning, visualization, analysis, browsing, integration, exchange, retrieval. Reproducible & Reusable Bioscience Research
  • 15. Bioscience is multi-domain… health, env, agro, tox/pharma. § Interdisciplinary and integrative in character • need to deal with new and existing datasets • deal with a variety of data types. Source of the figure: EBI website
  • 16. Multiple communities, multiple norms and standards, e.g.: • use the same term to refer to the same ‘thing’ • allow data to flow from one system to another • report the same core, essential information. Challenges: lack of interaction and coordination, duplication of effort, fragmentation and uneven coverage… hinders interoperability
  • 17. Growing number of bioscience reporting standards: 303+, 150+, 130+ (Source: MIBBI, EQUATOR; Source: BioPortal; Estimated). Databases, annotation, curation tools. MAGE-Tab, AAO, MIAME, GCDML, MIAPA, CHEBI, GIATE, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • 18. But… what do we know about them and how they are related? MAGE-Tab, AAO, MIAME, GCDML, MIAPA, CHEBI, GIATE, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • 19. But… what do we know about them and how they are related? • I use high throughput sequencing technologies, which ones are relevant to me? • Which tools and databases implement which standards? • How can I get involved to propose extensions or modifications? • What are the criteria to evaluate their status and value? • Which ones are mature enough for me to use or recommend? • Which formats support specific minimum information guidelines? • I work on plants, are these standards just for biomedical applications?
  • 20. A coherent, curated and searchable catalogue of data sharing resources • Bioscience standards and associated data-sharing policies, publications, tools and databases • Assessment criteria for usability and popularity of standards • Relationships among standards • Encouragement for communication & interaction among groups • Promoting interoperability & informed decisions about standards
  • 23.
  • 24.
  • 25.
  • 26.
  • 27. faahKO dataset • Available in Bioconductor • Subset of the original data on global metabolite profiling (Saghatelian et al., Biochemistry, 2004) • LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice
  • 28. faahKO investigation - Define key entities (e.g. factors, protocols, parameters) - Grouping of studies - Relate studies and assays
  • 29. faahKO study - Subjects studied: source(s), sampling methodology, characteristics - Treatments/manipulations performed to prepare the specimens. NEWT UniProt Taxonomy Database. Mouse Genome Informatics
  • 30. faahKO study - Subjects studied: source(s), sampling methodology, characteristics - Treatments/manipulations performed to prepare the specimens. Mouse Adult Gross Anatomy
  • 31. faahKO assay - measurement type, e.g. metabolite profiling - technology, e.g. mass spectrometry
  • 34. Create template(s) to fit the type of experiments to be described. Create templates detailing the steps to be reported for different investigations, complying with community standards, e.g. configuring the value(s) allowed for each field to be • text (with/without regular expression testing), • ontology terms, • numbers etc.
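The per-field constraints listed on this slide (text with optional regular-expression testing, ontology terms, numbers) can be pictured with a small sketch. This is not the ISA configuration format: the `TEMPLATE` dictionary, its keys, and the allowed organism labels are hypothetical names chosen only for illustration.

```python
import re

# Hypothetical template: field name -> validation rule.
# This is an illustrative structure, not the ISA configuration file format.
TEMPLATE = {
    "Sample Name": {"kind": "text", "pattern": r"sample\d+"},
    "Characteristics[organism]": {
        "kind": "ontology",
        "allowed": {"Homo sapiens", "Mus musculus"},
    },
    "Factor Value[duration]": {"kind": "number"},
}


def validate_field(field, value, template=TEMPLATE):
    """Return True if value satisfies the rule configured for field."""
    rule = template[field]
    if rule["kind"] == "text":
        pattern = rule.get("pattern")
        # Text fields pass unless a regular expression is configured and fails.
        return pattern is None or re.fullmatch(pattern, value) is not None
    if rule["kind"] == "ontology":
        return value in rule["allowed"]
    if rule["kind"] == "number":
        try:
            float(value)
        except ValueError:
            return False
        return True
    raise ValueError(f"unknown rule kind: {rule['kind']}")


def validate_row(row, template=TEMPLATE):
    """Return the fields of row (a dict) that fail their configured rule."""
    return [
        field
        for field, value in row.items()
        if field in template and not validate_field(field, value, template)
    ]
```

With this template, a row that reports the organism as free-text "human" would be flagged, which is the kind of feedback a template-driven editor can give at annotation time.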
  • 35. Describe, curate your experiment using a desktop-based tool. Report and edit the description using this tool (also customized using the templates), with a spreadsheet-like look and feel, packed with functionalities such as • ontology search (access via ) • term-tagging features • import from spreadsheets etc…
  • 36. OntoMaton: a Bioportal powered Ontology widget for Google Spreadsheets (Maguire et al, 2013, Bioinformatics) • Ontology search and automated tagging (relying on NCBO Bioportal services) on Google Spreadsheets • Collaborative annotation; support for distributed users • Version control & history
  • 41. • R package available in BioConductor 2.11 http://bioconductor.org/packages/release/bioc/html/Risa.html • ISAtab class • Read ISAtab files into ISAtab objects and write ISAtab files back to disk • Increment metadata with definition factors/treatments/groups • Build xcmsSet (xcms package) objects from mass spectrometry assays • Augment the ISAtab dataset after analysis • Source & issues tracking https://github.com/ISA-tools/Risa
  • 42. • faahKO package v. 2.12 contains ISAtab files describing the experiment
        faahkoISA = readISAtab(find.package("faahKO"))
        assay.filename <- faahkoISA["assay.filenames"][[1]]
        xset = processAssayXcmsSet(faahkoISA, assay.filename)
        …
        updateAssayMetadata(faahkoISA, assay.filename, "Derived Spectral Data File", "faahkoDSDF.txt")
    • MTBLS2 processing and analysis using Risa, xcms and CAMERA BioConductor packages
    Metabolights: an open access general-purpose repository for metabolomics studies and associated meta-data. Haug et al, 2012, Nucleic Acids Research
  • 43. The implicit semantics of the syntax
  • 44. Sample Name | Material Type | Hybridization Assay Name | Assay Design REF | Array Data File | Protocol REF | Derived Array Data File
        sample1 | genomic DNA | assay1 | A-AFFY-107 | assay1.cel | data normalization | assay1.txt
        sample2 | genomic DNA | assay2 | A-AFFY-107 | assay2.cel | data normalization | assay2.txt
        sample3 | genomic DNA | assay3 | A-AFFY-107 | assay3.cel | data normalization | assay3.txt
    Material transformations... Diagram labels: Material Node (Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]); Protocol, Process (Parameter Value[…], Performer (operator effect), Date (day effect)); Data File Node (Raw Data File, Derived Data File)
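The idea that a flat ISA-Tab table implicitly encodes a graph of material and data nodes can be sketched as follows. This is a minimal illustration using only the Python standard library, not the ISA parser: the choice of which columns count as nodes (`NODE_COLUMNS`) is an assumption made for this example, and the table content is the one shown on the slide.

```python
import csv
import io

# Assay table from the slide, written tab-delimited as in ISA-Tab.
ASSAY_TSV = (
    "Sample Name\tMaterial Type\tHybridization Assay Name\tAssay Design REF\t"
    "Array Data File\tProtocol REF\tDerived Array Data File\n"
    "sample1\tgenomic DNA\tassay1\tA-AFFY-107\tassay1.cel\tdata normalization\tassay1.txt\n"
    "sample2\tgenomic DNA\tassay2\tA-AFFY-107\tassay2.cel\tdata normalization\tassay2.txt\n"
    "sample3\tgenomic DNA\tassay3\tA-AFFY-107\tassay3.cel\tdata normalization\tassay3.txt\n"
)

# Assumption for this sketch: these columns are graph nodes, read left to
# right; the remaining columns are attributes or protocol references.
NODE_COLUMNS = [
    "Sample Name",
    "Hybridization Assay Name",
    "Array Data File",
    "Derived Array Data File",
]


def rows_to_edges(tsv_text):
    """Turn each table row into a chain of (upstream, downstream) node pairs."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    edges = []
    for row in reader:
        nodes = [row[column] for column in NODE_COLUMNS]
        # Consecutive node columns in a row are linked by a process step.
        edges.extend(zip(nodes, nodes[1:]))
    return edges


if __name__ == "__main__":
    for upstream, downstream in rows_to_edges(ASSAY_TSV):
        print(f"{upstream} -> {downstream}")
```

Each row yields a path from sample to raw data file to derived data file; making this reading explicit is what the later isa2owl slides aim at.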
  • 45. Tagging: from free text to ontology-based
    • single intervention representation, free text annotation
        Source Name | Characteristics[organism] | Factor Value[perturbation agent] | Factor Value[dose] | Factor Value[duration]
        individual1 | human | aspirin | high dose | 12 weeks
    • single intervention, ontology-based annotation
        Source Name | Characteristics[organism (obi:0100026)] | Term Source REF | Term Accession Number | Factor Value[chemical compound (CHEBI_37577)] | Term Source REF | Term Accession Number
        individual1 | Homo sapiens | NCBITax | 9606 | aspirin | CHEBI | 1231354
        Factor Value[dose (OBI_0000984)] | Term Source REF | Term Accession Number | Factor Value[time (PATO_0000165)] | Unit | Term Source REF | Term Accession Number
        low dose | LNC | LP30872-3 | 12 | week | UO | 0000034
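The step from free-text cells to ontology-tagged cells can be sketched with a lookup table. This is only an illustration: `TERM_LOOKUP` is a hypothetical hand-written dictionary whose values are copied from the slide and not checked against the ontologies, whereas the tools in the deck rely on ontology search services for this step.

```python
# Hypothetical lookup: free text -> (label, term source, accession).
# Values are copied from the slide, not verified against NCBITax or ChEBI.
TERM_LOOKUP = {
    "human": ("Homo sapiens", "NCBITax", "9606"),
    "aspirin": ("aspirin", "CHEBI", "1231354"),
}


def tag_row(row, lookup=TERM_LOOKUP):
    """Expand (column, free text) pairs into ISA-Tab style tagged cells.

    A recognised value is replaced by its ontology label and followed by
    'Term Source REF' and 'Term Accession Number' cells; an unrecognised
    value is passed through unchanged.
    """
    cells = []
    for column, value in row:
        label, source, accession = lookup.get(value.lower(), (value, "", ""))
        cells.append((column, label))
        if source:
            cells.append(("Term Source REF", source))
            cells.append(("Term Accession Number", accession))
    return cells


if __name__ == "__main__":
    free_text = [
        ("Source Name", "individual1"),
        ("Characteristics[organism]", "human"),
        ("Factor Value[perturbation agent]", "aspirin"),
    ]
    for header, value in tag_row(free_text):
        print(f"{header}\t{value}")
```

The row is kept as a list of pairs rather than a dictionary because ISA-Tab repeats the `Term Source REF` and `Term Accession Number` headers after every tagged column.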
  • 46. ToxBank effort developed by Nina Jeliazkova. Health Care & Life Sciences Interest Group. Kohonen et al. The ToxBank Data Warehouse: a research cluster of 7 EU FP7 Health systems toxicology and toxicogenomics projects.
  • 47. • Make the semantics of ISAtab explicit, including materials & data entities & processes & their relationships • Provide incentives for provision of ontology-based annotations in ISA-TAB datasets; exploit those annotations • Augment ISA syntax with new elements (e.g. groups), facilitating the understanding & querying of experimental design • Facilitate data integration & knowledge discovery/reasoning
  • 48. architecture (diagram labels): ISA-TAB, parser, graph, isa2owl mapping, analysis, parser, Configuration file. Implementation: - Java-based - Using owlapi
  • 49. vocabularies: Chemical domain, Biomolecular domain, Information domain, Experimental domain. Source Name: individual1; Characteristics[organism (obi:0100026)]: Homo sapiens (Term Source REF: NCBITax, Term Accession Number: 9606); Factor Value[chemical compound (CHEBI_37577)]: aspirin (Term Source REF: CHEBI, Term Accession Number: 1231354)
  • 50. Open Biological and Biomedical Ontologies (OBO) Foundry: BFO, ChEBI, GO, IAO, OBI. Source Name: individual1; Characteristics[organism (obi:0100026)]: Homo sapiens (Term Source REF: NCBITax, Term Accession Number: 9606); Factor Value[chemical compound (CHEBI_37577)]: aspirin (Term Source REF: CHEBI, Term Accession Number: 1231354)
  • 53. faahKO dataset. Available in Bioconductor (with ISA-TAB metadata). Global metabolite profiling. Data subset: LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice
  • 55. • support different conversion modes (different levels of granularity) • querying for ISA-TAB datasets, across multiple experiment types • reasoning exploiting ontology annotations - semantic validation of ISA-TAB datasets • augmented annotation over native ISA syntax - identification of gaps in ontological representations - feedback of findings to community ontologies
  • 56. Increasing level of structure for experimental metadata: Notes in Lab books; Spreadsheets & Tables; Facts as RDF statements (ISAtab metadata)
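To make "facts as RDF statements" concrete, the sketch below serialises one table row as N-Triples using only the Python standard library. It is not the isa2owl conversion described in the deck: the `http://example.org/isa/` namespace and the practice of turning column headers directly into predicates are assumptions for illustration, whereas a real mapping would use terms from ontologies such as OBI.

```python
from urllib.parse import quote

# Hypothetical namespace used only for this sketch.
BASE = "http://example.org/isa/"


def iri(local_name):
    """Build an IRI reference in N-Triples syntax from a local name."""
    return f"<{BASE}{quote(local_name, safe='')}>"


def literal(text):
    """Build a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_ntriples(subject_column, row):
    """Emit one triple per non-subject cell of a row (a dict)."""
    subject = iri(row[subject_column])
    return [
        f"{subject} {iri(column)} {literal(value)} ."
        for column, value in row.items()
        if column != subject_column
    ]


if __name__ == "__main__":
    row = {"Sample Name": "sample1", "Material Type": "genomic DNA"}
    print("\n".join(row_to_ntriples("Sample Name", row)))
```

Each cell becomes a subject, predicate, object statement, which is the structural gain over a spreadsheet: the relation between the sample and its value is stated rather than implied by column position.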
  • 58. Towards interoperable bioscience data (Sansone et al, 2012, Nature Genetics). A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains.
  • 59. Implementation at Harvard. ISA. http://discovery.hsci.harvard.edu/
  • 60. Implementation at the European Bioinformatics Institute. http://www.ebi.ac.uk/metabolights
  • 61. reasoning, visualization, analysis, browsing, integration, exchange, retrieval. Reproducible & Reusable Bioscience Research
  • 62. @isatools @biosharing isa-tools.org isacommons.org biosharing.org