The open source ISA software suite and its international user community:
Knowledge management of experimental data

Alejandra González-Beltrán
Senior Software Engineer, ISATeam
Oxford e-Research Centre, University of Oxford
Oxford, UK

NETTAB 2012 – Integrated Bio-Search, Como, Italy, November 14-16
  
Outline
•  Knowledge management of experimental data
     –  Setting the scene
     –  The ISA ecosystem: ISA-Tab, tools, community
     –  Use case
•  Latest additions
•  Related projects & main points
  
Setting the scene
[Figure: bioscience domains (health, agro, env, tox/pharma). Source of the figure: EBI website]
Bioscience is multi-domain…
Petabytes of data
Experimental metadata in lab books
investigation / study / assay

•  Assist in the annotation and management of experimental data at source
•  Deal with data from high-throughput studies using one or a combination of omics and other technologies
•  Empower users to take up community-defined checklists and ontologies
•  Facilitate data sharing, reuse, comparison and reproducibility of experiments, and submission to international public repositories
  
The ISA ecosystem
  




Rocca-Serra et al, 2010. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Bioinformatics
Sansone et al, 2012. Towards interoperable bioscience data. Nature Genetics
General purpose & flexible format
Domain agnostic
Captures metadata in omics experiments and traditional experiments (e.g. clinical chemistry and histology)
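ISA-Tab keeps investigation-level metadata as tab-separated rows grouped under capitalised section headers. The following is a minimal Python sketch of reading such rows into one dictionary per section, assuming a simplified, hypothetical fragment in that style (the identifiers and file names are illustrative, not taken from the faahKO files). Real investigation files repeat sections such as STUDY, which this sketch simply merges.

```python
import csv
import io

# Hypothetical, simplified investigation-file fragment (not a real ISA-Tab file).
SAMPLE = (
    "INVESTIGATION\n"
    "Investigation Identifier\tfaahKO\n"
    "STUDY\n"
    "Study Identifier\tfaahKO\n"
    "Study File Name\ts_faahKO.txt\n"
    "STUDY ASSAYS\n"
    "Study Assay Measurement Type\tmetabolite profiling\n"
    "Study Assay Technology Type\tmass spectrometry\n"
    "Study Assay File Name\ta_faahKO.txt\n"
)


def parse_investigation(text):
    """Map each section header to a {field name: [values]} dictionary."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) == 1 and row[0].isupper():
            # A lone upper-case cell opens a new section; repeated
            # sections (e.g. several STUDY blocks) are merged here.
            current = row[0]
            sections.setdefault(current, {})
        elif current is not None:
            # First cell is the field name, the remaining cells are its values.
            sections[current][row[0]] = row[1:]
    return sections


meta = parse_investigation(SAMPLE)
print(meta["STUDY ASSAYS"]["Study Assay Technology Type"])
```

Using the csv module rather than a plain split also strips the double quotes that often surround values in real ISA-Tab files.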
  
faahKO dataset
•  Available in BioConductor
•  Subset of the original data on global metabolite profiling (Saghatelian et al., Biochemistry, 2004)
•  LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice
faahKO investigation
-  Define key entities (e.g. factors, protocols, parameters)
-  Grouping of studies
-  Relate studies and assays
faahKO study
-  Subjects studied: source(s), sampling methodology, characteristics
-  Treatments/manipulations performed to prepare the specimens
[Annotated with terms from: NEWT UniProt Taxonomy Database; Mouse Genome Informatics]
[Also annotated with terms from: Mouse Adult Gross Anatomy]
faahKO assay
-  Measurement type, e.g. metabolite profiling
-  Technology, e.g. mass spectrometry
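The investigation/study/assay hierarchy described for faahKO can be pictured as nested records. Below is a minimal Python sketch, assuming hypothetical class and field names (this is not the ISA tools' own API), populated with the figures from the faahKO slides: 6 wild-type and 6 knockout mice measured by metabolite profiling with mass spectrometry. The source names and assay file name are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    measurement_type: str  # e.g. "metabolite profiling"
    technology_type: str   # e.g. "mass spectrometry"
    file_name: str         # hypothetical assay table name


@dataclass
class Study:
    identifier: str
    subjects: List[str]    # source names of the subjects studied
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    identifier: str
    studies: List[Study] = field(default_factory=list)

    def assays(self):
        """Collect all assays across all studies of the investigation."""
        return [assay for study in self.studies for assay in study.assays]


# faahKO-like design: 6 wild-type and 6 knockout mice, one LC/MS assay.
subjects = [f"WT{i}" for i in range(1, 7)] + [f"KO{i}" for i in range(1, 7)]
faahko = Investigation(
    "faahKO",
    [Study("faahKO", subjects,
           [Assay("metabolite profiling", "mass spectrometry", "a_faahKO.txt")])],
)
print(len(faahko.studies[0].subjects), faahko.assays()[0].technology_type)
```

Keeping assays under studies mirrors the slides: the investigation groups studies, and each study relates its subjects to the assays run on them.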
  
Report and edit the description of the investigation using Google Spreadsheets.

Use Google Spreadsheets in combination with ISA-Tab templates (created by importing the Excel file from the ISAconfigurator) and OntoMaton (for ontology search and tagging support) to report an investigation.
-  Collaborative annotation
-  Distributed groups of users
-  Version control & history

Ontology Search and Tagging in Google Spreadsheets
  
Create templates detailing the steps to be reported for different investigations, complying with community standards (listed at …), e.g. configuring fields to be (i) ontology terms, (ii) text (with or without regular-expression testing), (iii) numbers, etc.
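Template fields typed as ontology terms, free text checked against a regular expression, or numbers can be validated cell by cell. This is a minimal Python sketch under stated assumptions: the FieldSpec structure and the column names are hypothetical, and a fixed set of allowed terms stands in for a real ontology lookup, which is not modelled here.

```python
import re
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class FieldSpec:
    header: str
    kind: str                                 # "ontology", "text" or "number"
    pattern: Optional[str] = None             # optional regex for "text" fields
    allowed_terms: Optional[Set[str]] = None  # stand-in for an ontology lookup


def validate(spec, value):
    """Return None if the value fits the field spec, else an error message."""
    if spec.kind == "number":
        try:
            float(value)
        except ValueError:
            return f"{spec.header}: '{value}' is not a number"
    elif spec.kind == "text":
        if spec.pattern and not re.fullmatch(spec.pattern, value):
            return f"{spec.header}: '{value}' does not match {spec.pattern}"
    elif spec.kind == "ontology":
        if spec.allowed_terms is not None and value not in spec.allowed_terms:
            return f"{spec.header}: '{value}' is not a known term"
    else:
        return f"{spec.header}: unknown field kind '{spec.kind}'"
    return None


# Hypothetical template: one field of each configured type.
template = [
    FieldSpec("Characteristics[organism]", "ontology",
              allowed_terms={"Mus musculus"}),
    FieldSpec("Source Name", "text", pattern=r"(WT|KO)\d+"),
    FieldSpec("Parameter Value[injection volume]", "number"),
]
row = {"Characteristics[organism]": "Mus musculus",
       "Source Name": "KO1",
       "Parameter Value[injection volume]": "ten"}

errors = []
for spec in template:
    message = validate(spec, row[spec.header])
    if message:
        errors.append(message)
print(errors)
```

Collecting every message instead of stopping at the first failure lets a spreadsheet user fix all problems in a row at once.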
  
From the ISA-Tab we can perform analysis, and convert to RDF/OWL and other formats for submission to, or sharing with, local/remote repositories
+ Visualisation Methods
  
[Figures: faahKO Groups; faahKO Workflow]

Maguire E, Rocca-Serra P, Sansone SA, Davies J and Chen M. Taxonomy-based Glyph Design -- with a Case Study on Visualizing Workflows of Biological Experiments. IEEE Transactions on Visualization and Computer Graphics, volume 18, 2012 (in press)
  
•  R package available in BioConductor 2.11
     http://bioconductor.org/packages/release/bioc/html/Risa.html
•  ISAtab class
•  Read ISAtab files into ISAtab objects and save ISAtab files
•  Build xcmsSet (xcms package) objects from mass spectrometry assays
•  Augment the ISAtab dataset after analysis
•  Source & issue tracking
     https://github.com/ISA-tools/Risa
  
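•  faahKO package v. 2.12 contains ISAtab files describing the experiment
     library(Risa)
     # read the ISA-Tab files shipped with the faahKO package
     faahkoISA <- readISAtab(find.package("faahKO"))
     # name of the first assay file in the investigation
     assay.filename <- faahkoISA["assay.filenames"][[1]]
     # build an xcmsSet object from the spectra referenced by that assay
     xset <- processAssayXcmsSet(faahkoISA, assay.filename)
     …
     # returns an updated ISAtab object, so reassign it to keep the change
     faahkoISA <- updateAssayMetadata(faahkoISA, assay.filename, "Derived Spectral Data File", "faahkoDSDF.txt")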
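At the table level, recording a derived file in an assay amounts to appending a column such as Derived Spectral Data File to the tab-separated assay table, with one value per row. The following is a minimal Python sketch of that table operation only, assuming a hypothetical two-row assay table (the WT path is illustrative); it does not reproduce how Risa implements updateAssayMetadata.

```python
import csv
import io

# Hypothetical two-row assay table in tab-separated form.
ASSAY = (
    "Sample Name\tRaw Spectral Data File\n"
    "KO1\t./cdf/KO/ko15.CDF\n"
    "WT1\t./cdf/WT/wt15.CDF\n"
)


def add_assay_column(tsv_text, header, values):
    """Append a column; `values` is one string for all rows or a list per row."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    head, body = rows[0], rows[1:]
    if isinstance(values, str):
        values = [values] * len(body)  # same value for every data row
    if len(values) != len(body):
        raise ValueError("need one value per data row")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(head + [header])
    for row, value in zip(body, values):
        writer.writerow(row + [value])
    return out.getvalue()


print(add_assay_column(ASSAY, "Derived Spectral Data File", "faahkoDSDF.txt"))
```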
  
•  MTBLS2 processing and analysis using Risa, xcms and CAMERA BioConductor packages

Haug et al, 2012. Metabolights – an open access general-purpose repository for metabolomics studies and associated meta-data. Nucleic Acids Research
ISA syntax & underlying material/data workflows
[Diagram: Input Material or Data Node (Characteristics[…], Factor Value[…]) → Protocol REF (Parameter Value[…]) → Output Material or Data Node (Characteristics[…], Factor Value[…])]
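In the ISA syntax, each node column (a material or data node) or Protocol REF column is followed by the attribute columns that qualify it. Here is a minimal Python sketch of grouping a header row that way, assuming a small, hypothetical set of node headers and hypothetical column names; the ISA-Tab specification defines more node types and further attribute columns (e.g. Unit, Term Source REF).

```python
from dataclasses import dataclass, field
from typing import List

# Assumed subset of column headers that open a material or data node.
NODE_HEADERS = {"Source Name", "Sample Name", "Extract Name",
                "Raw Spectral Data File", "Derived Spectral Data File"}


@dataclass
class Group:
    kind: str  # "node" (material or data) or "protocol"
    name: str  # the header that opened the group
    attributes: List[str] = field(default_factory=list)


def group_columns(headers):
    """Attach each attribute column to the node or protocol before it."""
    groups = []
    for header in headers:
        if header in NODE_HEADERS:
            groups.append(Group("node", header))
        elif header == "Protocol REF":
            groups.append(Group("protocol", header))
        elif groups:
            # Characteristics[...], Factor Value[...], Parameter Value[...], ...
            # (attribute columns before the first node are ignored)
            groups[-1].attributes.append(header)
    return groups


header = ["Source Name", "Characteristics[organism]", "Protocol REF",
          "Parameter Value[collection time]", "Sample Name",
          "Factor Value[genotype]"]
for g in group_columns(header):
    print(g.kind, g.name, g.attributes)
```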
  
•  Make the semantics of ISAtab explicit, including materials & data entities & processes
•  Exploit the semantic annotations available in ISAtab datasets
•  Augment ISA syntax with new elements (e.g. groups), facilitating the understanding & querying of experimental design
•  Facilitate data integration & knowledge discovery/reasoning
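One way to derive the "groups" mentioned above is to treat each distinct combination of Factor Value[…] columns as an experimental group. The Python sketch below assumes hypothetical sample names and factor labels for the faahKO design (6 wild-type, 6 knockout); it illustrates the idea rather than the grouping method used in the ISA tools.

```python
from collections import defaultdict

# Hypothetical study rows for a 6 wild-type / 6 knockout design.
samples = [
    {"Sample Name": f"{prefix}{i}", "Factor Value[genotype]": genotype}
    for prefix, genotype in (("WT", "wild type"), ("KO", "FAAH knockout"))
    for i in range(1, 7)
]


def experimental_groups(rows):
    """Map each combination of factor values to the samples that share it."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(sorted((column, value) for column, value in row.items()
                           if column.startswith("Factor Value[")))
        groups[key].append(row["Sample Name"])
    return dict(groups)


for key, names in experimental_groups(samples).items():
    print(dict(key), len(names))
```

Sorting the (column, value) pairs makes the group key independent of column order, so designs with several factors group consistently.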
  
ISAtab datasets as linked data
•  Connect to the growing Linked Data universe
     RDF = Resource Description Framework, OWL = Web Ontology Language
•  Collaborations with ToxBank (…) & W3C Health Care & Life Sciences Interest Group (HCLSIG)
  



     <subject, predicate, object>
     <lipoprotein> <participates_in> <inflammatory response>
     <PRO:212342352> <BFO_0000056> <GO:0006954>
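The identifier-level triple above can be written out as N-Triples by expanding each compact identifier to an OBO PURL of the form http://purl.obolibrary.org/obo/PREFIX_ID. A minimal Python sketch, assuming that expansion rule for all three identifiers; the PRO identifier is copied verbatim from the slide (Protein Ontology PURLs use the PR prefix), so the subject IRI is illustrative only.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_iri(curie):
    """Expand 'PREFIX:ID' or 'PREFIX_ID' to an OBO PURL in angle brackets."""
    separator = ":" if ":" in curie else "_"
    prefix, local_id = curie.split(separator, 1)
    return f"<{OBO_BASE}{prefix}_{local_id}>"


def n_triple(subject, predicate, obj):
    """Serialise one statement as a single N-Triples line."""
    return f"{curie_to_iri(subject)} {curie_to_iri(predicate)} {curie_to_iri(obj)} ."


print(n_triple("PRO:212342352", "BFO_0000056", "GO:0006954"))
```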
  
[Diagram components: ISAtab dataset, Parser, ISAtab Graph, Analysis; ISA-OBO-mapping, Parser, ISA Mapping]
[Figure: faahKO sample KO1 as an OBO-typed graph. Nodes: Saghantelian_1, KO1, KO1_extract, ./cdf/KO/ko15.CDF. Types: material entity, sample collection, processed material, extraction, material processing, mass spectrometry, information content entity. Relations: has specified input, has specified output, derives from, type]
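Because processes in such a graph link their specified inputs to their specified outputs, "derives from" chains can be followed from a data file back to the source material. A minimal Python sketch, assuming one reading of the figure (extraction takes KO1 and outputs KO1_extract; mass spectrometry takes KO1_extract and outputs ./cdf/KO/ko15.CDF) and that every output has a single input.

```python
# (process, specified input, specified output): an assumed reading of the figure.
PROCESSES = [
    ("extraction", "KO1", "KO1_extract"),
    ("mass spectrometry", "KO1_extract", "./cdf/KO/ko15.CDF"),
]


def derives_from(processes):
    """Infer output -> input links, assuming one input per output."""
    return {output: inp for _, inp, output in processes}


def lineage(node, links):
    """Follow derives-from links until reaching a node with no recorded parent."""
    chain = [node]
    while chain[-1] in links:
        parent = links[chain[-1]]
        if parent in chain:
            raise ValueError(f"cycle detected at {parent}")
        chain.append(parent)
    return chain


print(" <- ".join(lineage("./cdf/KO/ko15.CDF", derives_from(PROCESSES))))
```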
  
Increasing level of structure… …different target audiences
Notes in lab books (information for humans) → Spreadsheets & tables (ISAtab metadata) → Facts as RDF statements (information for machines)
  
Core organization in the … UK Node
Implementation at Harvard
http://discovery.hsci.harvard.edu/
Implementation at the EBI
http://www.ebi.ac.uk/metabolights

Haug et al, 2012. Metabolights – an open access general-purpose repository for metabolomics studies and associated meta-data. Nucleic Acids Research
The ISA ecosystem
@isatools   @biosharing
isa-tools.org   isacommons.org   biosharing.org


Knowledge management of experimental data using open source ISA software suite

  • 1. The open source ISA software suite and its international user community: Knowledge management of experimental data. Alejandra González-Beltrán, Senior Software Engineer, ISATeam, Oxford e-Research Centre, University of Oxford, Oxford, UK. NETTAB 2012 – Integrated Bio-Search, Como, Italy, November 14-16
  • 2. Outline • Knowledge management of experimental data – Setting the scene – The ISA ecosystem: ISA-Tab, tools, community – Use case • Latest additions • Related projects & main points
  • 3. Setting the scene: Bioscience is multi-domain… (figure: health, agro, env, tox/pharma; source of the figure: EBI website)
  • 4. Setting the scene: Bioscience is multi-domain… Petabytes of data (figure: health, agro, env, tox/pharma; source of the figure: EBI website)
  • 5. Setting the scene: Bioscience is multi-domain… Petabytes of data. Experimental metadata in lab books (figure: health, agro, env, tox/pharma; source of the figure: EBI website)
  • 6. Investigation / study / assay • Assist in the annotation and management of experimental data at source • Deal with data from high-throughput studies using one or a combination of omics and other technologies • Empower users to take up community-defined checklists and ontologies • Facilitate data sharing, reuse, comparison and reproducibility of experiments, and submission to international public repositories
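The investigation / study / assay layering above can be pictured as a three-level containment hierarchy. The sketch below is a minimal Python illustration under that reading; the class and field names are hypothetical and are not the ISA model's own schema or any ISA tool's API. The faahKO values come from the assay slide later in the deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    measurement_type: str  # what is measured, e.g. metabolite profiling
    technology: str        # how it is measured, e.g. mass spectrometry


@dataclass
class Study:
    # describes the subjects and specimen preparation; groups the assays run on them
    identifier: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    # top level: groups related studies
    identifier: str
    studies: List[Study] = field(default_factory=list)

    def all_assays(self) -> List[Assay]:
        """Flatten the hierarchy into every assay of every study."""
        return [assay for study in self.studies for assay in study.assays]


# hypothetical faahKO-like instance (identifiers invented for illustration)
inv = Investigation("faahKO", [
    Study("faahKO study", [Assay("metabolite profiling", "mass spectrometry")]),
])
print(inv.all_assays())
```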
  • 7. The ISA ecosystem
  • 8. The ISA ecosystem. Rocca-Serra et al, 2010, Bioinformatics: ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Sansone et al, 2012, Nature Genetics: Towards interoperable bioscience data.
  • 9. General purpose & flexible format • Domain agnostic • Captures metadata in omics experiments and traditional experiments (e.g. clinical chemistry and histology)
  • 10. faahKO dataset • Available in Bioconductor • Subset of the original data on global metabolite profiling (Saghatelian et al., Biochemistry, 2004) • LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice
  • 11. faahKO investigation: - Define key entities (e.g. factors, protocols, parameters) - Grouping of studies - Relate studies and assays
  • 12. faahKO study: - Subjects studied: source(s), sampling methodology, characteristics - Treatments/manipulations performed to prepare the specimens (resources shown: NEWT UniProt Taxonomy Database, Mouse Genome Informatics)
  • 13. faahKO study: - Subjects studied: source(s), sampling methodology, characteristics - Treatments/manipulations performed to prepare the specimens (resource shown: Mouse Adult Gross Anatomy)
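To make the study-level description concrete, here is a minimal sketch of reading a study table and grouping samples by a factor value. It assumes a tab-separated layout with ISA-Tab-style headers (Source Name, Characteristics[…], Protocol REF, Sample Name, Factor Value[…]); the rows, the source names and the genotype factor are hypothetical toy data, not the faahKO files. The Term Source REF / Term Accession Number pair shows how a NEWT annotation sits beside the value it annotates (10090 is the taxonomy identifier for Mus musculus).

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical toy rows shaped like an ISA-Tab study file (tab-separated).
# Only the header conventions follow ISA-Tab; the values are made up.
STUDY_TSV = "\n".join("\t".join(cells) for cells in [
    ["Source Name", "Characteristics[Organism]", "Term Source REF",
     "Term Accession Number", "Protocol REF", "Sample Name", "Factor Value[genotype]"],
    ["src_1", "Mus musculus", "NEWT", "10090", "sample collection", "KO1", "FAAH knockout"],
    ["src_2", "Mus musculus", "NEWT", "10090", "sample collection", "KO2", "FAAH knockout"],
    ["src_3", "Mus musculus", "NEWT", "10090", "sample collection", "WT1", "wild type"],
])


def group_samples_by_factor(study_tsv: str, factor: str) -> Dict[str, List[str]]:
    """Map each value of Factor Value[factor] to the sample names carrying it."""
    reader = csv.DictReader(io.StringIO(study_tsv), delimiter="\t")
    column = f"Factor Value[{factor}]"
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        groups[row[column]].append(row["Sample Name"])
    return dict(groups)


print(group_samples_by_factor(STUDY_TSV, "genotype"))
```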
  • 14. faahKO assay: - Measurement type, e.g. metabolite profiling - Technology, e.g. mass spectrometry
  • 18. Report and edit the description of the investigation using Google Spreadsheets. Use Google Spreadsheets in combination with ISA-Tab templates (created by importing the Excel file from the ISAconfigurator) and OntoMaton (for ontology search and tagging support) to report an investigation.
  • 19. Ontology Search and Tagging in Google Spreadsheets: - collaborative annotation - distributed groups of users - version control & history
  • 20. Create templates detailing the steps to be reported for different investigations, complying with community standards (listed at BioSharing), e.g. configuring fields to be (i) ontology terms, (ii) text (with or without regular expression testing), (iii) numbers, etc.
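The three field kinds listed on this slide (ontology term, text with optional regular-expression testing, number) can be sketched as a small validator. This is a hypothetical illustration in Python, not the ISAconfigurator's configuration format or validation code; FieldSpec, is_valid and the example fields (including the injection volume parameter) are invented names.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class FieldSpec:
    """One configurable field in a hypothetical template."""
    name: str
    kind: str                                 # "ontology", "text" or "number"
    pattern: Optional[str] = None             # optional regex for "text" fields
    sources: Optional[FrozenSet[str]] = None  # accepted ontologies for "ontology" fields


def is_valid(spec: FieldSpec, value: str, term_source: Optional[str] = None) -> bool:
    if spec.kind == "number":
        try:
            float(value)
        except ValueError:
            return False
        return True
    if spec.kind == "text":
        # plain text passes; with a pattern, the whole value must match it
        return spec.pattern is None or re.fullmatch(spec.pattern, value) is not None
    if spec.kind == "ontology":
        # an ontology field needs a term plus the ontology it was taken from
        if not value or term_source is None:
            return False
        return spec.sources is None or term_source in spec.sources
    raise ValueError(f"unknown field kind: {spec.kind!r}")


organism = FieldSpec("Characteristics[Organism]", "ontology", sources=frozenset({"NEWT"}))
sample = FieldSpec("Sample Name", "text", pattern=r"(KO|WT)\d+")
volume = FieldSpec("Parameter Value[injection volume]", "number")
print(is_valid(organism, "Mus musculus", "NEWT"), is_valid(volume, "ten"))
```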
  • 21. From the ISA-Tab we can perform analysis, and convert to RDF/OWL and other formats for submission to, and sharing with, local/remote repositories.
  • 22. From the ISA-Tab we can perform analysis, and convert to RDF/OWL and other formats for submission to, and sharing with, local/remote repositories. + Visualisation methods
  • 23. Figures: faahKO Groups; faahKO Workflow. Maguire E, Rocca-Serra P, Sansone SA, Davies J and Chen M. Taxonomy-based Glyph Design -- with a Case Study on Visualizing Workflows of Biological Experiments. IEEE Transactions on Visualization and Computer Graphics, volume 18, 2012 (in press)
  • 24. • R package available in Bioconductor 2.11: http://bioconductor.org/packages/release/bioc/html/Risa.html • ISAtab class • Read ISAtab files into ISAtab objects and save ISAtab files • Build xcmsSet (xcms package) objects from mass spectrometry assays • Augment the ISAtab dataset after analysis • GitHub source & issues tracking: https://github.com/ISA-tools/Risa
  • 25. • faahKO package v. 2.12 contains ISAtab files describing the experiment:
        # read the ISA-Tab files shipped with the faahKO package
        faahkoISA <- readISAtab(find.package("faahKO"))
        # first assay file listed in the investigation
        assay.filename <- faahkoISA["assay.filenames"][[1]]
        # build an xcmsSet object from the mass spectrometry assay
        xset <- processAssayXcmsSet(faahkoISA, assay.filename)
        …
        # record the derived data file; keep the returned, updated object
        faahkoISA <- updateAssayMetadata(faahkoISA, assay.filename, "Derived Spectral Data File", "faahkoDSDF.txt")
    • MTBLS2 processing and analysis using the Risa, xcms and CAMERA Bioconductor packages. MetaboLights: an open access general-purpose repository for metabolomics studies and associated meta-data, Haug et al, 2012, Nucleic Acids Research
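The last call on this slide augments the assay table with a new column after analysis. For readers outside R, the sketch below shows the same idea on a plain tab-separated assay table in Python. It is an assumption-laden illustration, not Risa's implementation: add_assay_column is a hypothetical helper, and copying a single value to every row is an assumed behaviour.

```python
import csv
import io
from typing import Sequence, Union


def add_assay_column(assay_tsv: str, column: str,
                     values: Union[str, Sequence[str]]) -> str:
    """Return a copy of a tab-separated assay table with one extra column.

    A single string is copied to every data row (an assumed convenience);
    otherwise exactly one value per data row is required.
    """
    rows = list(csv.reader(io.StringIO(assay_tsv), delimiter="\t"))
    header, body = rows[0], rows[1:]
    if isinstance(values, str):
        values = [values] * len(body)
    if len(values) != len(body):
        raise ValueError("expected one value per assay row")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header + [column])
    for row, value in zip(body, values):
        writer.writerow(row + [value])
    return out.getvalue()


# hypothetical one-row assay table; file names taken from the slides
assay = "Sample Name\tRaw Spectral Data File\nKO1\t./cdf/KO/ko15.CDF\n"
print(add_assay_column(assay, "Derived Spectral Data File", "faahkoDSDF.txt"))
```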
  • 26. ISA syntax & underlying material/data workflows: Input Material or Data Node (Characteristics[…], Factor Value[…]) → Protocol REF (Parameter Value[…]) → Output Material or Data Node (Characteristics[…], Factor Value[…])
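The node → protocol → node pattern above can be read directly off a table row: every Protocol REF column sits between the material or data node it consumes and the node it produces. The minimal Python sketch below walks one row and emits (input, protocol, parameters, output) steps. It assumes node columns are the ones whose header ends in "Name" or "File", joins study and assay columns into one row for brevity, and keeps only the last Protocol REF when several appear between two nodes; the scan polarity parameter is a hypothetical example. The node names come from the faahKO graph later in the deck.

```python
from typing import Dict, List, Optional, Tuple

# (input node, protocol, parameters, output node)
Process = Tuple[str, str, Dict[str, str], str]


def is_node(header: str) -> bool:
    # Assumption: material/data node columns are headed "... Name" or "... File"
    return header.endswith(("Name", "File"))


def row_to_processes(headers: List[str], row: List[str]) -> List[Process]:
    processes: List[Process] = []
    last_node: Optional[str] = None
    protocol: Optional[str] = None
    params: Dict[str, str] = {}
    for header, value in zip(headers, row):
        if is_node(header):
            # a node closes the pending protocol application, if any
            if protocol is not None and last_node is not None:
                processes.append((last_node, protocol, params, value))
            last_node, protocol, params = value, None, {}
        elif header == "Protocol REF":
            protocol, params = value, {}
        elif header.startswith("Parameter Value[") and protocol is not None:
            params[header[len("Parameter Value["):-1]] = value
        # Characteristics[...] / Factor Value[...] describe nodes; ignored here
    return processes


headers = ["Source Name", "Characteristics[Organism]", "Protocol REF", "Sample Name",
           "Protocol REF", "Extract Name", "Protocol REF",
           "Parameter Value[scan polarity]", "Raw Spectral Data File"]
row = ["Saghantelian_1", "Mus musculus", "sample collection", "KO1",
       "extraction", "KO1_extract", "mass spectrometry", "positive", "./cdf/KO/ko15.CDF"]
for step in row_to_processes(headers, row):
    print(step)
```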
  • 27. • Make the semantics of ISAtab explicit, including materials & data entities & processes • Exploit the semantic annotations available in ISAtab datasets • Augment ISA syntax with new elements (e.g. groups), facilitating the understanding & querying of experimental design • Facilitate data integration & knowledge discovery/reasoning
  • 28. ISAtab datasets as linked data • Connect to the growing Linked Data universe (RDF = Resource Description Framework, OWL = Web Ontology Language) • Collaborations with ToxBank & the W3C Health Care & Life Sciences Interest Group (HCLSIG) • Triples <subject, predicate, object>, e.g. <lipoprotein> <participates_in> <inflammatory response>, written with identifiers as <PRO:212342352> <BFO_0000056> <GO:0006954>
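The identifier form of the triple above can be turned into an N-Triples line by expanding each prefixed identifier into an OBO PURL (http://purl.obolibrary.org/obo/PREFIX_ID). This Python sketch assumes that every prefix maps onto the OBO PURL space; the PRO prefix is copied verbatim from the slide, and its registered OBO namespace should be checked before publishing real data.

```python
import re

OBO_PURL = "http://purl.obolibrary.org/obo/"


def obo_iri(identifier: str) -> str:
    """Expand 'GO:0006954' or 'BFO_0000056' into an OBO PURL of the form PREFIX_ID."""
    match = re.fullmatch(r"([A-Za-z]+)[:_](\w+)", identifier)
    if match is None:
        raise ValueError(f"not a prefixed identifier: {identifier!r}")
    prefix, local_id = match.groups()
    # assumption: the prefix is used unchanged as the OBO namespace
    return f"{OBO_PURL}{prefix}_{local_id}"


def to_ntriple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples statement: three IRIs in angle brackets, ending with ' .'"""
    return f"<{obo_iri(subject)}> <{obo_iri(predicate)}> <{obo_iri(obj)}> ."


# identifier version of <lipoprotein> <participates_in> <inflammatory response>
print(to_ntriple("PRO:212342352", "BFO_0000056", "GO:0006954"))
```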
  • 29. (Diagram: ISAtab dataset, Parser, ISAtab Graph, Analysis, ISA Mapping Parser)
  • 31. Graph for one faahKO sample: sample collection has specified input Saghantelian_1 (type: material entity) and has specified output KO1 (type: processed material); KO1 derives from Saghantelian_1. Extraction (type: material processing) has specified input KO1 and has specified output KO1_extract; KO1_extract derives from KO1. Mass spectrometry has specified input KO1_extract and has specified output ./cdf/KO/ko15.CDF (type: information content entity); ./cdf/KO/ko15.CDF derives from KO1_extract.
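The graph above can be sketched as plain (subject, predicate, object) triples, after which provenance questions become simple graph walks, e.g. which source a raw data file ultimately derives from. This Python sketch uses the slide's relation labels as strings rather than ontology IRIs and names each process by its protocol label; both are simplifications, since a real RDF export would mint one IRI per process instance and use ontology terms for the relations.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# (process, input, output) steps taken from the faahKO graph on this slide
STEPS = [
    ("sample collection", "Saghantelian_1", "KO1"),
    ("extraction", "KO1", "KO1_extract"),
    ("mass spectrometry", "KO1_extract", "./cdf/KO/ko15.CDF"),
]


def steps_to_triples(steps) -> List[Triple]:
    """Emit input, output and 'derives from' triples for each process step."""
    triples: List[Triple] = []
    for process, inp, out in steps:
        triples.append((process, "has specified input", inp))
        triples.append((process, "has specified output", out))
        triples.append((out, "derives from", inp))
    return triples


def lineage(triples: List[Triple], node: str) -> List[str]:
    """Follow 'derives from' links back from node (assumes at most one parent each)."""
    parent = {s: o for s, p, o in triples if p == "derives from"}
    chain = [node]
    # stop at a node with no parent, or if a cycle would revisit a node
    while chain[-1] in parent and parent[chain[-1]] not in chain:
        chain.append(parent[chain[-1]])
    return chain


triples = steps_to_triples(STEPS)
print(lineage(triples, "./cdf/KO/ko15.CDF"))
```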
  • 32. Increasing level of structure… …different target audiences: Notes in lab books (information for humans) → Spreadsheets & tables (ISAtab metadata) → Facts as RDF statements (information for machines)
  • 33. core organization in the UK Node
  • 34. Implementation at Harvard: ISA http://discovery.hsci.harvard.edu/
  • 35. Implementation at the EBI: http://www.ebi.ac.uk/metabolights. MetaboLights: an open access general-purpose repository for metabolomics studies and associated meta-data, Haug et al, 2012, Nucleic Acids Research
  • 36. The ISA ecosystem
  • 37. @isatools @biosharing · isa-tools.org · isacommons.org · biosharing.org