MCW Driving Biological Project
                             Simon Twigger, PhD




Monday, September 27, 2010
Rat Genome Database




What's the problem?
        • Large-scale repositories
          with unused or
          inaccessible information

        • How can these
          databases be made
          more useful?

        • How to help researchers
          find and use this
          information to connect
          genes to disease?



Rat researchers ask...

        • What tissue is this gene expressed in?
        • What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?
        • Are any of these genes associated with my phenotype?
        • Has this gene been seen in the brain?
        • What rat expression studies have been done on Mammary Cancer (aka breast neoplasms/breast cancer/cancer of the breast, breast carcinoma...)?
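
These questions depend on recognising that several strings name the same strain or disease. A minimal Python sketch of that idea, assuming a small hand-built synonym table; the canonical labels and the `normalize` helper are illustrative and are not part of RGD or the NCBO tools:

```python
# Hypothetical synonym table: canonical label -> aliases quoted on the slide.
SYNONYMS = {
    "SD": ["SD", "SD/NHsd", "Harlan Sprague Dawley", "Sprague Dawley"],
    "Mammary Cancer": [
        "Mammary Cancer",
        "breast neoplasms",
        "breast cancer",
        "cancer of the breast",
        "breast carcinoma",
    ],
}

# Invert the table once so each lookup is a case-insensitive dict access.
ALIAS_TO_CANONICAL = {
    alias.lower(): canonical
    for canonical, aliases in SYNONYMS.items()
    for alias in aliases
}


def normalize(term):
    """Return the canonical label for a term, or None if it is unknown."""
    return ALIAS_TO_CANONICAL.get(term.strip().lower())
```

With this table, a search for "Harlan Sprague Dawley" and a search for "SD/NHsd" resolve to the same label, which is the behaviour an ontology-based annotator would be expected to provide at much larger scale.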
What's the strategy?
        • Focus on GEO (microarray)

        • Use NCBO annotator to mark up text, review annotations
          and then use for tools and visualization

        • Combine annotations with biological data
          to derive new insights.

        [Diagram: GEO Records -> Create Annotation Jobs & Queue Up -> Q-Out (RabbitMQ)
         -> 1..n Annot. Workers -> Index text at OBA -> Parse Results
         -> Put results into queue for save -> Q-In (RabbitMQ)
         -> Results saved to GMiner database]
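
The queue-based flow on this slide can be sketched in Python. This is an illustration under stated assumptions, not the GMiner code: the standard-library `queue` and `threading` modules stand in for RabbitMQ, the `annotate` function is a toy substring matcher standing in for the NCBO annotator, and the record IDs in the usage note are made up.

```python
import queue
import threading

# Toy term list standing in for the ontology-based annotator (assumption).
TERMS = ["kidney", "brain", "hypertension"]


def annotate(text):
    """Stub annotator: return the known terms found in the text."""
    lowered = text.lower()
    return [term for term in TERMS if term in lowered]


def worker(q_out, q_in):
    """Take jobs from Q-Out, annotate them, and put results on Q-In."""
    while True:
        job = q_out.get()
        if job is None:  # sentinel: no more jobs for this worker
            break
        record_id, text = job
        q_in.put((record_id, annotate(text)))


def run_pipeline(records, n_workers=2):
    """Queue up records, run n workers, and collect results into a dict."""
    q_out, q_in = queue.Queue(), queue.Queue()
    for job in records.items():
        q_out.put(job)
    for _ in range(n_workers):
        q_out.put(None)  # one sentinel per worker
    threads = [
        threading.Thread(target=worker, args=(q_out, q_in))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    saved = {}  # stands in for the GMiner database
    while not q_in.empty():
        record_id, terms = q_in.get()
        saved[record_id] = terms
    return saved
```

Calling `run_pipeline({"GSM1": "Rat kidney, hypertension model", "GSM2": "Whole brain"})` returns a dict of record ID to matched terms. Separating the outgoing and incoming queues lets the number of workers change without touching the code that creates jobs or saves results.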



Current Ontologies




     http://bioportal.bioontology.org/
Progress




Linking annotations to data
        Tm2d1, RGD1306410, Svs4, Hbb, Scgb2a1, Alb
        +
        Hbb   is_expressed_in rat kidney
        Tm2d1 is_expressed_in rat kidney

        Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2)
        62,000 samples x ca. 25,000 genes/sample = ca. 1.5B data points
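
The `is_expressed_in` statements on this slide have the subject, predicate, object shape of RDF triples. A minimal Python sketch, assuming a placeholder base URI (`http://example.org/gminer/`, which does not come from the slides), that serialises them as N-Triples lines and checks the scale estimate:

```python
# Hypothetical base URI; the slides do not specify one.
BASE = "http://example.org/gminer/"


def to_ntriple(subject, predicate, obj):
    """Format one statement as an N-Triples line using the assumed base URI."""
    def uri(name):
        return "<" + BASE + name.replace(" ", "_") + ">"
    return f"{uri(subject)} {uri(predicate)} {uri(obj)} ."


statements = [
    ("Hbb", "is_expressed_in", "rat kidney"),
    ("Tm2d1", "is_expressed_in", "rat kidney"),
]
lines = [to_ntriple(*s) for s in statements]

# Scale estimate from the slide: samples x genes per sample.
data_points = 62_000 * 25_000  # 1,550,000,000, i.e. roughly 1.5 billion
```

The product works out to 1.55 billion, which the slide rounds to 1.5B.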
Probeset results on GMiner

        Probeset L08490cds_at for Gabra1 - gamma-aminobutyric acid (GABA) A receptor, alpha 1

        Hs GABRA1
QTL
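        [Diagram: a QTL for a hypertensive phenotype contains several candidate genes (G).
         Genes are linked to Phenotype (Hypertension), Pathway, Anatomy (Kidney),
         Component, Function and Process terms, and to strain differences (Strain 1 != Strain 2).]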

QTL Gene ‘Highlighter’




        [Diagram: a QTL with its genes (G) and Disease/Pheno. terms connected through
         AllegroGraph, which draws on GMiner, RGD, OBO, etc.]
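
The highlighting idea on this slide can be sketched in Python as a filter: take the genes inside a QTL interval and flag those that carry a matching annotation. Everything below is an assumption made for illustration: the coordinates are invented, the in-memory set of triples stands in for a triple store such as AllegroGraph, and `highlight` is a hypothetical helper rather than the tool's actual interface.

```python
# All coordinates below are invented for illustration.
GENES = {
    "Hbb": {"chrom": "1", "start": 1_000, "end": 2_000},
    "Tm2d1": {"chrom": "1", "start": 5_000, "end": 6_000},
    "Svs4": {"chrom": "1", "start": 7_000, "end": 8_000},
    "Alb": {"chrom": "14", "start": 3_000, "end": 4_000},
}

# In-memory stand-in for the triple store.
TRIPLES = {
    ("Hbb", "is_expressed_in", "kidney"),
    ("Tm2d1", "is_expressed_in", "kidney"),
}


def genes_in_qtl(chrom, start, end):
    """Return the sorted names of genes overlapping the interval."""
    return sorted(
        name
        for name, g in GENES.items()
        if g["chrom"] == chrom and g["start"] <= end and g["end"] >= start
    )


def highlight(chrom, start, end, predicate, obj):
    """Split QTL genes into those with and without a matching annotation."""
    hits, others = [], []
    for name in genes_in_qtl(chrom, start, end):
        target = hits if (name, predicate, obj) in TRIPLES else others
        target.append(name)
    return hits, others
```

For a QTL on the invented chromosome 1 interval, `highlight("1", 0, 10_000, "is_expressed_in", "kidney")` separates the kidney-expressed genes from the rest, which is the kind of ranking a researcher would want when choosing candidate genes.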

RDF/OWL sources
         Cell Ontology
         http://www.berkeleybop.org/ontologies/obo-all/cell/cell.owl

         Mouse Adult Gross Anatomy
         http://www.berkeleybop.org/ontologies/obo-all/adult_mouse_anatomy/adult_mouse_anatomy.owl

         Mammalian Phenotype
         http://www.berkeleybop.org/ontologies/obo-all/mammalian_phenotype/mammalian_phenotype.owl

         GO Function
         http://www.berkeleybop.org/ontologies/obo-all/molecular_function/molecular_function.owl

         GO Process
         http://www.berkeleybop.org/ontologies/obo-all/biological_process/biological_process.owl

         GO Component
         http://www.berkeleybop.org/ontologies/obo-all/cellular_component/cellular_component.owl
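
All six URLs above follow the same `<base>/<stem>/<stem>.owl` pattern, so the list can be generated from the stems. A small Python sketch of that observation; the `owl_url` helper and the `SOURCES` name are illustrative, and no network access is attempted:

```python
BASE = "http://www.berkeleybop.org/ontologies/obo-all"

# Display name -> directory/file stem, as listed on the slide.
ONTOLOGIES = {
    "Cell Ontology": "cell",
    "Mouse Adult Gross Anatomy": "adult_mouse_anatomy",
    "Mammalian Phenotype": "mammalian_phenotype",
    "GO Function": "molecular_function",
    "GO Process": "biological_process",
    "GO Component": "cellular_component",
}


def owl_url(stem):
    """Build the OWL file URL for one ontology stem."""
    return f"{BASE}/{stem}/{stem}.owl"


SOURCES = {name: owl_url(stem) for name, stem in ONTOLOGIES.items()}
```

Keeping the stems in one table means a loader script only needs to iterate over `SOURCES` to fetch each file into the triple store.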




Rat Genome Database




      Wide variety of data types, genomic and physiological,
      many with corresponding ontologies


RGD->RDF




                             Existing RGD ‘object types’ and
                             mappings to SO (Sequence Ontology)


RGD Gene




RGD QTL




QTL Highlighter




                  • Rails source code will be available on GitHub
                  • RDFizer (Ruby): http://github.com/simont/MCW-RDF
Next Steps
       • Register PURL for RGD

       • Create RGD core object ontology (OWL/RDF)

       • Select appropriate URIs for RGD data

       • Ontology annotations: how best to represent in triple store?

       • Export GMiner data to RDF -> triple store

       • Document & refine biological use cases related to candidate gene selection/evaluation

       • Identify additional data required for candidate gene selection, RDFize as appropriate,
         load into triple store.

       • Connections to other RDF collections/LOD, etc.?
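
One of the steps above is choosing URIs for RGD data. A minimal Python sketch of a consistent minting scheme, assuming a `<base>/<type>/<id>` layout; the base `http://purl.example.org/rgd` is a placeholder (the slides say the PURL still has to be registered), and `mint_uri` and the example identifiers are hypothetical:

```python
from urllib.parse import quote

# Placeholder base; the real PURL is not given in the slides.
PURL_BASE = "http://purl.example.org/rgd"


def mint_uri(object_type, identifier):
    """Build a URI of the form <base>/<type>/<id>, escaping unsafe characters."""
    if not object_type or identifier is None:
        raise ValueError("object_type and identifier are required")
    return f"{PURL_BASE}/{quote(object_type.lower())}/{quote(str(identifier))}"
```

For example, `mint_uri("Gene", 1306410)` yields a gene URI under the placeholder base. Routing every object type through one function keeps the URIs stable if the base changes once the PURL is registered.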



