3. What's the problem?
• Large-scale repositories with unused or inaccessible information
• How can these databases be made more useful?
• How to help researchers find and use this information to connect genes to disease?
Monday, September 27, 2010
4. Rat researchers ask...
What tissue is this gene expressed in?
What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?
Are any of these genes associated with my phenotype?
Has this gene been seen in the brain?
What rat expression studies have been done on Mammary Cancer (aka breast neoplasms/breast cancer/cancer of the breast, breast carcinoma...)?
5. What's the strategy?
• Focus on GEO (microarray)
• Use NCBO annotator to mark up text, review annotations, and then use for tools and visualization
• Combine annotations with biological data to derive new insights.
[Diagram: GEO Records -> Create Annotation Jobs & Queue Up -> Q-Out (RabbitMQ) -> 1..n Annot. Workers -> Index text at OBA -> Parse Results -> Put results into queue for save -> Q-In -> Results saved to GMiner database]
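The annotation workflow on this slide (jobs queued, workers annotating text, results queued for saving) can be sketched roughly as follows. This is a minimal illustration only: in-process Python queues stand in for RabbitMQ, the `annotate` function and its tiny term table are hypothetical stand-ins for the NCBO annotator, and a dictionary stands in for the GMiner database.

```python
import queue
import threading

# Hypothetical term table; the real NCBO annotator is a remote service
# and is not modelled here.
TERMS = {"kidney": "anatomy:kidney", "hypertension": "disease:hypertension"}


def annotate(text):
    """Return the sorted term IDs whose label occurs in the text."""
    lowered = text.lower()
    return sorted(term_id for label, term_id in TERMS.items() if label in lowered)


def worker(q_out, q_in):
    """Take jobs from the outgoing queue and post parsed results to the incoming one."""
    while True:
        job = q_out.get()
        if job is None:  # sentinel: no more jobs for this worker
            break
        record_id, text = job
        q_in.put((record_id, annotate(text)))


def run_pipeline(records, n_workers=2):
    """Queue one annotation job per record, run the workers, collect the results."""
    q_out, q_in = queue.Queue(), queue.Queue()
    for item in records.items():
        q_out.put(item)
    for _ in range(n_workers):
        q_out.put(None)  # one sentinel per worker
    threads = [
        threading.Thread(target=worker, args=(q_out, q_in))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    saved = {}  # stands in for the GMiner database
    while not q_in.empty():
        record_id, terms = q_in.get()
        saved[record_id] = terms
    return saved
```

Calling `run_pipeline({"rec1": "Rat kidney tissue, hypertension model"})` returns a mapping from each record ID to the term IDs found in its text; the record IDs and texts here are made up for illustration.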
6. Current Ontologies
http://bioportal.bioontology.org/
10. Linking annotations to data
Tm2d1
RGD1306410
Svs4
Hbb
Scgb2a1
Alb
11. Linking annotations to data
Tm2d1
RGD1306410
Svs4
Hbb
Scgb2a1
Alb
+
Hbb is_expressed_in rat kidney
Tm2d1 is_expressed_in rat kidney
Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2)
62,000 samples x ca. 25,000 genes/sample = ca. 1.5B data points
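As a rough sketch of how sample annotations and expression data might be combined into statements such as "Hbb is_expressed_in rat kidney", the following assumes two simple in-memory mappings. The function name, the sample ID, and the data layout are all hypothetical; only the gene names, the tissue, and the predicate come from the slide. The last line checks the data-point estimate.

```python
def derive_expression_triples(sample_annotations, present_genes):
    """Pair every gene detected in a sample with every tissue the sample is annotated with."""
    triples = set()
    for sample_id, tissues in sample_annotations.items():
        for gene in present_genes.get(sample_id, []):
            for tissue in tissues:
                triples.add((gene, "is_expressed_in", tissue))
    return sorted(triples)


# Hypothetical inputs: one sample annotated as rat kidney,
# with two genes detected in it.
sample_annotations = {"sample_1": ["rat kidney"]}
present_genes = {"sample_1": ["Hbb", "Tm2d1"]}

triples = derive_expression_triples(sample_annotations, present_genes)

# Size estimate from the slide: 62,000 samples x ca. 25,000 genes per sample.
data_points = 62_000 * 25_000  # 1,550,000,000, i.e. roughly 1.5 billion
```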
12. Probeset results on GMiner
Probeset L08490cds_at for Gabra1 - gamma-aminobutyric acid (GABA) A receptor, alpha 1
Hs GABRA1
13. QTL
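[Diagram: a hypertensive QTL containing genes (G G G), where Strain 1 != Strain 2; individual genes (G) are linked to Phenotype, Pathway, Anatomy (Kidney), Component, Function, and Process annotations, and to Hypertension]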
14. QTL Gene ‘Highlighter’
[Diagram: QTL containing genes (G G G); Disease/Pheno.; AllegroGraph; sources: GMiner, RGD, OBO, etc.]
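One way to picture the highlighting step is as a filter over the genes in a QTL: keep those with an annotation matching a term of interest. The sketch below is only an assumption about the general idea. A plain dictionary stands in for the AllegroGraph triple store, and the function name and data layout are hypothetical.

```python
def highlight_genes(qtl_genes, annotations, terms_of_interest):
    """Return the QTL genes annotated with any term of interest, with the matching terms."""
    wanted = set(terms_of_interest)
    highlighted = {}
    for gene in qtl_genes:
        matches = sorted(wanted & annotations.get(gene, set()))
        if matches:
            highlighted[gene] = matches
    return highlighted


# Illustrative inputs: gene names and the kidney annotations are taken from
# the earlier slides; Svs4 is deliberately left without annotations.
qtl_genes = ["Tm2d1", "Svs4", "Hbb"]
annotations = {"Tm2d1": {"rat kidney"}, "Hbb": {"rat kidney"}}

hits = highlight_genes(qtl_genes, annotations, ["rat kidney", "hypertension"])
```

In a triple store the same filter would presumably be expressed as a query rather than a loop; the dictionary version is used here only to keep the example self-contained.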
15. RDF/OWL sources
Cell Ontology
http://www.berkeleybop.org/ontologies/obo-all/cell/cell.owl
Mouse Adult Gross Anatomy
http://www.berkeleybop.org/ontologies/obo-all/adult_mouse_anatomy/adult_mouse_anatomy.owl
Mammalian Phenotype
http://www.berkeleybop.org/ontologies/obo-all/mammalian_phenotype/mammalian_phenotype.owl
GO Function
http://www.berkeleybop.org/ontologies/obo-all/molecular_function/molecular_function.owl
GO Process
http://www.berkeleybop.org/ontologies/obo-all/biological_process/biological_process.owl
GO Component
http://www.berkeleybop.org/ontologies/obo-all/cellular_component/cellular_component.owl
16. Rat Genome Database
Wide variety of data types (genomic and physiological), many with corresponding ontologies
21. QTL Highlighter
• Rails source code will be available on GitHub
• RDFizer (Ruby): http://github.com/simont/MCW-RDF
22. Next Steps
• Register PURL for RGD
• Create RGD core object ontology (OWL/RDF)
• Select appropriate URIs for RGD data
• Ontology annotations - how best to represent in triple store?
• Export GMiner data to RDF -> triple store
• Document & refine biological use cases related to candidate gene selection/evaluation
• Identify additional data required for candidate gene selection, RDFize as appropriate, load into triple store.
• Connections to other RDF collections/LOD, etc.?
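To make the export step concrete, here is a minimal sketch of serializing annotation statements as N-Triples lines. Everything about the URIs is an assumption: the base `http://example.org/rgd/` is a placeholder (no PURL has been registered yet), and the `gene/`, `vocab/`, and `term/` path segments are hypothetical choices, not an agreed RGD scheme.

```python
from urllib.parse import quote

# Placeholder base URI; a registered PURL would replace this.
BASE = "http://example.org/rgd/"


def to_ntriples(statements, base=BASE):
    """Serialize (subject, predicate, object) name triples as N-Triples lines."""
    lines = []
    for subject, predicate, obj in statements:
        s = f"<{base}gene/{quote(subject)}>"
        p = f"<{base}vocab/{quote(predicate)}>"
        o = f"<{base}term/{quote(obj)}>"  # spaces become %20
        lines.append(f"{s} {p} {o} .")
    return "\n".join(lines)


ntriples = to_ntriples([("Hbb", "is_expressed_in", "rat kidney")])
```

Here the object is written as a URI rather than a literal on the assumption that annotation terms would resolve to ontology term identifiers; how best to represent annotations is one of the open questions listed above.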