Kidney and Urinary Pathway Knowledge Base for Data Mining
1. Kidney and Urinary Pathways Knowledge Base
(part of e-LICO)
Simon Jupp
University of Manchester
Bio-ontologies, Boston
July 9 2010
July 9, 2010, Bio-ontologies, Boston
2. Kidney and Urinary Knowledge Base and Ontology
KUP KB
(RDF store)
Specialised repository of KUP related data
KUP ontology for integration, query and inference
Background knowledge for data mining experiments
Collaborative update by the community
3. Chronic Renal Disease
Obstructive nephropathy
- leading cause of end-stage renal disease in children.
Dialysis or transplantation
- $8000/patient
A plumbing problem
Kidney
Ureter
Bladder
Urine
5. Genome OR Proteome OR Metabolome
Identification of pathways instead of molecules
6. Genome AND Proteome AND Metabolome
Identification of pathways instead of molecules
Identification of nodes in the pathophysiology of obstruction
7. e-LICO
Expression data
KUP KB
(RDF store)
Text-mining / Image mining
New models and hypotheses
Further wet lab experiments
e-LICO FP7 EU project.
e-Laboratory for Interdisciplinary Collaborative research in
data-mining and data-intensive sciences.
http://www.e-lico.eu
8. e-LICO
KUP KB
(RDF store)
Use Semantic Web technologies (RDF/OWL)
for this part of our infrastructure
9. REQUIREMENTS
Need low cost platform for data integration
Flexible data model
– Community extensions
Use of controlled vocabularies
– Ontologies for query and inferencing
KUP KB requirements
10. Kidney and Urinary Pathway Knowledge Base
1. Background knowledge for data-mining experiments
2. Repository of KUP experiments
http://www.e-lico.eu/kupkb
-omics data
Experimental data
11. KUP KB prototype
Currently contains a set of example queries that use the
KUP ontology to query the data:
– Which human genes have evidence for upregulation in the glomerulus?
– In which tissue is "PLA2G4A" expressed and in which biological processes does
it participate?
– Which proteins participate in TGF-beta signaling pathways and where are they
upregulated in the kidney?
12. Querying the graph
[Diagram: three linked graphs]
KUPO ontology: kupo:002444 (rdfs:label "PT epithelial cell") ro:part_of MA:00345; kupo:004672 (rdfs:label "DT epithelial cell") ro:part_of MA:00456
Entrez Gene: Gene X go:biological_process GO:0054426; Gene Y
Higgins dataset: Gene X and Gene Y, each linked by kupo:expressed_in to MA:000345 or MA:00456 (proximal tubule, distal tubule)
Query: Which genes involved in protein transport are expressed in proximal tubule epithelial cells?
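The query on this slide can be sketched as a join over three kinds of triples. This is a minimal illustration in plain Python rather than SPARQL against the real store. The assignment of Gene X to the proximal tubule, the reading of GO:0054426 as the protein transport term, and the single spelling of the MA identifiers are all assumptions made for the example.

```python
# Toy triples loosely modelled on the slide. Identifiers and the
# gene-to-tissue assignments are illustrative assumptions, not KUP KB data.
TRIPLES = {
    ("kupo:002444", "rdfs:label", "PT epithelial cell"),
    ("kupo:002444", "ro:part_of", "MA:00345"),
    ("kupo:004672", "rdfs:label", "DT epithelial cell"),
    ("kupo:004672", "ro:part_of", "MA:00456"),
    ("GeneX", "go:biological_process", "GO:0054426"),
    ("GeneX", "kupo:expressed_in", "MA:00345"),
    ("GeneY", "kupo:expressed_in", "MA:00456"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def genes_for(cell_label, process_id):
    """Genes annotated with process_id and expressed in the tissue
    that the cell with the given label is part of."""
    hits = set()
    for cell in subjects("rdfs:label", cell_label):
        for tissue in objects(cell, "ro:part_of"):
            for gene in subjects("kupo:expressed_in", tissue):
                if process_id in objects(gene, "go:biological_process"):
                    hits.add(gene)
    return sorted(hits)


print(genes_for("PT epithelial cell", "GO:0054426"))
```

The ontology supplies the step from cell type to anatomical location, so the expression dataset itself never has to mention cell types.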
13. KUP KB: KUP ontology (alpha)
[Diagram: KUP ontology, distinguishing asserted from inferred relations]
Anatomy (MAO): Kidney cortex; Renal proximal tubule; Proximal straight tubule; Proximal convoluted tubule
Cells (CTO): Proximal tubule epithelial cell; Proximal straight tubule epithelial cell; Proximal convoluted tubule epithelial cell
Gene biological processes (GO): Renal sodium absorption; Renal sodium ion absorption
Relations: subClassOf, part-of, participates-in
Each kidney cell is currently described by its localisation and function
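The distinction between asserted and inferred relations can be sketched with a small example. It assumes, for illustration only, that the two tubule segments are subclasses of the renal proximal tubule and that each cell type has one asserted part-of location. The actual axioms in the KUP ontology may differ.

```python
# Asserted class hierarchy (child -> parent). Assumed for this sketch.
SUBCLASS_OF = {
    "Proximal straight tubule": "Renal proximal tubule",
    "Proximal convoluted tubule": "Renal proximal tubule",
}

# Asserted location of each cell type (cell -> anatomical structure).
PART_OF = {
    "Proximal tubule epithelial cell": "Renal proximal tubule",
    "Proximal straight tubule epithelial cell": "Proximal straight tubule",
    "Proximal convoluted tubule epithelial cell": "Proximal convoluted tubule",
}


def ancestors(cls):
    """Walk the asserted subClassOf chain upwards from cls."""
    out = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        out.append(cls)
    return out


def inferred_part_of(cell):
    """Asserted location first, followed by every superclass of that location.
    A cell that is part of a subclass is also part of the superclass."""
    direct = PART_OF[cell]
    return [direct] + ancestors(direct)


print(inferred_part_of("Proximal straight tubule epithelial cell"))
```

Only the first entry of the result is stated in the ontology. The rest follows from the hierarchy, which is the kind of work a reasoner does at much larger scale.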
14. The KUPO development process
[Diagram: development workflow]
Collaborative spreadsheet
Individual spreadsheet
Issue tracker
OPPL script formulation
Generate OWL
Reasoned ontology
View ontology
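The step from spreadsheet to ontology can be sketched as follows. This is a plain Python stand-in, not OPPL, and the column names `cell`, `located_in` and `function` are hypothetical. It only shows how rows filled in by domain experts might be turned into part-of and participates-in statements.

```python
import csv
import io

# A hypothetical spreadsheet export: one row per cell type.
SHEET = """cell,located_in,function
Proximal straight tubule epithelial cell,Proximal straight tubule,Renal sodium ion absorption
Proximal convoluted tubule epithelial cell,Proximal convoluted tubule,Renal sodium ion absorption
"""


def rows_to_axioms(text):
    """Turn each spreadsheet row into two (subject, relation, object) statements."""
    axioms = []
    for row in csv.DictReader(io.StringIO(text)):
        axioms.append((row["cell"], "part-of", row["located_in"]))
        axioms.append((row["cell"], "participates-in", row["function"]))
    return axioms


for axiom in rows_to_axioms(SHEET):
    print(axiom)
```

Keeping the expert-facing input as a flat table means contributors never have to edit OWL directly.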
15. KUP KB: -omics data
We can represent -omics data as a graph
[Diagram: asserted relationships]
geneid:17638: type Entrez Gene ID; symbol Fasl; encodes AC18765
AC18765: type UNIPROT ID; symbol Fas-ligand
has:00527: type KEGG pathway ID; symbol Apoptosis; target of participates-in
16. KUP KB: experimental data
We can represent experimental data as a graph
[Diagram: asserted relationships]
GEO:028364: type GEO Experiment ID; contributor Higgins et al
Edge labels: sample, observation, contains
Other nodes: KUPO: Proximal straight tubule; Differentially expressed genes; Geneid:17638
17. Connecting the graphs
[Diagram: the experiment, -omics and ontology graphs connected]
Experiment: GEO:028364; contributor Higgins et al; sample; observation; Differentially expressed genes; contains geneid:17638
-omics: geneid:17638 (symbol Fasl); AC18765 (symbol Fas-ligand); has:00527 (symbol Apoptosis); participates-in
Anatomy: Renal proximal tubule; Proximal straight tubule; Proximal convoluted tubule (subClassOf)
Cells: Proximal tubule epithelial cell; Proximal straight tubule epithelial cell; Proximal convoluted tubule epithelial cell (subClassOf, part-of)
Processes: Renal sodium absorption; Renal sodium ion absorption (participates-in, part-of)
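Connecting the graphs amounts to taking the union of triple sets that share identifiers. The sketch below assumes that the protein is the subject of participates-in and invents a node name, `obs:1`, for the list of differentially expressed genes. Neither detail is confirmed by the slides.

```python
# -omics graph (direction of participates-in is an assumption).
OMICS = {
    ("geneid:17638", "symbol", "Fasl"),
    ("geneid:17638", "encodes", "AC18765"),
    ("AC18765", "participates-in", "has:00527"),
    ("has:00527", "symbol", "Apoptosis"),
}

# Experiment graph ("obs:1" is a hypothetical node id for the gene list).
EXPERIMENT = {
    ("GEO:028364", "contributor", "Higgins et al"),
    ("GEO:028364", "sample", "KUPO:Proximal straight tubule"),
    ("GEO:028364", "observation", "obs:1"),
    ("obs:1", "contains", "geneid:17638"),
}

# Merging is a set union: the shared node geneid:17638 links both graphs.
MERGED = OMICS | EXPERIMENT


def follow(graph, start, path):
    """Follow a sequence of predicates from a start node and return the end nodes."""
    frontier = {start}
    for predicate in path:
        frontier = {o for s, p, o in graph if s in frontier and p == predicate}
    return frontier


PATH = ["observation", "contains", "encodes", "participates-in", "symbol"]
print(follow(MERGED, "GEO:028364", PATH))
print(follow(EXPERIMENT, "GEO:028364", PATH))
```

The path from experiment to pathway name only resolves in the merged graph. In the experiment graph alone it stops at the gene identifier.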
18. Bio2RDF
Best practices from W3C Health Care and Life Science Working group.
Bio2RDF ontology as a schema
KUP KB
(RDF store)
19. So why RDF over RDBMS?
Having a standard representation simply makes my life easier
Lots of heterogeneous KUP data to be integrated
RDF allows me to simply pile more data in
Natural support for ontologies
Although limited
RDF alone isn’t enough
Next step, intelligent agents and crawlers…
How do we harness all this connected data?
20. Challenges
Bad modelling (?)
– Conflation of instances and classes
Cell bears some function (that is realised in some process) vs. Cell participates in some process
False statements and vague semantics
– Trying to accommodate the biologists’ queries
– Mapping natural language to semantic relationships
– Experiments, expression data, gene lists, etc. It’s hard
Plus a whole list of general Semantic Web related issues
21. Data mining
Data mining experiments just started
SPARQL queries generate tables of background knowledge for
data mining tools
Mine results for associations, clusters and predictive models.
Build user-friendly tools to hide the underlying technology
Results expected in Year 2 (later this year…)
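Turning graph data into a table for a data mining tool can be sketched as a pivot from triples to one row per gene. The predicates, the second gene and the column choice below are hypothetical. A real pipeline would obtain the rows from a SPARQL SELECT query rather than from an in-memory list.

```python
import csv
import io

# Illustrative triples. "geneid:00001" / "GeneX" are made-up placeholders.
TRIPLES = [
    ("geneid:17638", "symbol", "Fasl"),
    ("geneid:17638", "expressed_in", "Proximal straight tubule"),
    ("geneid:17638", "pathway", "Apoptosis"),
    ("geneid:00001", "symbol", "GeneX"),
    ("geneid:00001", "expressed_in", "Distal tubule"),
]
COLUMNS = ["symbol", "expressed_in", "pathway"]


def to_table(triples, columns):
    """Pivot triples into one row per subject, one column per predicate.
    Missing values become empty strings; a repeated predicate keeps the last value."""
    rows = {}
    for s, p, o in triples:
        if p in columns:
            rows.setdefault(s, {})[p] = o
    return [[s] + [attrs.get(c, "") for c in columns]
            for s, attrs in sorted(rows.items())]


def to_csv(table, columns):
    """Serialise the table as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["gene"] + columns)
    writer.writerows(table)
    return buf.getvalue()


print(to_csv(to_table(TRIPLES, COLUMNS), COLUMNS))
```

A flat table like this is the form most mining tools expect, so the graph can stay flexible while the tools see fixed columns.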
22. Summary
Rapid and low cost data integration
– Thanks to existing community efforts!!
Single SPARQL endpoint provides flexible queries
– Especially useful for our data-mining queries
Rapid ontology development
– Spreadsheets to engage domain experts
23. KUP Knowledge Base in e-LICO
[Diagram: architecture]
KUP KB (RDF store), http://www.e-lico.eu/kupkb
Bio2RDF
Linked Open Data / Semantic Web / Bio ontologies
E-LICO Workflows; E-LICO DB; E-LICO Data Analysis; Web interface
Use case data; Raw data
Query; Results; Shared meta-data
24. Acknowledgements
Julie Klein, Joost Schanstra
– Inserm, France
Robert Stevens
– University of Manchester
EuroKUP members who already contributed to the ontology
25. Challenges
KUP KB implemented as triple store (Sesame)
– Scalable
– Limited inference (RDFS)
Experiments with OWL
– Classification possible (FaCT++)
– DL Query language lacks desirable features
• Joins, Unions, Filters, etc.
26. Challenges 2
Re-use existing RDF datasets
– Bio2RDF could be improved
– URI guidelines unclear
• PURLs or OBO URI?
BioPortal, OBO Foundry, Bio2RDF…
– RDF endpoint to BioPortal is great!
27. Challenges 4
Warehoused data
– I don’t want to maintain other people’s data
Linked data and query federation
– What is possible now?
– SADI framework
Editor's Notes
We initially chose a KUP portion of the FMA, but domain experts found that there was too much detail in some sections and not enough in others. In addition, too many ontological distinctions were made within the portion of the FMA and the consequent dispersal of information made it hard to use. In time, we could have refined the FMA to do the job required, but we found that the MAO had all the detail for our needs. Although the connecting tubule is absent in mouse and present in humans, the MAO has this entity. Therefore the MAO can act as a substitute for the human anatomy.
They should have seen this before, also at the KUP day in Manchester in November. Year 2: add urinary pathway, diseases.