The document discusses the need for ontologies in biology to integrate data from the large number of biological databases and standards. It outlines tools for building and using ontologies, including those for end users to search and analyze data, and those for ontology engineers to develop ontologies through automated reasoning and integration. The Gene Ontology is provided as an example of an ontology that has been widely adopted for analyzing gene sets. The document advocates developing ontologies through a collaborative framework like the Open Biological and Biomedical Ontologies to promote reuse and integration across domains.
Experiences in the biosciences with the open biological ontologies foundry and the gene ontology
1. Experiences in the biosciences with the Open Biological Ontologies Foundry and the Gene Ontology
Chris Mungall
Berkeley Lab
@chrismungall
cjmungall@lbl.gov
OMDI October 2021
2. Background: Need for ontologies in the life sciences
>1800 Databases
>1500 Standards
>900 Ontologies
~13.5m terms
220m proteins
65bn genes
3. Outline: Tools for ontologies
(1) Tools for end-users (use)
● Search
● Integrate Data
● Analyze Data
(2) Tools for engineering ontologies (build)
● Knowledge Acquisition
● Automated classification / verification
● Versioning and release management
● Ontology integration
Domain-specific: many
owl:Ontology examples: GO, CHEBI
4. Ontologies should be built with science uses in mind
As a biologist, I want to interpret a set of over-expressed genes, so that I can understand the cellular mechanisms underlying an experimental perturbation.
[Figure: Drought, Activated genes]
5. Ontologies should be built with science uses in mind
GO:0009738 abscisic acid-activated signaling pathway
GO:0015979 photosynthesis
[Figure: Drought, Activated genes, GO]
Gene Ontology Analysis
Input: list of genes
Background knowledge:
1. Associations of genes to GO classes
2. Hierarchical relationships between GO classes
Statistical test: Which GO classes are over-represented in input gene set compared to background
Output: Ranked list of GO classes that characterize input gene set
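The analysis outlined above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular GO tool: it assumes a one-sided hypergeometric test, applies no multiple-testing correction, and runs on toy data. The gene names (`g1` to `g8`), their annotations, and the parent class `TOY:signaling` are hypothetical; only the two GO identifiers come from the slide.

```python
from math import comb


def hypergeom_tail(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n carry the class."""
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return hits / comb(M, N)


def propagate(annotations, parents):
    """Close each gene's annotations under the is_a hierarchy."""
    closed = {}
    for gene, classes in annotations.items():
        seen, stack = set(), list(classes)
        while stack:
            cls = stack.pop()
            if cls not in seen:
                seen.add(cls)
                stack.extend(parents.get(cls, ()))
        closed[gene] = seen
    return closed


def enrich(input_genes, annotations, parents):
    """Return (class, k, n, p) tuples ranked by ascending p-value."""
    closed = propagate(annotations, parents)
    background = set(closed)
    study = set(input_genes) & background
    M, N = len(background), len(study)
    results = []
    for cls in set().union(*closed.values()):
        n = sum(1 for g in background if cls in closed[g])
        k = sum(1 for g in study if cls in closed[g])
        if k:
            results.append((cls, k, n, hypergeom_tail(k, M, n, N)))
    # Sort by p-value, then by class id so that ties are deterministic.
    return sorted(results, key=lambda r: (r[3], r[0]))


# Toy background knowledge (hypothetical genes and hierarchy).
annotations = {
    "g1": {"GO:0009738"}, "g2": {"GO:0009738"}, "g3": {"GO:0009738"},
    "g4": {"GO:0015979"}, "g5": {"GO:0015979"}, "g6": {"GO:0015979"},
    "g7": {"GO:0015979"}, "g8": {"GO:0015979"},
}
parents = {"GO:0009738": ["TOY:signaling"]}

ranked = enrich(["g1", "g2", "g3", "g4"], annotations, parents)
for cls, k, n, p in ranked:
    print(f"{cls}\t{k}/{n}\tp={p:.4f}")
```

Propagating annotations up the hierarchy before counting is what makes the second piece of background knowledge matter: a gene annotated to a specific class also counts toward every ancestor of that class.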
6. Ontologies should be built with science uses in mind
As a biologist, I want to interpret a set of over-expressed genes, so that I can understand the cellular mechanisms underlying an experimental perturbation.
As an ontologist, I want to build the perfect representation of the world, ... because I can.
7. Build links with data science community
Bioconductor GO.db package
- Up to 50k downloads/mo
8. Reuse existing generic tools and browsers
https://www.ebi.ac.uk/ols
● The EBI instance hosts the most widely used ontologies
● Easy to Dockerize and run your own instance
● Works with any OWL ontology
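As a rough sketch of what reusing such a browser programmatically might look like, the Python below builds a term-search request and extracts identifiers and labels from the reply. The endpoint path (`/api/search`), the query parameters (`q`, `ontology`), and the response layout (`response.docs` with `obo_id` and `label`) are assumptions about the OLS REST API and should be checked against the API documentation of the instance in use. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL from the slide; swap in your own Dockerized instance if you run one.
OLS_BASE = "https://www.ebi.ac.uk/ols"


def search_url(query, ontology=None, base=OLS_BASE):
    """Build a search URL (endpoint and parameter names are assumed)."""
    params = {"q": query}
    if ontology:
        params["ontology"] = ontology
    return f"{base}/api/search?{urlencode(params)}"


def parse_hits(payload):
    """Pull (id, label) pairs out of a reply with the assumed layout."""
    docs = payload.get("response", {}).get("docs", [])
    return [(doc.get("obo_id"), doc.get("label")) for doc in docs]


def search(query, ontology=None, base=OLS_BASE):
    """Perform the HTTP request; requires network access."""
    with urlopen(search_url(query, ontology, base)) as resp:
        return parse_hits(json.load(resp))


# Hand-written payload in the assumed layout, for offline illustration only.
sample = {"response": {"docs": [{"obo_id": "GO:0015979", "label": "photosynthesis"}]}}
print(search_url("photosynthesis", "go"))
print(parse_hits(sample))
```

Keeping URL construction and response parsing separate from the network call makes it simple to point the same code at a locally hosted instance.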
11. Tools for building ontologies
(2) Tools for engineering ontologies (build)
● Knowledge Acquisition
● Automated classification / verification
● Versioning and release management
● Ontology integration
owl:Ontology examples: GO, CHEBI
12. The original bio-ontologies were SKOS-like silos
GO (Biological Process): glucan biosynthesis (GO:0009250) is_a polysaccharide biosynthesis (GO:0000271)
CHEBI (Chemical Entity): glucan (CHEBI:37163) is_a polysaccharide (CHEBI:18154)
No reuse or connection
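To illustrate what is lost when the two hierarchies are not connected, the sketch below links each GO biosynthesis class to the chemical it produces and then derives the GO is_a relationship from the CHEBI one. The `HAS_OUTPUT` links are an assumption made for illustration (they are not stated on the slide), and the inference rule is a toy stand-in for what an OWL reasoner would do.

```python
# CHEBI hierarchy from the slide: glucan is_a polysaccharide.
CHEBI_IS_A = {"CHEBI:37163": {"CHEBI:18154"}}

# Assumed links from GO biosynthesis classes to their chemical outputs.
HAS_OUTPUT = {
    "GO:0009250": "CHEBI:37163",  # glucan biosynthesis -> glucan
    "GO:0000271": "CHEBI:18154",  # polysaccharide biosynthesis -> polysaccharide
}


def ancestors(term, is_a):
    """Return all transitive is_a ancestors of a term."""
    seen, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_process_is_a(has_output, chem_is_a):
    """Infer process A is_a process B when A's output is_a B's output."""
    inferred = set()
    for child, child_out in has_output.items():
        out_ancestors = ancestors(child_out, chem_is_a)
        for parent, parent_out in has_output.items():
            if parent != child and parent_out in out_ancestors:
                inferred.add((child, parent))
    return inferred


inferred = infer_process_is_a(HAS_OUTPUT, CHEBI_IS_A)
print(inferred)
```

With the links in place, the GO relationship no longer has to be asserted and maintained by hand; it follows from the chemistry.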
13. Open Biological Ontologies (OBO)
http://obofoundry.org
@obofoundry
1. Well-integrated modular ontologies (subset of BioPortal)
2. Provide technical and sociotechnological framework for cooperation
3. Provide tools, best practices and infrastructure for forging new ontologies
4. Allow us to describe all of the things
16. Challenge: Scaling up
Core ontology developer: "Hey, domain expert, can you help with my ontology?"
Domain expert: "I’d love to, but I can’t figure out this OWL thing. Can I give you a spreadsheet?"
17. Templated OWL Design Patterns
https://robot.obolibrary.org/template
https://github.com/INCATools/dead_simple_owl_design_patterns/
Template: ‘Biosynthesis’ and has-output some ___chemical entity and via-intermediate some ___chemical entity and
[Figure: roles around the template: OWL expert, Ontology developer, Domain expert, Biocurator; credited via dct:creator and dct:contributor]
Tooling: ROBOT + Elk
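The idea of filling a template from a domain expert's spreadsheet can be sketched as follows. This is not ROBOT template or DOSDP syntax: the column names, the `expand` helper, and the simplified template (only the has-output part of the slide's pattern) are assumptions for illustration, and the output is a Manchester-syntax-like string rather than real OWL.

```python
import csv
import io

# Simplified version of the slide's pattern; the blank becomes a named slot.
TEMPLATE = "'Biosynthesis' and has-output some {chemical}"

# Hypothetical spreadsheet content, as a domain expert might supply it.
SHEET = """id,label,chemical
GO:0009250,glucan biosynthesis,glucan
GO:0000271,polysaccharide biosynthesis,polysaccharide
"""


def expand(sheet_text, template=TEMPLATE):
    """Turn each spreadsheet row into a class with a logical definition."""
    classes = {}
    for row in csv.DictReader(io.StringIO(sheet_text)):
        filler = "'" + row["chemical"] + "'"  # quote the label, Manchester style
        classes[row["id"]] = {
            "label": row["label"],
            "equivalent_to": template.format(chemical=filler),
        }
    return classes


classes = expand(SHEET)
for class_id, info in classes.items():
    print(class_id, "|", info["label"], "|", info["equivalent_to"])
```

The OWL expert writes the template once; contributors only ever touch the rows, and a reasoner such as Elk can then classify the generated definitions.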
18. Leverage the Expertise Pyramid
OWL experts
● Author OWL templates
● Create Design Patterns
Ontology Developers
● Implement OWL templates
● Test against Design Patterns
Ontology Users
● Consume pre-reasoned hierarchies
[Figure label: Learning]
23. Linked Data Modeling Language
https://linkml.io
https://github.com/linkml/linkml
Create datamodels in simple YAML files, optionally annotated using ontologies
Compile to other frameworks
● Semantic Web Applications and Infrastructure: ShEx, JSON-LD Contexts, OWL
● “Traditional” Applications and Infrastructure: JSON-Schema, Python Dataclasses, SQL DDL
Choose the right tools for the job, no lock in
[Figure: Biocurator, Data Scientist, dct:creator]
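The "compile to other frameworks" idea can be illustrated with a toy example: one declarative model, one generated target. The sketch below is not LinkML itself. The real toolchain reads YAML and ships its own generators, whereas here the model is a plain Python dict whose layout (`classes`, `attributes`, `range`, `required`, `pattern`) is a simplified assumption, and `to_json_schema` is a hypothetical helper.

```python
import json

# Toy datamodel; in practice this would live in a YAML file.
SCHEMA = {
    "classes": {
        "GeneAnnotation": {
            "attributes": {
                "gene_id": {"range": "string", "required": True},
                "go_class": {
                    "range": "string",
                    "required": True,
                    "pattern": r"^GO:\d{7}$",
                },
                "evidence_count": {"range": "integer"},
            }
        }
    }
}

# Mapping from model ranges to JSON Schema types.
RANGE_TO_JSON = {
    "string": "string",
    "integer": "integer",
    "float": "number",
    "boolean": "boolean",
}


def to_json_schema(schema, class_name):
    """Compile one class of the toy model into a JSON Schema object."""
    attributes = schema["classes"][class_name]["attributes"]
    properties = {}
    for name, spec in attributes.items():
        prop = {"type": RANGE_TO_JSON[spec.get("range", "string")]}
        if "pattern" in spec:
            prop["pattern"] = spec["pattern"]
        properties[name] = prop
    return {
        "title": class_name,
        "type": "object",
        "properties": properties,
        "required": [n for n, s in attributes.items() if s.get("required")],
    }


json_schema = to_json_schema(SCHEMA, "GeneAnnotation")
print(json.dumps(json_schema, indent=2))
```

Because the model is the single source of truth, further targets (SQL DDL, Python dataclasses, ShEx) would be additional generator functions over the same structure, which is the "no lock in" point.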
24. Conclusions
Tools for ontology users
Build ontologies to solve problems
Reuse existing generic tools
Team with data scientists to build domain-specific tools
Tools for engineering ontologies
Use OWL and reasoners; but hide the complexity
Treat ontology development like modern open software development
Use the appropriate formalisms
● OWL for terminological ontologies
● LinkML or shape languages for semantic schemas
25. Acknowledgments
OBO Operations
● Mathias Brochhausen
● Pier Luigi Buttigieg
● Melanie Courtot
● Alexander Diehl
● Melissa Haendel
● Simon Jupp
● Nomi Harris
● James Malone
● Darren Natale
● Jim Balhoff
● David Osumi-Sutherland
● Philippe Rocca-Serra
● Asiyah Lin
● Damion Dooley
● Alan Ruttenberg
● Richard Scheuermann
● Lynn Schriml
● Barry Smith
● Chris Stoeckert
● Nicole Vasilevsky
● Ramona Walls
● Xiaolin Yang
● Jie Zheng
OBO Services Team
● James Overton
● Becky Jackson
● Nico Matentzoglu
● Seth Carbon
● Mark Miller
● Deepak Unni
● Nomi Harris
● Bill Duncan
● Randi Vita
● Bjoern Peters
We’re hiring!!!
Knowledge Graphs
● Justin Reese
● Marcin Joachimiak
● Bill Duncan
● Seth Carbon
● Harshad Hegde
● Harry Caufield
● Sierra Moxon
● Elena Casiraghi
● Luca Cappelletti
● Giorgio Valentini
● Tommaso Fontana
● Tiffany Callahan
● Kent Shefchek
● Kevin Schafer
● Nomi Harris
● Moni Muñoz-Torres
● Peter Robinson
GO/Reactome/Rhea
● Peter d’Eustachio
● Harold Drabkin
● Jim Balhoff
● Ben Good
● Huaiyu Mi
● David Hill
● Kimberly van Auken
● Pascale Gaudet
● Laurent-Philippe Albou
● Anne Morgat
● Alan Bridge
● Paul Thomas
LinkML
● Harold Solbrig
● Dazhi Zhao
● Joe Flack
● Gaurav Vaidya
● Tim Putman
● Donny Winston
● Bill Duncan
● Mark Miller
● Sujay Patil
● Shahim Essaid
● Matt Brush
● Brian Furner
● Sierra Moxon
Patterns
● Sue Bello
● Nicole Vasilevsky
● All the MOD + HPO curators
● Nico Matentzoglu
BioLink
● Sierra Moxon
● Mike Bada
● Deepak Unni
● Michel Dumontier
● Vlado Dancik
● Matt Brush
● NCATS Translator DM team
Editor's Notes
Comparison: Amazon has 600m things.
Hill, D. P., Adams, N., Bada, M., Batchelor, C., Berardini, T. Z., Dietze, H., Drabkin, H. J., Ennis, M., Foulger, R. E., Harris, M. A., Hastings, J., Kale, N. S., de Matos, P., Mungall, C. J., Owen, G., Roncaglia, P., Steinbeck, C., Turner, S., and Lomax, J. (2013). Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology. BMC genomics, 14(1):513
Mungall, C. J., Dietze, H., & Osumi-Sutherland, D. (2014). Use of OWL within the Gene Ontology. In M. Keet & V. Tamma (Eds.), Proceedings of the 11th International Workshop on OWL: Experiences and Directions (OWLED 2014) (pp. 25–36). Riva del Garda, Italy, October 17-18, 2014. doi:10.1101/010090