Software engineering methodologies also work for Ontology engineering. This presentation from Bio-Ontologies 2012 describes how we are using Jenkins CI in GO and other ontologies.
Ontologies and Continuous Integration
1. Continuous Integration of Open
Biological Ontology Libraries
Chris Mungall
Lawrence Berkeley National
Laboratory
2. Outline
• What is Continuous Integration and why we
need it for ontologies
• A build tool for ontologies: OORT
• Example workflows: GO and HPO
• Lessons learned
3. Reuse and modularization of
ontologies
• Re-use, don’t re-invent
– OBO Foundry
• Modularize
– Ontologies should not be monolithic
standalone entities
– Apply Rector normalization pattern
• Building block approach
– Analogous to software engineering
http://obofoundry.org
Rector A. Modularisation of domain ontologies implemented in Description Logics and related formalisms including OWL. Proceedings of the 2nd international conference on Knowledge capture (2003)
4. Examples of ontology re-use
• GO is re-using the CHEBI classification of
chemical entities
– Using GONG* methodology
– Automated classification
[Figure: CHEBI classes carotenoid and xanthophyll alongside GO classes carotenoid biosynth and xanthophyll biosynth]
• The Human Phenotype (HP) ontology is re-
using FMA classification of anatomical
structures
• GFF3 format re-uses SO for genome feature
types and validation
*Wroe, C. J., Stevens, R., Goble, C. A., & Ashburner, M. (2003). A methodology to migrate the gene ontology to a
description logic environment using DAML+OIL. Pac Symp Biocomput, 624-35.
5. Reuse is not problem-free
• Modules which are tested in one context may not work
in another
– Example: Therac-25 radiation therapy machine fatal errors
– Causes of failure were complex
• Software tested and used on previous models was re-used
– Most software engineers are personally familiar with less
lethal examples
• Lesson:
– Not an excuse to re-implement de-novo
– Integration testing is vital
– This applies to ontologies too
• Inter-ontology integration
• Integration between ontologies and software
systems
6. Integration testing in software
engineering
• Traditional waterfall model
– Integration testing at end
– Deferral = pain
• Agile, test-driven model
– automated Continuous
integration (CI) testing
– Immediate feedback
http://martinfowler.com/articles/continuousIntegration.html
7. Example CI Server Architecture
[Figure: CI server architecture. Components: developers with local IDEs (java), a VCS server with code repository (java, perl) and web UI, a CI server with web UI, and a production environment. Arrow labels: update/commit, clone, external, release, deploy.]
8. Jenkins-CI
• A popular extendable open source continuous
integration server
• Easy to set up and administer
• Multiple plugins
• Large helpful user base
• Powerful, clean web based dashboard
• Integrates with most Version Control Systems
(VCSs)
http://jenkins-ci.org/
9. What’s this got to do with ontologies?
Software Engineering            Ontology Engineering
Source Code (.java, .pm)        Ontology (.owl, .obo)
Version control system          Version control system
Builds/Releases                 Builds/Releases
IDE (Eclipse, Netbeans, …)      ODE (Protégé, OBO-Edit)
Bugs                            ‘true path’ violations, inconsistencies
Junit/Xunit Tests               OWL Logical Axioms; Structural constraints; Terminology checks
Build tool (ant, maven)         ???
Integration tests               ???
Integration server              Integration server
10. Oort: A build tool for ontologies
• What does it do?
– Runs ‘ontology unit tests’ and creates releases
– Logical tests:
• No unsatisfiable classes
• No inferred equivalencies between named classes
– Other tests:
• ≤ 1 textual definition per class
• ≤ 1 RDFS label per class
• How does it work?
– Built on top of OWL-API
• Most OWL reasoners are available
– GUI
• For end-users
– Command line
• For use in CI server
[Figure: Oort data flow. Inputs: .obo (via obo2owl), .owl, .gaf. Inside Oort: OWL API, reasoner, verifications. Outputs: reports, .obo (via owl2obo), .owl.]
http://code.google.com/p/owltools/wiki/OortIntro
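The two non-logical checks listed above (at most one textual definition and at most one label per class) can be sketched in a few lines. This is only an illustration of the idea, not Oort's code: the dictionary layout, the function name `structural_violations` and the `EX:` identifiers are all assumptions made for the example.

```python
def structural_violations(classes):
    """Return (class_id, problem) pairs for classes breaking the cardinality checks.

    `classes` is a hypothetical in-memory form of an ontology:
    class id -> {"labels": [...], "definitions": [...]}.
    """
    problems = []
    for class_id, info in sorted(classes.items()):
        if len(info.get("labels", [])) > 1:
            problems.append((class_id, "more than one label"))
        if len(info.get("definitions", [])) > 1:
            problems.append((class_id, "more than one textual definition"))
    return problems


# Made-up classes: EX:0000002 carries two labels and should be reported.
example = {
    "EX:0000001": {"labels": ["carotenoid biosynthesis"], "definitions": ["A definition."]},
    "EX:0000002": {"labels": ["foo", "bar"], "definitions": []},
}
report = structural_violations(example)
print(report)
```

A build would fail when the returned list is non-empty, in the same spirit as the 'fail if unsatisfiable classes found' rule for the logical tests.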
11. Example basic workflow
• Client:
– Make local modifications using
OBO Edit
– Commit changes to SVN
– (Optionally) check dashboard in
web browser
12. Example basic workflow
• Server:
– Jenkins polls SVN
– External commit triggers Jenkins to launch the build-go job (using Oort)
• build-go job:
– Load main ontology
– Import external disjointness axioms
– Launch HermiT
– Write reasoner report
– Fail if unsatisfiable classes found
– Run additional perl checks, ensure external xrefs resolve, etc.
13. Example basic workflow
• Write reasoner report
• SUCCESS:
– If previous build was fail, Jenkins sends ‘service resumed’ email
– Downstream jobs are triggered (e.g. bigger integrated builds, deployment)
• FAIL:
– Jenkins sends email alert to mail list
– GO editor debugs, fixes then recommits
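The success and failure branches on this slide amount to a small decision rule over the previous and current build status. The sketch below restates that rule. It is not Jenkins configuration or a Jenkins API; the function name and the action strings are hypothetical.

```python
def notifications(previous_ok, current_ok):
    """List the follow-up actions for a finished build.

    previous_ok / current_ok: whether the previous and current builds passed.
    """
    if not current_ok:
        # Failed build: alert the mailing list so an editor can debug and recommit.
        return ["email alert to mail list"]
    actions = []
    if not previous_ok:
        # The build has recovered from an earlier failure.
        actions.append("'service resumed' email")
    actions.append("trigger downstream jobs")
    return actions


print(notifications(previous_ok=False, current_ok=True))
```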
14. OBO Jenkins dashboard
[Figure: screenshot of the OBO Jenkins dashboard. Annotations: Cell ontology (cl) build in progress; red ball = FAIL; ‘outlook’.]
http://build.berkeleybop.org/
15. Why we need this for GO
• GO is gradually moving towards leveraging
external ontologies and automated reasoning
– E.g. new metabolism terms come in via TermGenie
• User simply selects CHEBI class
– Automated graph placement (Elk)
‘carotenoid biosynthesis’ EquivalentTo biosynthesis and ‘has output’ some carotenoid
‘xanthophyll biosynthesis’ EquivalentTo biosynthesis and ‘has output’ some xanthophyll
[Figure: CHEBI classes carotenoid and xanthophyll alongside GO classes carotenoid biosynth and xanthophyll biosynth]
http://go.termgenie.org
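To make 'automated graph placement' concrete, here is a toy version of the inference a reasoner such as Elk draws from the two EquivalentTo definitions. It assumes, as the figure suggests, that CHEBI places xanthophyll under carotenoid. The function names and data layout are hypothetical, and a real reasoner handles far more than this single genus-plus-'has output' pattern.

```python
def ancestors(term, parents):
    """Return the term together with all of its transitive superclasses."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_subclasses(definitions, chemical_parents):
    """Infer (subclass, superclass) pairs between defined processes.

    `definitions` maps a process to the chemical it 'has output'; every process
    is assumed to share the same genus ('biosynthesis').
    """
    inferred = set()
    for sub, sub_chem in definitions.items():
        for sup, sup_chem in definitions.items():
            if sub != sup and sup_chem in ancestors(sub_chem, chemical_parents):
                inferred.add((sub, sup))
    return inferred


chebi = {"xanthophyll": ["carotenoid"]}  # assumed CHEBI is_a link
go_defs = {
    "carotenoid biosynthesis": "carotenoid",
    "xanthophyll biosynthesis": "xanthophyll",
}
placement = infer_subclasses(go_defs, chebi)
print(placement)
```

Under that assumption the only inferred link is 'xanthophyll biosynthesis' SubClassOf 'carotenoid biosynthesis', so the new GO term lands in the graph without manual placement.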
16. Why we need this for GO
• Automated quality control using reasoning
– Taxon constraints
– Useful for detecting false function predictions
‘in taxon’ some Metazoa DisjointWith ‘in taxon’ some Viridiplantae
‘carotenoid biosynthesis’ DisjointWith ‘in taxon’ some Metazoa
[Figure: CHEBI, GO and NCBITaxon classes; carotenoid biosynth is linked to Metazoan by ‘never in’]
Deegan, J., Dimmer, E., & Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC bioinformatics, 11(1), 530. BioMed Central Ltd. doi:10.1186/1471-2105-11-530
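A rough sketch of how a 'never in taxon' constraint can flag a suspicious annotation is given below. It walks plain parent maps instead of using an OWL reasoner, and the gene names, the simplified taxon hierarchy and the function names are assumptions made for illustration.

```python
def closure(term, parents):
    """Return the term together with all of its transitive ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def taxon_violations(annotations, go_parents, taxon_parents, never_in):
    """Return (gene, annotated term, constrained term, banned taxon) tuples."""
    violations = []
    for gene, go_term, taxon in annotations:
        gene_taxa = closure(taxon, taxon_parents)
        # Constraints on a GO class are inherited by all of its subclasses.
        for term in sorted(closure(go_term, go_parents)):
            banned = never_in.get(term)
            if banned is not None and banned in gene_taxa:
                violations.append((gene, go_term, term, banned))
    return violations


go_parents = {"xanthophyll biosynthesis": ["carotenoid biosynthesis"]}
taxon_parents = {  # heavily simplified taxonomy
    "Mus musculus": ["Metazoa"],
    "Arabidopsis thaliana": ["Viridiplantae"],
}
never_in = {"carotenoid biosynthesis": "Metazoa"}
annotations = [  # hypothetical gene annotations
    ("geneA", "xanthophyll biosynthesis", "Mus musculus"),
    ("geneB", "xanthophyll biosynthesis", "Arabidopsis thaliana"),
]
found = taxon_violations(annotations, go_parents, taxon_parents, never_in)
print(found)
```

Only the hypothetical mouse gene is reported: its annotation inherits the Metazoa constraint from 'carotenoid biosynthesis', while the plant gene passes.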
17. Errors propagate in an integrated environment
[Figure: classes from CHEBI (carotenoid, xanthophyll, xanthine), GO (carotenoid biosynth, xanthophyll biosynth, xanthine biosynth), MGI (Ada gene) and NCBITaxon (Metazoan, Mus musculus). carotenoid biosynth is ‘never in’ Metazoan; Ada is ‘in taxon’ Mus musculus; two links are marked X.]
Inference (propagation of errors): Ada SubClassOf owl:Nothing
18. Server-side integration tests are vital
[Figure: same diagram as on the previous slide, ending in the inference Ada SubClassOf owl:Nothing]
• Problem may not be apparent in developer’s local
environment
– Manifests when GO is integrated with gene associations
• With CI, errors can be fixed at source
19. Staged builds
• Fowler Principle: ‘Keep the build fast’
• Staged builds
– Balances needs of bug finding and speed
• Basic Build (fastest; low CPU): GO, disjoints
• Ontology Integration Build: CHEBI, Uberon, CL, Taxon, PR
• System Integration Build (most complete; high CPU): Annotations
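The staging idea is to run the cheap build first and only spend CPU on the bigger ones if it passes. The sketch below shows that control flow only. The lambdas are stand-ins for the real reasoner and annotation jobs, and `run_staged_builds` is a hypothetical helper, not part of Jenkins or Oort.

```python
def run_staged_builds(stages):
    """Run (name, check) stages in order and stop at the first failure.

    Returns (all_passed, log), where log lists (stage name, status) for every
    stage that actually ran.
    """
    log = []
    for name, check in stages:
        passed = bool(check())
        log.append((name, "SUCCESS" if passed else "FAIL"))
        if not passed:
            return False, log
    return True, log


# Hypothetical checks; the second stage fails, so the third never runs.
stages = [
    ("basic build", lambda: True),
    ("ontology integration build", lambda: False),
    ("system integration build", lambda: True),
]
ok, log = run_staged_builds(stages)
print(ok, log)
```

Ordering stages from fastest to most complete keeps feedback on ordinary commits quick, while the expensive system-level build still catches whatever the earlier stages miss.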
20. User experience
• Previous environment:
– Daily cron job, monolithic perl scripts
• Informal survey results:
– Gene Ontology developers love Jenkins
• Popular Features:
– Transparency of build process
– Direct feedback
– User-friendliness
– ‘build lights’
• Particularly useful for obo/owl hybrid
workflows
21. Human Phenotype Ontology is
deployed using CI
• HPO: ~10k classes
• Logical definitions have dependencies on:
– FMA; PATO; Uberon; GO; CL
• Annotations
– Link OMIM disorders to HPO classes
• Validation
– Oort and GULO
• Uses Hudson rather than Jenkins
Köhler S et al. (2011) Improving ontologies by automatic reasoning and evaluation of logical definitions. BMC Bioinformatics 12:418
22. CI best practice: use a VCS
• Ontologies are source code
– Always use a version control system to manage your
source code
• Sorry, this is non-negotiable
• CI server integration with VCSs is a great feature
– Polling
– Commit metadata coupled with builds
• Downside of VCSs:
– OWL syntaxes are almost always preferable to obo format,
except
• They suck with VCSs – spurious diffs
• We’re working on a solution
23. Future Enhancements
• Migrate OBO-Edit verification checks to OWL API
• Phase out perl and OBO-Format validation scripts
and move to OWLAPI plus OPPL2 for scripting
• Extend GO validation pipeline to include term
enrichment gold standard sets
– E.g. after ontology change does the p-value of
angiogenesis change in the glioblastoma gene set?
• (Example stolen from Erik Clarke’s talk)
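One way such a gold-standard check could work is to recompute a term's enrichment p-value before and after an ontology change and flag large shifts. The slide does not say which statistic would be used. The sketch assumes a one-sided hypergeometric test, and all counts, names and the tolerance are invented for illustration.

```python
from math import comb, log10


def enrichment_p_value(population, annotated, sample, overlap):
    """One-sided hypergeometric p-value: P(at least `overlap` annotated genes in the sample).

    population: genes in the background; annotated: background genes annotated to the term;
    sample: genes in the gene set; overlap: gene-set genes annotated to the term.
    """
    total = comb(population, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(overlap, min(annotated, sample) + 1)
    )
    return tail / total


def p_value_shifted(before, after, max_log10_shift=1.0):
    """Flag a change of more than `max_log10_shift` orders of magnitude."""
    return abs(log10(after) - log10(before)) > max_log10_shift


# Toy numbers: an ontology change adds one more background gene to the term.
before = enrichment_p_value(population=10, annotated=5, sample=5, overlap=5)  # 1/252
after = enrichment_p_value(population=10, annotated=6, sample=5, overlap=5)   # 6/252
print(before, after, p_value_shifted(before, after))
```

A CI job could run a comparison like this over each gold-standard gene set and fail the build, or just warn, when a tracked term moves by more than the chosen tolerance.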
24. Availability
• Oort:
• http://code.google.com/p/owltools/wiki/OortIntro
• OBO build server:
• http://build.berkeleybop.org
• You can request to have your ontology and custom
build pipeline added
– obo-admin@obofoundry.org
• Easy to clone our config and set up your own server
25. Conclusions
• What works for software can work for ontologies
– Ontology engineering should become more like Software
engineering
• Ontology re-use can be hard
– A CI server is vital for staying integrated
• Simple = good
– Admin: Jenkins is easy to set up and maintain
– Users: +1
• Successful for GO, HPO
– Now being extended to other ontologies
– May be a vital component in OBO Foundry infrastructure
• CI will be integral as information systems evolve to
depend more on ontologies
26. Acknowledgments
• Tanya Berardini, Rebecca Foulger, David Hill, Jane
Lomax, Paola Roncaglia, Midori Harris, Ramona
Walls, Laurel Cooper (beta testers)
• Heiko Dietze (Oort)
• Sebastian Bauer (HPO)
• Seth Carbon, Amelia Ireland (Jenkins wrangling)
• GO PIs
• Jenkins
Editor's Notes
Therac-25. Radiation therapy machine. The engineer had reused software from older models. These models had hardware interlocks that masked their software defects. Those hardware safeties had no way of reporting that they had been triggered, so there was no indication of the existence of faulty software commands.
Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.
Client/server
Magic 10 minutes
Example:
Commit build:
– Use EL reasoner over main ontology, plus external disjointness axioms
– Basic structural checks
– Can be executed painlessly in client ODE
– Fast, immediate feedback, BUT doesn’t catch all issues
Integrated ontology build:
– Bring in external ontologies
Integrated system build:
– Check consequences of commits against all gene associations
User friendliness to the point of anthropomorphization