Continuous Integration of Open Biological Ontology Libraries
Chris Mungall
Lawrence Berkeley National Laboratory
Outline
• What is Continuous Integration and why we
  need it for ontologies
• A build tool for ontologies: OORT
• Example workflows: GO and HPO
• Lessons learned
Reuse and modularization of ontologies
  • Re-use, don’t re-invent
      – OBO Foundry
  • Modularize
      – Ontologies should not be monolithic
        standalone entities
      – Apply Rector normalization pattern
           • Building block approach
      – Analogous to software engineering

http://obofoundry.org
Rector A. Modularisation of domain ontologies implemented in Description Logics and related formalisms including OWL.
Proceedings of the 2nd international conference on Knowledge capture (2003)
Examples of ontology re-use
     • GO is re-using the CHEBI classification of
       chemical entities
           – Using GONG* methodology
           – Automated classification
     • The Human Phenotype (HP) ontology is re-
       using the FMA classification of anatomical
       structures
     • GFF3 format re-uses SO for genome feature
       types and validation
     [Figure: carotenoid and xanthophyll shown alongside carotenoid biosynth and xanthophyll biosynth]
*Wroe, C. J., Stevens, R., Goble, C. A., & Ashburner, M. (2003). A methodology to migrate the gene ontology to a
description logic environment using DAML+OIL. Pac Symp Biocomput, 624-35.
Reuse is not problem-free
• Modules which are tested in one context may not work
  in another
   – Example: Therac-25 radiation therapy machine fatal errors
   – Causes of failure were complex
      • Software tested and used on previous models was re-used
   – Most software engineers are personally familiar with less
     lethal examples

                   • Lesson:
                       – Not an excuse to re-implement de-novo
                       – Integration testing is vital
                       – This applies to ontologies too
                           • Inter-ontology integration
                           • Integration between ontologies and software
                             systems
Integration testing in software engineering
• Traditional waterfall model
  – Integration testing at end
  – Deferral = pain


                • Agile, test-driven model
                    – automated Continuous
                      integration (CI) testing
                    – Immediate feedback
               http://martinfowler.com/articles/continuousIntegration.html
Example CI Server Architecture
[Diagram: developers with local IDEs (java) update/commit to a VCS server; a CI server clones from the VCS server and an external code repository (java, perl), exposes a web UI, and deploys releases to the production environment]
Jenkins-CI
• A popular, extensible open-source continuous
  integration server
• Easy to set up and administer
• Multiple plugins
• Large helpful user base
• Powerful, clean web-based dashboard
• Integrates with most Version Control Systems
  (VCSs)
                                    http://jenkins-ci.org/
What’s this got to do with ontologies?
Software Engineering         Ontology Engineering
Source Code (.java, .pm)     Ontology (.owl, .obo)
Version control system       Version control system
Builds/Releases              Builds/Releases
IDE (Eclipse, NetBeans, …)   ODE (Protégé, OBO-Edit)
Bugs                         ‘True path’ violations, inconsistencies
JUnit/xUnit tests            OWL logical axioms; structural constraints; terminology checks
Build tool (ant, maven)      ???
Integration tests            ???
Integration server           Integration server
Oort: A build tool for ontologies
   • What does it do?
       – Runs ‘ontology unit tests’ and creates releases
       – Logical tests:
            • No unsatisfiable classes
            • No inferred equivalencies between named classes
       – Other tests:
            • ≤ 1 textual definition per class
            • ≤ 1 RDFS label per class
   • How does it work?
       – Built on top of OWL-API
            • Most OWL reasoners are available
       – GUI
            • For end-users
       – Command line
            • For use in CI server
   [Diagram: .obo, .owl and .gaf inputs; obo2owl; Oort (OWL API, reasoner, verifications); owl2obo; outputs: reports, .obo and .owl files]

http://code.google.com/p/owltools/wiki/OortIntro
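The two non-logical tests listed on this slide (at most one textual definition and at most one label per class) can be illustrated without a reasoner. The sketch below is not Oort's implementation. It assumes a simplified OBO-style input in which each `[Term]` stanza carries `id:`, `name:` and `def:` tag lines, and the function names `parse_terms` and `check_cardinality`, like the `EX:` identifiers, are hypothetical.

```python
from collections import defaultdict


def parse_terms(obo_text):
    """Collect tag values per term from simplified OBO-style text."""
    terms = {}
    current = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are inspected.
            in_term = (line == "[Term]")
            current = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "id":
            current = terms.setdefault(value, defaultdict(list))
        elif current is not None:
            current[tag].append(value)
    return terms


def check_cardinality(terms, tags=("name", "def")):
    """Return (term_id, tag, count) for every tag used more than once."""
    return [(term_id, tag, len(values[tag]))
            for term_id, values in sorted(terms.items())
            for tag in tags
            if len(values[tag]) > 1]


# Made-up sample: the second term has two labels and should be reported.
SAMPLE = """\
[Term]
id: EX:0000001
name: carotenoid biosynthesis
def: "Formation of carotenoids." []

[Term]
id: EX:0000002
name: xanthophyll biosynthesis
name: xanthophyll formation
"""

if __name__ == "__main__":
    for term_id, tag, count in check_cardinality(parse_terms(SAMPLE)):
        print(f"{term_id}: {count} '{tag}' values, expected at most 1")
```

In a CI job, a non-empty result from a check like this would be the signal to mark the build as failed.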
Example basic workflow
• Client:
   – Make local modifications using
     OBO-Edit
   – Commit changes to SVN
   – (Optionally) check the dashboard in a
     web browser
• Server:
   – Jenkins polls SVN
   – External commit triggers Jenkins to launch the
     build-go job (using Oort)
• build-go job:
   – Load main ontology
   – Import external disjointness axioms
   – Launch HermiT
   – Write reasoner report
   – Fail if unsatisfiable classes found
   – Run additional Perl checks, ensure external
     xrefs resolve, etc.
Example basic workflow
• Write reasoner report
• SUCCESS:
   – If the previous build failed, Jenkins
     sends ‘service resumed’ email
   – Downstream jobs are triggered
     (e.g. bigger integrated builds, deployment)
• FAIL:
   – Jenkins sends email alert to mail list
   – GO editor debugs, fixes, then recommits
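The branching on this slide amounts to a small decision table over the previous and current build outcome. The sketch below only restates that logic. It does not use any Jenkins API (Jenkins handles this through job configuration), and the function name `post_build_actions` and the action strings are hypothetical.

```python
def post_build_actions(previous_ok, current_ok):
    """Map the previous and current build outcome to follow-up actions.

    previous_ok may be None when there is no earlier build.
    """
    actions = ["write reasoner report"]
    if current_ok:
        # Only announce recovery if the build was broken before.
        if previous_ok is False:
            actions.append("send 'service resumed' email")
        actions.append("trigger downstream jobs")
    else:
        actions.append("send email alert to mailing list")
    return actions


if __name__ == "__main__":
    for prev, cur in [(True, True), (False, True), (True, False)]:
        print(prev, cur, post_build_actions(prev, cur))
```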
OBO Jenkins dashboard
[Screenshot: Jenkins dashboard at http://build.berkeleybop.org/ showing a Cell ontology (cl) build in progress, a red ball marking a FAIL, and the ‘outlook’ column]
Why we need this for GO
    • GO is gradually moving towards leveraging
      external ontologies and automated reasoning
         – E.g. new metabolism terms come in via TermGenie
              • User simply selects CHEBI class
         – Automated graph placement (Elk)
    • Logical definitions linking GO to CHEBI:
         – ‘carotenoid biosynthesis’ EquivalentTo
           biosynthesis and ‘has output’ some carotenoid
         – ‘xanthophyll biosynthesis’ EquivalentTo
           biosynthesis and ‘has output’ some xanthophyll
    [Figure: CHEBI classes carotenoid and xanthophyll alongside GO classes carotenoid biosynth and xanthophyll biosynth]

http://go.termgenie.org
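To make the idea of automated graph placement concrete, the sketch below infers a GO subclass link from two logical definitions of the form "genus and ‘has output’ some X". It is a toy stand-in for a reasoner such as Elk, not how Elk works internally. It assumes that xanthophyll is a subclass of carotenoid in CHEBI (as the figure on this slide suggests), that each class has a single parent, and that the genus must match exactly. All names in the code are hypothetical.

```python
# Assumed fragment of the CHEBI hierarchy: child -> parent.
CHEBI_IS_A = {"xanthophyll": "carotenoid"}

# Logical definitions: GO class -> (genus, 'has output' filler).
LOGICAL_DEFS = {
    "carotenoid biosynthesis": ("biosynthesis", "carotenoid"),
    "xanthophyll biosynthesis": ("biosynthesis", "xanthophyll"),
}


def ancestors(cls, is_a):
    """Return cls followed by all of its superclasses (acyclic, single parent)."""
    seen = [cls]
    while cls in is_a:
        cls = is_a[cls]
        seen.append(cls)
    return seen


def inferred_subclasses(defs, is_a):
    """Infer (sub, super) pairs: same genus, and sub's output is a kind of super's output."""
    pairs = []
    for sub, (sub_genus, sub_out) in defs.items():
        for sup, (sup_genus, sup_out) in defs.items():
            if (sub != sup and sub_genus == sup_genus
                    and sup_out in ancestors(sub_out, is_a)):
                pairs.append((sub, sup))
    return pairs


if __name__ == "__main__":
    for sub, sup in inferred_subclasses(LOGICAL_DEFS, CHEBI_IS_A):
        print(f"'{sub}' SubClassOf '{sup}'")
```

The point of the design is that the editor only states the output chemical; the position of the new GO term follows from the CHEBI hierarchy.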
Why we need this for GO
   • Automated quality control using reasoning
          – Taxon constraints
          – Useful for detecting false function predictions
   • Example axioms:
          – ‘carotenoid biosynthesis’ DisjointWith
            ‘in taxon’ some Metazoa
          – ‘in taxon’ some Metazoa DisjointWith
            ‘in taxon’ some Viridiplantae
   [Figure: columns CHEBI (carotenoid, xanthophyll), GO (carotenoid biosynth, xanthophyll biosynth) and NCBITaxon (Metazoan), with a ‘never in’ link from carotenoid biosynth to Metazoan]

Deegan, J., Dimmer, E., & Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in
annotation and ontology development. BMC bioinformatics, 11(1), 530. BioMed Central Ltd. doi:10.1186/1471-2105-11-530
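A taxon constraint of the ‘never in’ kind can be checked against annotations by walking up both hierarchies. The sketch below is a simplified illustration rather than the reasoner-based method of the cited paper. The taxonomy fragment skips intermediate ranks, the GO parent link is assumed, and the gene names `geneA` and `geneB` are made up.

```python
# Assumed taxonomy fragment: child -> parent (intermediate ranks omitted).
TAXON_PARENT = {
    "Mus musculus": "Metazoa",
    "Arabidopsis thaliana": "Viridiplantae",
}

# Assumed GO fragment and 'never in taxon' constraints (inherited by subclasses).
GO_PARENT = {"xanthophyll biosynthesis": "carotenoid biosynthesis"}
NEVER_IN_TAXON = {"carotenoid biosynthesis": {"Metazoa"}}


def lineage(node, parent):
    """Return node followed by all of its ancestors."""
    out = [node]
    while node in parent:
        node = parent[node]
        out.append(node)
    return out


def violations(annotations):
    """Return (gene, term, taxon) annotations that break a 'never in' constraint."""
    bad = []
    for gene, term, taxon in annotations:
        excluded = set()
        # Constraints on a term also apply to all of its subclasses.
        for go_class in lineage(term, GO_PARENT):
            excluded |= NEVER_IN_TAXON.get(go_class, set())
        if excluded & set(lineage(taxon, TAXON_PARENT)):
            bad.append((gene, term, taxon))
    return bad


if __name__ == "__main__":
    sample = [
        ("geneA", "xanthophyll biosynthesis", "Mus musculus"),
        ("geneB", "carotenoid biosynthesis", "Arabidopsis thaliana"),
    ]
    print(violations(sample))
```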
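Errors propagate in an integrated environment
[Figure: four columns, CHEBI, GO, MGI and NCBITaxon. CHEBI: carotenoid, xanthophyll and xanthine, with an erroneous link (X) between xanthophyll and xanthine. GO: carotenoid biosynth, xanthophyll biosynth and xanthine biosynth, with the same error (X) propagated; carotenoid biosynth is ‘never in’ Metazoan. MGI: Ada (gene), shown next to xanthine biosynth and ‘in taxon’ Mus musculus.]
Inference: Ada SubClassOf owl:Nothing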
Server-side integration tests are vital
• Problem may not be apparent in the developer’s local
  environment
       – Manifests when GO is integrated with gene associations
• With CI, errors can be fixed at source
Staged builds
• Fowler Principle: ‘Keep the build fast’
• Staged builds
  – Balances needs of bug finding and speed
  – Stages, from fastest (low CPU) to most complete (high CPU):
      • Basic Build: GO, disjoints, Taxon
      • Ontology Integration Build: CHEBI, Uberon, CL, PR
      • System Integration Build: Annotations
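The staged approach can be sketched as a list of checks ordered from cheapest to most expensive, where a failure stops the run before the slower stages start. This is an illustration of the principle only. On the real server the stages are separate Jenkins jobs chained as downstream builds, and the name `run_staged_build` and the lambda checks are hypothetical.

```python
def run_staged_build(stages):
    """Run stages in order and stop at the first failing stage.

    `stages` is a list of (name, check) pairs; check() returns True on success.
    Returns (names_of_passed_stages, name_of_failed_stage_or_None).
    """
    passed = []
    for name, check in stages:
        if not check():
            # Later, more expensive stages are skipped.
            return passed, name
        passed.append(name)
    return passed, None


if __name__ == "__main__":
    stages = [
        ("basic build", lambda: True),
        ("ontology integration build", lambda: False),  # simulated failure
        ("system integration build", lambda: True),
    ]
    print(run_staged_build(stages))
```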
User experience
• Previous environment:
  – Daily cron job, monolithic perl scripts
• Informal survey results:
  – Gene Ontology developers love Jenkins
• Popular Features:
  –   Transparency of build process
  –   Direct feedback
  –   User-friendliness
  –   ‘build lights’
• Particularly useful for obo/owl hybrid
  workflows
Human Phenotype Ontology is deployed using CI
     • HPO: ~10k classes
     • Logical definitions have dependencies on:
             – FMA; PATO; Uberon; GO; CL
     • Annotations
             – Link OMIM disorders to HPO classes
     • Validation
             – Oort and GULO
     • Uses Hudson rather than Jenkins
Köhler S et al. (2011) Improving ontologies by automatic reasoning and evaluation of logical definitions. BMC Bioinformatics 12:418
CI best practice: use a VCS
• Ontologies are source code
   – Always use a version control system to manage your
     source code
      • Sorry, this is non-negotiable
• CI server integration with VCSs is a great feature
   – Polling
   – Commit metadata coupled with builds
• Downside of VCSs:
   – OWL syntaxes are almost always preferable to obo format,
     except
      • They suck with VCSs – spurious diffs
      • We’re working on a solution
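One way spurious diffs can arise is when the same set of axioms is written out in a different order on each save. The sketch below illustrates that effect and one generic mitigation, sorting axioms into a canonical order before committing. It is not the solution the slide refers to, and the axiom strings and function names are made up for the example.

```python
# Two saves of the same (made-up) ontology, differing only in axiom order.
SAVE_1 = [
    "SubClassOf(:xanthophyll :carotenoid)",
    "Declaration(Class(:carotenoid))",
    "Declaration(Class(:xanthophyll))",
]
SAVE_2 = [
    "Declaration(Class(:xanthophyll))",
    "SubClassOf(:xanthophyll :carotenoid)",
    "Declaration(Class(:carotenoid))",
]


def canonical(axioms):
    """Return the axioms deduplicated and sorted, so equal sets serialize identically."""
    return sorted(set(axioms))


def real_changes(old, new):
    """Return axioms present in exactly one of the two versions."""
    return sorted(set(old) ^ set(new))


if __name__ == "__main__":
    print("line-by-line identical:", SAVE_1 == SAVE_2)
    print("identical after canonical ordering:", canonical(SAVE_1) == canonical(SAVE_2))
    print("real changes:", real_changes(SAVE_1, SAVE_2))
```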
Future Enhancements

• Migrate OBO-Edit verification checks to OWL API
• Phase out Perl and OBO-Format validation scripts
  and move to OWL API plus OPPL2 for scripting
• Extend GO validation pipeline to include term
  enrichment gold standard sets
  – E.g. after ontology change does the p-value of
    angiogenesis change in the glioblastoma gene set?
     • (Example stolen from Erik Clarke’s talk)
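A gold-standard check of this kind needs an enrichment p-value that can be recomputed after each ontology change. The sketch below assumes a one-sided hypergeometric test, which is a common choice for term enrichment; the slide does not say which statistic the pipeline would use. The function name and all counts are invented for illustration.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) for X ~ Hypergeometric(population, annotated, sample).

    population: genes in the background set
    annotated:  background genes annotated to the term
    sample:     genes in the study set
    hits:       study-set genes annotated to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Invented counts: an ontology change adds annotations to the term.
    before = enrichment_p_value(population=1000, annotated=40, sample=50, hits=6)
    after = enrichment_p_value(population=1000, annotated=55, sample=50, hits=7)
    print(f"before: {before:.4g}, after: {after:.4g}")
```

A CI job could store the p-value from the last accepted release and flag the build when the new value moves beyond an agreed tolerance.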
Availability
• Oort:
     • http://code.google.com/p/owltools/wiki/OortIntro
• OBO build server:
     • http://build.berkeleybop.org
     • You can request to have your ontology and custom
       build pipeline added
          – obo-admin@obofoundry.org
     • Easy to clone our config and set up your own server
Conclusions
• What works for software can work for ontologies
   – Ontology engineering should become more like Software
     engineering
• Ontology re-use can be hard
   – A CI server is vital for staying integrated
• Simple = good
   – Admin: Jenkins is easy to set up and maintain
   – Users: +1
• Successful for GO, HPO
   – Now being extended to other ontologies
   – May be a vital component in OBO Foundry infrastructure
• CI will be integral as information systems evolve to
  depend more on ontologies
Acknowledgments
• Tanya Berardini, Rebecca Foulger, David Hill, Jane
  Lomax, Paola Roncaglia, Midori Harris, Ramona
  Walls, Laurel Cooper (beta testers)
• Heiko Dietze (Oort)
• Sebastian Bauer (HPO)
• Seth Carbon, Amelia Ireland (Jenkins wrangling)
• GO PIs
• Jenkins

DBX First Quarter 2024 Investor PresentationDropbox
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 

Último (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Ontologies and Continuous Integration

  • 1. Continuous Integration of Open Biological Ontology Libraries Chris Mungall Lawrence Berkeley National Laboratory
  • 2. Outline • What is Continuous Integration and why we need it for ontologies • A build tool for ontologies: OORT • Example workflows: GO and HPO • Lessons learned
  • 3. Reuse and modularization of ontologies
      • Re-use, don’t re-invent
          – OBO Foundry (http://obofoundry.org)
      • Modularize
          – Ontologies should not be monolithic standalone entities
          – Apply Rector normalization pattern
              • Building block approach
          – Analogous to software engineering
      Rector A. Modularisation of domain ontologies implemented in Description Logics and related formalisms including OWL. Proceedings of the 2nd international conference on Knowledge capture (2003)
  • 4. Examples of ontology re-use
      • GO is re-using the CHEBI classification of chemical entities
          – Using GONG* methodology
          – Automated classification
      • The Human Phenotype (HP) ontology is re-using FMA classification of anatomical structures
      • GFF3 format re-uses SO for genome feature types and validation
      [Figure: CHEBI classes carotenoid and xanthophyll shown beside the GO classes carotenoid biosynthesis and xanthophyll biosynthesis]
      *Wroe, C. J., Stevens, R., Goble, C. A., & Ashburner, M. (2003). A methodology to migrate the gene ontology to a description logic environment using DAML+OIL. Pac Symp Biocomput, 624-35.
  • 5. Reuse is not problem-free • Modules which are tested in one context may not work in another – Example: Therac-25 radiation therapy machine fatal errors – Causes of failure were complex • Software tested and used on previous models was re-used – Most software engineers are personally familiar with less lethal examples • Lesson: – Not an excuse to re-implement de-novo – Integration testing is vital – This applies to ontologies too • Inter-ontology integration • Integration between ontologies and software systems
  • 6. Integration testing in software engineering • Traditional waterfall model – Integration testing at end – Deferral = pain • Agile, test-driven model – automated Continuous integration (CI) testing – Immediate feedback http://martinfowler.com/articles/continuousIntegration.html
  • 7. Example CI Server Architecture
      [Diagram labels: Developer, Local IDE, java, update/commit, VCS, code repository, external server, clone, CI Server, perl, Web UI, Release, deploy, production environment]
  • 8. Jenkins-CI • A popular extendable open source continuous integration server • Easy to set up and administer • Multiple plugins • Large helpful user base • Powerful, clean web based dashboard • Integrates with most Version Control Systems (VCSs) http://jenkins-ci.org/
  • 9. What’s this got to do with ontologies?
      Software Engineering | Ontology Engineering
      Source Code (.java, .pm) | Ontology (.owl, .obo)
      Version control system | Version control system
      Builds/Releases | Builds/Releases
      IDE (Eclipse, Netbeans, …) | ODE (Protégé, OBO-Edit)
      Bugs | ‘true path’ violations, inconsistencies
      Junit/Xunit Tests | OWL Logical Axioms; Structural constraints; Terminology checks
      Build tool (ant, maven) | ???
      Integration tests | ???
      Integration server | Integration server
  • 10. Oort: A build tool for ontologies
      • What does it do?
          – Runs ‘ontology unit tests’ and creates releases
          – Logical tests:
              • No unsatisfiable classes
              • No inferred equivalencies between named classes
          – Other tests:
              • ≤ 1 textual definition per class
              • ≤ 1 RDFS label per class
      • How does it work?
          – Built on top of OWL-API
              • Most OWL reasoners are available
          – GUI
              • For end-users
          – Command line
              • For use in CI server
      [Diagram labels: .obo, .owl, .gaf, obo2owl, Oort, OWL API, Reasoner, verifications, owl2obo, report]
      http://code.google.com/p/owltools/wiki/OortIntro
  • 11. Example basic workflow • Client: – Make local modifications using OBO Edit – Commit changes to SVN – (Optionally) check dashboard in web browser
  • 12. Example basic workflow
      • Client:
          – Make local modifications using OBO Edit
          – Commit changes to SVN
          – (Optionally) check dashboard in web browser
      • Server:
          – Jenkins polls SVN
          – External commit triggers Jenkins to launch the build-go job (using Oort)
      • build-go job:
          – Load main ontology
          – Import external disjointness axioms
          – Launch HermiT
          – Write reasoner report
          – Fail if unsatisfiable classes found
          – Run additional perl checks, ensure external xrefs resolve, etc
  • 13. Example basic workflow
      • FAIL
          – Write reasoner report
          – Jenkins sends email alert to mail list
          – GO editor debugs, fixes then recommits
      • SUCCESS
          – If previous build was fail, Jenkins sends ‘service resumed’ email
          – Downstream jobs are triggered (e.g. bigger integrated builds, deployment)
  • 14. OBO Jenkins dashboard (http://build.berkeleybop.org/)
      [Screenshot annotations: in progress, Cell ontology (cl) build; red ball = FAIL; ‘outlook’]
  • 15. Why we need this for GO
      • GO is gradually moving towards leveraging external ontologies and automated reasoning
          – E.g. new metabolism terms come in via TermGenie (http://go.termgenie.org)
              • User simply selects CHEBI class
          – Automated graph placement (Elk)
      • Logical definitions shown:
          – ‘carotenoid biosynthesis’ EquivalentTo biosynthesis and ‘has output’ some carotenoid
          – ‘xanthophyll biosynthesis’ EquivalentTo biosynthesis and ‘has output’ some xanthophyll
      [Figure: CHEBI classes carotenoid and xanthophyll shown beside the GO classes carotenoid biosynthesis and xanthophyll biosynthesis]
  • 16. Why we need this for GO
      • Automated quality control using reasoning
          – Taxon constraints
          – Useful for detecting false function predictions
      • Axioms shown:
          – ‘in taxon’ some Metazoa DisjointWith ‘in taxon’ some Viridiplantae
          – ‘carotenoid biosynthesis’ DisjointWith ‘in taxon’ some Metazoa
      [Figure: columns CHEBI (carotenoid, xanthophyll), GO (carotenoid biosynthesis, xanthophyll biosynthesis) and NCBITaxon (Metazoa), with carotenoid biosynthesis ‘never in’ Metazoa]
      Deegan, J., Dimmer, E., & Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC bioinformatics, 11(1), 530. BioMed Central Ltd. doi:10.1186/1471-2105-11-530
  • 17. Errors propagate in an integrated environment
      [Figure: columns CHEBI, GO, MGI and NCBITaxon. CHEBI: carotenoid, xanthophyll, xanthine. GO: carotenoid biosynthesis, xanthophyll biosynthesis, xanthine biosynthesis, with carotenoid biosynthesis ‘never in’ Metazoa. MGI: Ada (gene), ‘in taxon’ Mus musculus. Two links are marked X.]
      • Inference: Ada SubClassOf owl:Nothing (propagation of errors)
  • 18. Server-side integration tests are vital
      [Figure: columns CHEBI, GO, MGI and NCBITaxon. CHEBI: carotenoid, xanthophyll, xanthine. GO: carotenoid biosynthesis, xanthophyll biosynthesis, xanthine biosynthesis, with carotenoid biosynthesis ‘never in’ Metazoa. MGI: Ada (gene), ‘in taxon’ Mus musculus. Two links are marked X.]
      • Inference: Ada SubClassOf owl:Nothing (propagation of errors)
      • Problem may not be apparent in the developer’s local environment
          – Manifests when GO is integrated with gene associations
      • With CI, errors can be fixed at source
  • 19. Staged builds
      • Fowler Principle: ‘Keep the build fast’
      • Staged builds
          – Balances needs of bug finding and speed
      [Figure: three stages ordered from fastest (low CPU) to most complete (high CPU): Basic Build, Ontology Integration Build, System Integration Build. Inputs shown: GO, disjoints, CHEBI, Uberon, CL, Taxon, PR, Annotations]
  • 20. User experience • Previous environment: – Daily cron job, monolithic perl scripts • Informal survey results: – Gene Ontology developers love Jenkins • Popular Features: – Transparency of build process – Direct feedback – User-friendliness – ‘build lights’ • Particularly useful for obo/owl hybrid workflows
  • 21. Human Phenotype Ontology is deployed using CI • HPO: ~10k classes • Logical definitions have dependencies on: – FMA; PATO; Uberon; GO; CL • Annotations – Link OMIM disorders to HPO classes • Validation – Oort and GULO • Uses Hudson rather than Jenkins Koehler S et al (2011) Improving ontologies by automatic reasoning and evaluation of logical definitions. BMC Bioinformatics 12:418
  • 22. CI best practice: use a VCS • Ontologies are source code – Always use a version control system to manage your source code • Sorry, this is non-negotiable • CI server integration with VCSs is a great feature – Polling – Commit metadata coupled with builds • Downside of VCSs: – OWL syntaxes are almost always preferable to obo format, except • They suck with VCSs – spurious diffs • We’re working on a solution
  • 23. Future Enhancements • Migrate OBO-Edit verification checks to OWL API • Phase out perl and OBO-Format validation scripts and move to OWLAPI plus OPPL2 for scripting • Extend GO validation pipeline to include term enrichment gold standard sets – E.g. after ontology change does the p-value of angiogenesis change in the glioblastoma gene set? • (Example stolen from Erik Clarke’s talk)
  • 24. Availability • Oort: • http://code.google.com/p/owltools/wiki/OortIntro • OBO build server: • http://build.berkeleybop.org • You can request to have your ontology and custom build pipeline added – obo-admin@obofoundry.org • Easy to clone our config and set up your own server
  • 25. Conclusions • What works for software can work for ontologies – Ontology engineering should become more like Software engineering • Ontology re-use can be hard – A CI server is vital for staying integrated • Simple = good – Admin: Jenkins is easy to set up and maintain – Users: +1 • Successful for GO, HPO – Now being extended to other ontologies – May be a vital component in OBO Foundry infrastructure • CI will be integral as information systems evolve to depend more on ontologies
  • 26. Acknowledgments • Tanya Berardini, Rebecca Foulger, David Hill, Jane Lomax, Paola Roncaglia, Midori Harris, Ramona Walls, Laurel Cooper (beta testers) • Heiko Dietze (Oort) • Sebastian Bauer (HPO) • Seth Carbon, Amelia Ireland (Jenkins wrangling) • GO PIs • Jenkins
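
The non-logical ‘ontology unit tests’ listed on slide 10 (at most one textual definition and at most one label per class) can be illustrated with a toy sketch. This is not Oort's implementation: Oort is built on the OWL API and delegates the logical tests to an OWL reasoner, which this sketch does not attempt. The dictionary layout, the `EX:` identifiers, and the function name `check_term_metadata` are assumptions made purely for illustration.

```python
from typing import Dict, List


def check_term_metadata(classes: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return one message per violated check; an empty list means all checks pass."""
    problems = []
    for class_id in sorted(classes):
        annotations = classes[class_id]
        # Each class may carry at most one label and at most one textual definition.
        for field in ("label", "definition"):
            values = annotations.get(field, [])
            if len(values) > 1:
                problems.append(
                    f"{class_id}: {len(values)} {field} values (at most 1 allowed)"
                )
    return problems


# Hypothetical toy input; the identifiers are placeholders, not real ontology IDs.
toy_ontology = {
    "EX:0000001": {
        "label": ["carotenoid biosynthesis"],
        "definition": ["A placeholder definition."],
    },
    "EX:0000002": {
        "label": ["xanthophyll biosynthesis", "xanthophyll formation"],
    },
}

problems = check_term_metadata(toy_ontology)
for message in problems:
    print(message)
```

In a CI job, a wrapper script would typically exit with a non-zero status when `problems` is non-empty, so that the server marks the build as failed.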

Editor's Notes

  1. Therac-25. Radiation therapy machine. The engineer had reused software from older models. These models had hardware interlocks that masked their software defects. Those hardware safeties had no way of reporting that they had been triggered, so there was no indication of the existence of faulty software commands.
  2. Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly
  3. Client/server
  4. Client/server
  5. Magic 10 minutes. Example:
      • Commit build: use EL reasoner over main ontology, plus external disjointness axioms. Basic structural checks. Can be executed painlessly in client ODE. Fast, immediate feedback, BUT doesn’t catch all issues.
      • Integrated ontology build: bring in external ontologies.
      • Integrated system build: check consequences of commits against all gene associations.
  6. User friendliness to the point of anthropomorphization
  7. See Erik Clarke’s talk
  8. Recommended for anyone who uses ontologies
  9. Integration, once attained, is easily lost
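
The staged builds described on slide 19 and in note 5 can be sketched as a fail-fast pipeline: the cheapest stage runs first, and later, more expensive stages run only if the earlier ones pass. This is a minimal sketch, not the actual Jenkins configuration. The stage names follow the slide, while `run_staged_builds` and the lambda checks are hypothetical stand-ins for real jobs (a reasoner run, an integrated ontology build, a check against gene associations).

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_staged_builds(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order, cheapest first; skip everything after the first failure."""
    results = []
    failed = False
    for name, check in stages:
        if failed:
            # An earlier stage failed, so the more expensive stages are not run.
            results.append((name, "SKIPPED"))
            continue
        ok = check()
        results.append((name, "SUCCESS" if ok else "FAIL"))
        failed = not ok
    return results


# Hypothetical stand-ins: each lambda represents a real build job's pass/fail outcome.
stages = [
    ("basic build", lambda: True),
    ("ontology integration build", lambda: False),
    ("system integration build", lambda: True),
]

results = run_staged_builds(stages)
for name, status in results:
    print(f"{name}: {status}")
```

Ordering the stages by cost keeps feedback on the common case fast, which is the point of the ‘keep the build fast’ principle cited on the slide; the trade-off is that problems visible only in the later stages are reported more slowly.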