Continued development of ChEBI towards better usability for the systems biology and metabolic modelling community
1. Continued development of ChEBI towards
better usability for the systems biology and
metabolic modelling community
1Neil Swainston, 2Janna Hastings, 1Pedro Mendes, 2Christoph Steinbeck
1Manchester Centre for Integrative Systems Biology, Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7ND, United Kingdom
2The EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Introduction
• Genome-scale metabolic reconstructions require integrated computable
knowledge about biochemical entities, such as macromolecules and
metabolites.
• The use of unambiguous, semantically typed, publicly available and perennial
identifiers for model components is essential if systems biology models are to
be shared, reused and developed by communities, maximising their benefit1.
• Such model annotation allows chem- and bioinformatics data, such as chemical
structures and protein sequences, to extracted from resources via web service
interfaces, greatly enhancing the usability and range of applications.
• ChEBI2 is a freely available biochemical database and ontology, housed at the
European Bioinformatics Institute (EBI), is becoming increasingly used in both
modelling and metabolomics. http://www.ebi.ac.uk/chebi/
• The BioModels Database currently contains over 110000 models that use ChEBI
identifiers. Improvements to and extensions of the ChEBI resource will have an
immediate benefit to the systems biology modelling community.
• ChEBI will be extended for systems biology by implementing the following
improvements:
• A comprehensive cross-platform API library for accessing ChEBI
programmatically will be developed. The API, libChEBI, will be made publicly
available as an open source library in Java and Python.
• It will include facilities such as extracting biologically relevant groups of
compounds (such as tautomers), calculating additional physicochemical
properties and semantic reasoning over model annotations.
• The ChEBI database will be extended to enhance stability, increase
community involvement and provide a new powerful visualisation for the
biological context of molecular entities.
• To extend the scope of ChEBI, all known metabolites across human, mouse,
E. coli and S. cerevisiae will be incorporated and curated into the resource.
References
1. A consensus yeast metabolic network reconstruction obtained from a
community approach to systems biology. Herrgård MJ, et al. Nat Biotechnol.
2008, 26, 1155-60.
2. ChEBI: a database and ontology for chemical entities of biological interest.
Degtyarenko K, et al. Nucleic Acids Res. 2008, 36, D344–D350.
3. MetaboLights -- an open-access general-purpose repository for metabolomics
studies and associated meta-data. Haug K, et al. Nucleic Acids Res. 2013, 41,
D781-6.
4. The Chemistry Development Kit (CDK): an open-source Java library for Chemo-
and Bioinformatics. Steinbeck C, et al. J Chem Inf Comput Sci. 2003, 43, 493-500.
libChEBI
• Programmatic access to ChEBI will be supported through the introduction of
libChEBI, a cross-platform, open source API in Java and Python.
• libChEBI will provide a link between models, modelling software and
metabolomics data3 via common use of ChEBI identifiers in existing data
standards (see Fig 2).
Figure 1: Screen capture of ChEBI web interface entry for L-valine.
ChEBI
• Chemical Entities of Biological Interest (ChEBI) is a freely available, manually-
curated dictionary of ‘small’ chemical compounds, either natural or synthetic
products used to intervene in the processes of living organisms.
• ChEBI incorporates an ontological classification, specifying relationships
between molecular entities or classes of entities.
• Data represented includes unique identifiers, names and synonyms, structure,
molecular formula, charge, mass, database links and citations.
• Current release contains 34309 chemical entities.
• ChEBI is available via a web interface (see Fig 1), web services, or download.
libChEBI
Metabolomics
Metabolights
BioPAX
Modelling
Standards and software
Figure 2: Representation of the central position that libChEBI will play in the integration of
metabolomics data into systems biology models.
• libChEBI will interface with existing cheminformatics software, such as the
Chemistry Development Kit (CDK)4 to provide methods for reconciling
metabolites, determining charge states, estimating thermodynamic properties,
facilitating network gap filling, and model checking and validation.
Further enhancements
• Extension of the performance and stability of the ChEBI web service to support
metabolic reconstruction. We will optimise the web service search
implementation and API targeting the types of queries common in the metabolic
reconstruction process.
• Streamline the ChEBI curation process with partial automation of structure-
based classification, text mining for biological roles, and improved workflow.
• Improve the community interface for ChEBI submissions by introducing facilities
for community-based updates.
• Harness the knowledge encoded in metabolic models and in partner databases
such as Metabolights3 to create rich visualisation of chemicals in their biological
context, using "facets" for pathways, reactions and simulations.
• We will undertake to curate all known metabolites from four key organisms into
ChEBI: S. cerevisiae, E. coli, mouse and human.
• Creation of new training materials and delivery of training courses to the
community.
Cheminformatics
This work was funded by the BBSRC under the project “Continued development of ChEBI towards better usability for the systems biology and metabolic modelling community”, BB/K019783/1.