SBML (the Systems Biology Markup Language), model databases, and other resources
1. SBML (the Systems Biology Markup Language), model databases, and other resources
Michael Hucka, Ph.D.
Department of Computing + Mathematical Sciences
California Institute of Technology
Pasadena, CA, USA
Email: mhucka@caltech.edu Twitter: @mhucka
CCB 2012, August 2012, Cold Spring Harbor Laboratory, NY, USA
2. Outline
• General background and motivations
• Brief summary of SBML features
• A selection of resources for the SBML-oriented modeler
• Annotations, connections and semantics
• Current and upcoming developments in community standards
• Closing
3. General background and motivations
5. The many roles of computation in biological research
Instrument/device control, data management, data processing,
database applications, statistical analysis, pattern matching, image
processing, text mining, chemical structure prediction, genomic
sequence analysis, proteomics, other *omics, molecular modeling,
molecular dynamics, kinetic simulation, simulated evolution,
phylogenetics, ... (to name only a subset)!
Focus here: modeling and simulation
6. What are the outcomes of modeling and simulation?
Usually, there are at least two scientific outcomes:
• One or more models (+ associated claims about their behaviors)
• Publication of the results (in some form)
Models come in many forms
7. Models are results
Models serve as statements of our current understanding of the
phenomena being studied*
• A computational model documents your theory in a concrete form
Models can:
• Reduce ambiguity in communication
• Offer a concrete framework for adding new data and theories
• Support direct evaluation of relationships between theories
* Bower & Bolouri, Computational modeling of genetic and biochemical networks, MIT Press, 2001
8. But only if the modeling results are reproducible
9. Is it enough to describe the model & equations in a paper?
Many models have traditionally been published this way
Problems:
• Errors in printing
• Missing information
• Dependencies on implementation
• Outright errors
• Can be a huge effort to recreate
10. Is it enough to make your (software X) script available?
It’s vital for good science:
• Someone with access to the same software can try to run it,
understand it, verify the computational results, build on them, etc.
• Opinion: you should always do this in any case
11. Is it enough to make your (software X) code available?
It’s vital for good science:
• Someone with access to the same software can try to run it,
understand it, build on it, etc.
• Opinion: you should always do this in any case
But it’s still not ideal for communication of scientific results:
• What if they don’t have access to that software?
• And anyway, how will people find the model?
• And how will people be able to relate the model to other work?
14. Brief summary of SBML features
16. SBML = Systems Biology Markup Language
Format for representing computational models of biological processes
• Data structures + usage principles + serialization to XML
Neutral with respect to modeling framework
• E.g., ODE, stochastic systems, etc.
Development started in 2000, with first specification distributed in 2001
17. The process is central
• Called a “reaction” in SBML
• Participants are pools of entities (species)
Models can further include:
• Other constants & variables
• Compartments
• Explicit math
• Discontinuous events
• Unit definitions
• Annotations
Basic SBML concepts are fairly simple
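To make the concepts above concrete, here is a minimal sketch of what an SBML file can look like and how software might read it. The XML fragment is hand-written for illustration and has not been validated; the model, compartment, species, and reaction identifiers are all made up. The reader uses only Python's standard XML parser, not libSBML.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"

# Illustrative, hand-written SBML Level 3 fragment: one compartment,
# two species (pools of entities), and one reaction (the "process").
SBML_TEXT = f"""<sbml xmlns="{SBML_NS}" level="3" version="1">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="c" size="1" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="protein_A" compartment="c" initialAmount="10"
               hasOnlySubstanceUnits="true" boundaryCondition="false"
               constant="false"/>
      <species id="protein_B" compartment="c" initialAmount="0"
               hasOnlySubstanceUnits="true" boundaryCondition="false"
               constant="false"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1" reversible="false" fast="false">
        <listOfReactants>
          <speciesReference species="protein_A" stoichiometry="1" constant="true"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="protein_B" stoichiometry="1" constant="true"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def summarize(sbml_text):
    """Return (compartment ids, species ids, {reaction id: (reactants, products)})."""
    ns = {"sbml": SBML_NS}
    model = ET.fromstring(sbml_text).find("sbml:model", ns)
    compartments = [c.get("id") for c in
                    model.findall("sbml:listOfCompartments/sbml:compartment", ns)]
    species = [s.get("id") for s in
               model.findall("sbml:listOfSpecies/sbml:species", ns)]
    reactions = {}
    for r in model.findall("sbml:listOfReactions/sbml:reaction", ns):
        reactants = [ref.get("species") for ref in
                     r.findall("sbml:listOfReactants/sbml:speciesReference", ns)]
        products = [ref.get("species") for ref in
                    r.findall("sbml:listOfProducts/sbml:speciesReference", ns)]
        reactions[r.get("id")] = (reactants, products)
    return compartments, species, reactions


if __name__ == "__main__":
    print(summarize(SBML_TEXT))
```

For real work, a dedicated library such as libSBML or JSBML is the better choice, since plain XML parsing performs none of the validation those libraries provide.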
21. Reactions can cross compartment boundaries
[Diagram: compartments c and n; species protein A, protein B, gene, mRNAn, and mRNAc]
22. Reaction/process rates can be (almost) arbitrary formulas
[Diagram: compartments c and n; species protein A, protein B, gene, mRNAn, and mRNAc; reactions labeled with rate formulas f1(x) through f5(x)]
23. “Rules”: equations expressing relationships in addition to the reaction system
[Diagram: compartments c and n; species protein A, protein B, gene, mRNAn, and mRNAc; reactions labeled with rate formulas f1(x) through f5(x); rule equations g1(x), g2(x), ...]
24. “Events”: discontinuous actions triggered by system conditions
[Diagram: compartments c and n; species protein A, protein B, gene, mRNAn, and mRNAc; reactions labeled with rate formulas f1(x) through f5(x); rule equations g1(x), g2(x), ...]
Event1: when (...condition...), do (...assignments...)
Event2: when (...condition...), do (...assignments...)
...
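The interplay of a rate formula and an event can be sketched in a few lines. This is an illustration only, not how any particular SBML simulator works: it uses a fixed-step forward-Euler loop, a single reaction A -> B with the assumed rate formula k*A, and one event of the form "when time >= 5, do A = 10". All names and numbers are made up. The event fires when its trigger condition changes from false to true, which is the SBML event semantics.

```python
def simulate(k=0.5, a0=10.0, dt=0.01, t_end=10.0,
             event_time=5.0, reset_value=10.0):
    """Forward-Euler simulation of A -> B (rate = k * A) with one event.

    Returns a list of (time, A, B) tuples, one per step.
    """
    a, b = a0, 0.0
    trigger_was_true = False
    history = []
    steps = int(round(t_end / dt))
    for i in range(steps):
        # Continuous part: the reaction moves material from A to B.
        rate = k * a
        a -= rate * dt
        b += rate * dt
        t = (i + 1) * dt

        # Discontinuous part: fire only on a false -> true transition.
        trigger_is_true = t >= event_time
        if trigger_is_true and not trigger_was_true:
            a = reset_value  # the event's assignment
        trigger_was_true = trigger_is_true

        history.append((t, a, b))
    return history


if __name__ == "__main__":
    for t, a, b in simulate()[::100]:
        print(f"t={t:5.2f}  A={a:7.3f}  B={b:7.3f}")
```

Because the trigger stays true after it first fires, the assignment is applied exactly once; A decays, jumps back to the reset value at the event, then decays again.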
25. Annotations: machine-readable semantics and links to other resources
[Diagram: the network, rules, and events, with example annotations attached to components:]
• “This is identified by GO id # ...”
• “This is an enzymatic reaction with EC # ...”
• “This is a transport into the nucleus ...”
• “This compartment represents the nucleus ...”
• “This event represents ...”
26. Today: spatially homogeneous models
• Metabolic network models
• Signaling pathway models
• Conductance-based models
• Neural models
• Pharmacokinetic/dynamics models
• Infectious diseases
Find examples in BioModels Database: http://biomodels.net/biomodels
Coming: SBML Level 3 packages to support other types
• E.g.: Spatially inhomogeneous models, also qualitative/logical
Scope of SBML encompasses many types of models
28. SBML Level 1 | SBML Level 2 | SBML Level 3
predefined math functions | user-defined functions | user-defined functions
text-string math notation | MathML subset | MathML subset
reserved namespaces for annotations | no reserved namespaces for annotations | no reserved namespaces for annotations
no controlled annotation scheme | RDF-based controlled annotation scheme | RDF-based controlled annotation scheme
no discrete events | discrete events | discrete events
default values defined | default values defined | no default values
monolithic | monolithic | modular
29. A selection of resources for the SBML-oriented modeler
31. BioModels Database
Stores & serves quantitative models of biological interest
• Free, public resource
• Models must be described in peer-reviewed publication(s)
Hundreds of models are curated by hand
Imports & exports models in several formats
Figure courtesy of Camille Laibe
41. Results of 2011 survey of SBML-compatible software
Question: Which of the following categories best describe your software? (Check all that apply.)
Category | Responses
Simulation software | 42
Analysis s/w (in addition, or instead of, simulation) | 40
Creation/model development software | 31
Visualization/display/formatting software | 31
Utility software (e.g., format conversion) | 23
Data integration and management software | 16
Repository or database | 14
Framework or library (for use in developing s/w) | 13
S/w for interactive env. (e.g., MATLAB, R, ...) | 13
Annotation software | 11
Out of 81 responses
43. libSBML
Reads, writes, validates SBML
Can check & convert units
Written in portable C++
Runs on Linux, Mac, Windows
APIs for C, C++, C#, Java, Octave, Perl, Python, R, Ruby, MATLAB
Well documented API
Open-source (LGPL)
http://sbml.org/Software/libSBML
44. JSBML
Pure Java implementation
API is compatible with libSBML but more Java-like
Functionality is subset of libSBML
Open source (LGPL)
http://sbml.org/Software/JSBML
45. How can you stay informed of new developments?
50. Annotations, connections and semantics
54. SBML itself provides syntax and only limited semantics
Raw models alone are insufficient
Need standard schemes for machine-readable annotations
• Identify entities
• Mathematical semantics
• Links to other data resources
• Authorship & pub. info
[Figure callouts: low info content; no standard identifiers]
55. Annotations at their simplest
[Diagram: Element in the model → relationship qualifier (optional) → Entity elsewhere (e.g., in a database)]
56. Annotations add meaning and connections
Annotations can answer questions:
• “What exactly is the process represented by equation ‘r17’?”
• “What other identities (synonyms) does this entity have?”
• “What role does constant ‘k3’ play in equation ‘r17’?”
• “What organism are we talking about?”
• ... etc. ...
Multiple annotations on same entity are common
57. SBML supports two annotation schemes
SBO (Systems Biology Ontology)
• For mathematical semantics
• One SBML object ← one SBO term
• Short, compact, tightly coupled but limited scope
MIRIAM (Minimum Information Requested In the Annotation of Models)
• For any kind of annotation
• One SBML object ← multiple MIRIAM annotations
• Larger, more free-form, wider scope
Both are externalized and independent of SBML
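The difference in cardinality between the two schemes can be sketched as a small data structure. This is an illustration under stated assumptions, not the libSBML or JSBML API: the class and method names are hypothetical, the SBO term is a syntactic placeholder rather than a meaningful term, and the PubMed reference is arbitrary. The only format rule relied on is that an SBO identifier is written as "SBO:" followed by seven digits.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SBO_PATTERN = re.compile(r"SBO:[0-9]{7}")


@dataclass
class AnnotatedElement:
    """Hypothetical stand-in for an SBML object that carries annotations."""
    element_id: str
    sbo_term: Optional[str] = None                  # at most one SBO term
    miriam: List[Tuple[str, str]] = field(default_factory=list)  # any number

    def set_sbo_term(self, term: str) -> None:
        """Set the single SBO term, replacing any previous one."""
        if not SBO_PATTERN.fullmatch(term):
            raise ValueError(f"not a well-formed SBO identifier: {term!r}")
        self.sbo_term = term

    def add_miriam(self, qualifier: str, uri: str) -> None:
        """Append a (relationship qualifier, resource URI) annotation."""
        self.miriam.append((qualifier, uri))


if __name__ == "__main__":
    r17 = AnnotatedElement("r17")
    r17.set_sbo_term("SBO:0000000")  # placeholder term, for syntax only
    r17.add_miriam("bqbiol:isVersionOf", "http://identifiers.org/ec-code/1.1.1.1")
    r17.add_miriam("bqmodel:isDescribedBy", "http://identifiers.org/pubmed/16480")
    print(r17)
```

Setting a second SBO term overwrites the first, while each MIRIAM annotation accumulates, mirroring the one-versus-many distinction above.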
62. Software can use SBO terms to help you work with models
semanticSBML
SBMLsqueezer
63. MIRIAM (Minimum Information Requested In the Annotation of Models)
Addresses 2 general areas of annotation needs:
• Requirements for reference correspondence
• Scheme for encoding annotations
  - Annotations for attributing model creators & sources
  - Annotations for referring to external data resources
MIRIAM is not specific to SBML
65. Goal: permit tracing model’s origins & people involved in its creation
Minimal info required:
• Name for the model
• Citation for a description of what is being modeled & its author
• Contact info for the model creator(s)
• Creation date & time
• Last modification date & time
• Statement of the model’s terms of distribution
- Specific terms not mandated, just a statement of the terms
Annotations for attributing model creators and sources
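A checklist like this is easy to enforce in software. The sketch below assumes model metadata is held in a plain dictionary; the field names are hypothetical and are not taken from MIRIAM or SBML. It only reports which of the minimal items are absent or empty.

```python
# Hypothetical field names, one per item in the minimal-information list.
REQUIRED_FIELDS = (
    "name",                   # name for the model
    "citation",               # citation for the description and its author
    "creator_contacts",       # contact info for the model creator(s)
    "created",                # creation date & time
    "modified",               # last modification date & time
    "terms_of_distribution",  # statement of terms (content is not mandated)
)


def missing_attribution(metadata):
    """Return the required fields that are absent or empty, in checklist order."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


if __name__ == "__main__":
    draft = {
        "name": "toy_model",
        "creator_contacts": ["modeler@example.org"],
        "created": "2012-08-01T09:00:00Z",
    }
    print(missing_attribution(draft))
```

The check deliberately says nothing about what the distribution terms are, only that some statement is present.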
68. Annotations for external references
Goal: link model constituents to corresponding entities in
bioinformatics resources (e.g., databases, controlled vocabularies)
• Supports:
- Precise identification of model constituents
- Discovery of models that concern the same thing
- Comparison of model constituents between different models
MIRIAM approach avoids putting data content directly in the model;
instead, it points at external resources that contain the knowledge.
70. Why might you care?
salicylic acid (http://www.ebi.ac.uk/chebi)
Known by different names: do you want to write all of them into your model?
71. Identifying resources has its own challenges
For linking to data, need:
• Globally unique, unambiguous identifiers
• ... that are persistent despite resource changes (e.g., changed URLs)
• ... that are maintained by the community
Problem: different resources have different identification schemes
• E.g.: entity “16480”
- In ChEBI: entry 16480 is nitric oxide
- In PubMed: entry 16480 is the 1977 paper “Effect of gallstone-dissolution therapy on human liver structure”
- In PubChem: entry 16480 is 1-chloro-4-isothiocyanatobenzene
72. How do we create globally unique identifiers consistently?
Long story short:
• Create unique URIs (Uniform Resource Identifiers) by combining 2 parts:
  - namespace: identifies a dataset
  - entity identifier: identifies a datum within the dataset
• Create registry for namespaces
- Allows people & software to use same namespace identifiers
• Create service for URI resolution
- Allows people & software to take a given resource identifier and
figure out what it points to
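The three ingredients (two-part identifiers, a namespace registry, a resolution service) can be sketched together. Everything here is a stand-in: the registry is a small in-memory dictionary, the namespace names are chosen for illustration, and the URL templates point at example.org rather than at any real resolver.

```python
from typing import NamedTuple


class ResourceId(NamedTuple):
    namespace: str  # identifies a dataset
    entity: str     # identifies a datum within the dataset


# Stand-in for a shared registry: namespace -> URL template (placeholders).
REGISTRY = {
    "chebi": "https://example.org/chebi/{entity}",
    "pubmed": "https://example.org/pubmed/{entity}",
    "pubchem": "https://example.org/pubchem/{entity}",
}


def resolve(rid: ResourceId) -> str:
    """Turn a registered (namespace, entity) pair into a location."""
    try:
        template = REGISTRY[rid.namespace]
    except KeyError:
        raise KeyError(f"unregistered namespace: {rid.namespace!r}") from None
    return template.format(entity=rid.entity)


if __name__ == "__main__":
    # The bare number "16480" is ambiguous; the pairs below are not.
    for namespace in ("chebi", "pubmed", "pubchem"):
        rid = ResourceId(namespace, "16480")
        print(rid, "->", resolve(rid))
```

Because the namespace is part of the identifier, the same local number yields three distinct identifiers, and an unregistered namespace is rejected instead of being guessed at.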
73. Resolving resource identifiers
MIRIAM Registry supports the creation of globally unique identifiers
• Example MIRIAM identifier:
urn:miriam:ec-code:1.1.1.1
• Provides various data about the
resource, including alternate servers
• Provides web services
identifiers.org is layered on top of that and provides resolvable URIs
• Can type it in a web browser!
• Example identifiers.org URI:
http://identifiers.org/ec-code/1.1.1.1
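The two example identifiers above differ only in form, so converting one to the other is mostly string handling. The sketch below is a local, offline conversion and does not contact the MIRIAM Registry or identifiers.org. The function name is hypothetical, and percent-decoding the entity part is an assumption about how characters such as colons are escaped inside URNs.

```python
from urllib.parse import unquote

URN_PREFIX = "urn:miriam:"


def miriam_urn_to_url(urn: str, base: str = "http://identifiers.org") -> str:
    """Rewrite 'urn:miriam:<namespace>:<entity>' as '<base>/<namespace>/<entity>'."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError(f"not a MIRIAM URN: {urn!r}")
    # Split on the first colon only, so the entity part may contain dots.
    namespace, sep, entity = urn[len(URN_PREFIX):].partition(":")
    if not sep or not namespace or not entity:
        raise ValueError(f"expected urn:miriam:<namespace>:<entity>, got {urn!r}")
    # Assumption: percent-escapes in the entity part are decoded for the URL form.
    return f"{base}/{namespace}/{unquote(entity)}"


if __name__ == "__main__":
    print(miriam_urn_to_url("urn:miriam:ec-code:1.1.1.1"))
```

In practice the registry's own web services are the authoritative way to resolve an identifier, since they also know about alternate servers.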
75. Annotations enable many interesting possibilities
semanticSBML
Figure courtesy of Wolfram Liebermeister
76. Summary: why care about standard ways of writing annotations?
Structured, machine-readable annotations increase your model’s utility
• Allow more precise identification of model components
- Understand model structure
- Search/discover models
- Compare models
• Add a semantic layer that integrates knowledge into the model
- Helps recipients understand the underlying biology
- Allows for better reuse of models
- Supports conversion of models from one form to another
77. Current and upcoming developments in community standards
78. Major dimensions of a computational model
[Figure: axes for model representation level (visual interpretation, biological semantics, mathematical semantics), model type, and model life-cycle]
Concept due to Nicolas Le Novère
80. SBML Level 3: Supporting more categories of models
[Diagram: Packages W, X, Y, and Z built on SBML Level 3 Core (with dependencies)]
An SBML Level 3 package adds constructs & capabilities
Models declare which packages they use
• Applications tell users which packages they support
Package development can be decoupled
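How a model declares its packages, and how an application might react, can be sketched as follows. In SBML Level 3 a package is declared with an XML namespace on the sbml element plus a "required" attribute in that namespace. The fragment below is hand-written and unvalidated, the two package namespace URIs are given for illustration, and the function names are hypothetical. Only Python's standard XML parser is used.

```python
import xml.etree.ElementTree as ET

COMP_NS = "http://www.sbml.org/sbml/level3/version1/comp/version1"
FBC_NS = "http://www.sbml.org/sbml/level3/version1/fbc/version2"

# Illustrative document header declaring two packages.
SBML_TEXT = f"""<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
      xmlns:comp="{COMP_NS}"
      xmlns:fbc="{FBC_NS}"
      level="3" version="1" comp:required="true" fbc:required="false">
  <model id="toy_model"/>
</sbml>"""


def declared_packages(sbml_text):
    """Map each declared package namespace to its 'required' flag."""
    root = ET.fromstring(sbml_text)
    suffix = "}required"
    packages = {}
    for key, value in root.attrib.items():
        # Namespaced attributes appear as '{namespace-uri}localname'.
        if key.startswith("{") and key.endswith(suffix):
            packages[key[1:-len(suffix)]] = (value == "true")
    return packages


def unsupported_required(sbml_text, supported):
    """List required packages that this (hypothetical) application lacks."""
    return sorted(ns for ns, required in declared_packages(sbml_text).items()
                  if required and ns not in supported)


if __name__ == "__main__":
    print(declared_packages(SBML_TEXT))
    print(unsupported_required(SBML_TEXT, supported={FBC_NS}))
```

An application could use the second function to warn users before loading a model whose interpretation depends on a package it does not implement.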
81. Level 3 package | What it enables
Hierarchical composition | Models containing submodels
Flux balance constraints | Flux balance analysis models
Qualitative models | Petri net models, Boolean models
Spatial | Nonhomogeneous spatial models
Multicomponent species | Entities with structure & state; rule-based models
Graph layout | Diagrams of models
Graph rendering | Diagrams of models
Distribution & ranges | Nonscalar values
Annotations | Richer annotation syntax
Groups | Arbitrary grouping of model components
Dynamic structures | Creation & destruction of model components
Arrays & sets | Arrays or sets of entities
82. How can we capture the simulation/analysis procedures?
84. SED-ML = Simulation Experiment Description ML
Application-independent format to capture procedures, algorithms,
parameter values
• Neutral format for encoding the steps to go from model to output
Can be used for
• Simulation experiments encoding parametrizations & perturbations
• Simulations using more than one model
• Simulations using more than one method
• Data manipulations to produce plot(s)
The libSedML project is developing an API library
http://www.biomodels.net/sedml
86. Graphical representation of models
Today: broad variation in graphical notation used in biological diagrams
• Between authors, between journals, even people in same group
However, standard notations would offer benefits:
• Consistency = easier to read diagrams with less ambiguity
• Software support: verification of correctness, translation to math
87. SBGN = Systems Biology Graphical Notation
Goal: standardize the graphical notation in diagrams of biological processes
• Community-based development, à la SBML
Many groups participating
3 sublanguages to describe different facets of a model
http://sbgn.org
88. Closing
89. Attendees at SBML 10th Anniversary Symposium, Edinburgh, 2010
Such standards are the work of a great community
90. Get involved and make things better!
COMBINE (Computational Modeling in Biology Network)
• SBML, SBGN, BioPAX, SED-ML, CellML, NeuroML
http://co.mbine.org
Upcoming meeting: August 15–19 in Toronto, Canada
• Right before ICSB (International Conference on Systems Biology)
92. I’d like your feedback!
You can use this anonymous form:
http://tinyurl.com/mhuckafeedback
93. SBML was made possible thanks to funding from:
National Institute of General Medical Sciences (USA)
European Molecular Biology Laboratory (EMBL)
JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003)
JST ERATO-SORST Program (Japan)
ELIXIR (UK)
Beckman Institute, Caltech (USA)
Keio University (Japan)
International Joint Research Program of NEDO (Japan)
Japanese Ministry of Agriculture
Japanese Ministry of Educ., Culture, Sports, Science and Tech.
BBSRC (UK)
National Science Foundation (USA)
DARPA IPTO Bio-SPICE Bio-Computation Program (USA)
Air Force Office of Scientific Research (USA)
STRI, University of Hertfordshire (UK)
Molecular Sciences Institute (USA)