This document discusses using biological networks to analyze and interpret biological knowledge. It begins with an overview of networks as tools to reduce complexity and integrate data. Key properties of networks are described, including nodes, edges, degree distribution, clustering coefficient, and centrality measures. Methods for analyzing networks like community detection and network motifs are also covered. The document emphasizes that biological networks must be analyzed and interpreted based on their properties and by mapping relevant biological data to provide meaningful insights.
Interpretation of the biological knowledge using networks approach
Elena Sügis
elena.sugis@.ut.ee
Bioinformatics for bioengineers LTTI.00.016, Spring 2018
3. Image 2 is adapted from http://www.jillkgregory.com/new-gallery-17/
[Figure: the cycle of science, with labels: hypothesis, lots of experiments, analysis, knowledge]
Networks-the language of complex systems
Image 1 is adapted from https://en.wikipedia.org/wiki/Complex_network
4. Networks are powerful tools
Analysis
• Topological properties
• Hubs and subnetworks
• Classify, cluster and diffuse
• Data integration
Visualization
• Data overlays
• Layouts and animation
• Exploratory analysis
• Context and interpretation
Image is adapted from Cassar, EMBO Reports 2015, Fig.8
5. Benefits of using networks
• Reduce complexity
• More efficient than tables
• Great for data integration
• Intuitive visualization
6. Networks are graphs
A graph is a mathematical structure composed of a set of objects in which pairs of objects are connected by links
• NODES (the objects)
• EDGES (the links)
Networks can be built for any functional system
[Figure: example graph with nodes labelled 1 to 6]
7. Nodes
The nodes in the networks represent related objects
• Genes
• Proteins
• Metabolites
• Enzymes
• Organisms
8. Edges
The edges in the network represent the type of relationship between two entities
Biological relationships:
• Interactions
• Regulations
• Reactions
• Transformations
• Activations
• Inhibitions
etc.
Examples:
• A activates B
• A binds to B
• A has similar sequence to B
• A is co-cited with B
9. Edges
• directed: A → B
• undirected: A - B
• weighted: A - B with weight 0.8
The architecture (or topology) of a network can be represented as a graph with links between the parts.
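The three edge types above can be sketched in plain Python. This is only an illustration: the node names A, B, C and the helper `build_adjacency` are hypothetical, and unweighted edges are assumed to carry weight 1.0.

```python
# Hypothetical toy network; node names are placeholders, not real genes.
directed_edges = [("A", "B")]        # A -> B, e.g. "A activates B"
undirected_edges = [("A", "C")]      # A - C, e.g. "A binds to C"
weighted_edges = {("B", "C"): 0.8}   # B - C with weight 0.8


def build_adjacency(directed, undirected, weighted):
    """Return {node: {neighbour: weight}}; unweighted edges get weight 1.0."""
    adjacency = {}

    def add(u, v, w):
        adjacency.setdefault(u, {})[v] = w
        adjacency.setdefault(v, {})  # make sure the target node is present

    for u, v in directed:
        add(u, v, 1.0)               # stored in one direction only
    for u, v in undirected:
        add(u, v, 1.0)
        add(v, u, 1.0)               # stored in both directions
    for (u, v), w in weighted.items():
        add(u, v, w)                 # weighted edges treated as undirected here
        add(v, u, w)
    return adjacency


network = build_adjacency(directed_edges, undirected_edges, weighted_edges)
print(network)
```

A dictionary of dictionaries keeps direction and weight in one structure; dedicated graph libraries offer the same idea with more features.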
10. Image is adapted from https://www.systemsbiology.org/about/what-is-systems-biology/
Interactome
With networks, we can organize and integrate information at different levels
12. Pathways
NETWORKS                          | PATHWAYS
Collection of binary interactions | Human-curated, detailed
Large scale                       | Small scale
Generated from omics data         | Constructed from literature/domain expert knowledge
A pathway is a series of actions among molecules in a cell that leads to a
certain product or a change in a cell.
13. What network can tell you
Gene list from experiment: APP, PSEN1, FYN, MAPT, BIN1, EPHA1, EPHA2, PSEN
You want to know:
- Type of relationships between genes
- Strength of relationship
- Functions of the related genes
- Pathways
- etc.
14. What network can tell you
You can:
• Visually identify relationships among the group of biological entities
• Find drug targets
• Identify overrepresented gene/protein functions
• Discover biological pathways
Alzheimer’s disease
15. • Series of molecular cancer
profiles
• Clinical, genomic, methylation,
RNA and proteomic signatures.
• Multiple data types integrated
into signalling network
• Includes patient sample-level
data
Image is adapted from TCGA (2013) Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature, 499, Fig. 4
Networks application in research
17. Data comes in different forms
Raw data - results of the experiments
• Sequencing technologies
• Mass spectrometry
Computational data - results of the analysis
• co-expression
• differential expression
[Figure labels: healthy cell, cancer cell; DNA, RNA, Protein]
22. Biological networks rarely tell us anything by themselves
Analysis involves:
• Understanding the characteristics of the network
• Modularity
• Comparison with other networks (e.g., random networks)
Visualization involves:
• Placing nodes in a meaningful way (layouts)
• Mapping biologically relevant data to the network
• Change node size, colour, edge weights, etc.
which allows better biological interpretation.
Making sense of the biological networks
32. Degree distribution
The degree of a node is the number of edges incident to the node.
Degree distribution:
• Let P(k) be the fraction of nodes of degree k in the network. The degree distribution is the distribution of P(k) over all k.
• P(k) can be understood as the probability that a node has degree k.
• In a random network the degree distribution is approximately Poisson:
$P(k) \sim \frac{e^{-\lambda}\lambda^{k}}{k!}$
Image is adapted from E. Ravasz et al., Science, 2002
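As a minimal sketch of the definition of P(k), the snippet below counts node degrees in a small undirected edge list and converts the counts to fractions. The edge list is a made-up toy example, and nodes without any edge (degree 0) are not represented in it.

```python
from collections import Counter

# Hypothetical undirected toy network given as an edge list.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 5), (4, 6)]


def degree_distribution(edge_list):
    """Return {k: P(k)}, the fraction of nodes that have degree k."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1               # each edge adds one to both endpoints
        degree[v] += 1
    n_nodes = len(degree)
    counts = Counter(degree.values())  # how many nodes have each degree
    return {k: counts[k] / n_nodes for k in sorted(counts)}


p_k = degree_distribution(edges)
print(p_k)
```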
33. Degree distribution in scale-free networks
• Networks with power-law degree distributions are called scale-free
networks
• Most nodes are of low degree, but there is a small number of
highly-linked nodes (nodes of high degree) called “hubs.”
$P(k) \sim k^{-\gamma}$
Image is adapted from E. Ravasz et al., Science, 2002
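A power law appears as a straight line on a log-log plot, so a rough estimate of γ is the negative slope of log P(k) against log k. The sketch below uses an ordinary least-squares fit on synthetic values generated with a known exponent; the function name `estimate_gamma` is hypothetical, and this simple fit is only an approximation on real, noisy data.

```python
from math import log


def estimate_gamma(p_k):
    """Estimate gamma in P(k) ~ k^(-gamma) by a least-squares fit in log-log space."""
    points = [(log(k), log(p)) for k, p in p_k.items() if k > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    return -covariance / variance    # the slope of the fitted line is -gamma


# Synthetic, unnormalised values with a known exponent of 2.5 (not real data).
synthetic = {k: k ** -2.5 for k in range(1, 21)}
gamma = estimate_gamma(synthetic)
print(round(gamma, 3))
```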
34. Clustering coefficient
The clustering coefficient measures the degree to which nodes in a graph tend to cluster together.
$C_i = \frac{2E_i}{k_i(k_i-1)}$
• the $i$th node has $k_i$ neighbours linked to it
• $E_i$ is the actual number of links between the $k_i$ neighbours
• $k_i(k_i-1)/2$ is the maximal number of links between the $k_i$ neighbours
The clustering coefficient of a vertex quantifies how close its neighbours are to being a clique (complete graph).
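The formula can be checked on a small example. The sketch below assumes an undirected toy network, and returns 0.0 for nodes with fewer than two neighbours, which is a convention chosen here because the formula is undefined in that case.

```python
from itertools import combinations

# Hypothetical undirected toy network.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 5)]


def neighbours(edge_list):
    """Build {node: set of neighbours} from an undirected edge list."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, node):
    """C_i = 2 * E_i / (k_i * (k_i - 1)) for one node."""
    k = len(adj[node])
    if k < 2:
        return 0.0                   # assumed convention for k_i < 2
    # E_i: number of links that exist between pairs of neighbours
    links = sum(1 for a, b in combinations(adj[node], 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


adj = neighbours(edges)
for node in sorted(adj):
    print(node, clustering_coefficient(adj, node))
```

Node 1 has three neighbours (2, 3, 4) with one link among them (2-3), so its coefficient is 2·1/(3·2) = 1/3.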
39. Hierarchical modularity
Many highly connected small clusters
combine into
few larger but less connected clusters
combine into
even larger and even less connected clusters
The clustering coefficient follows a power law: $C(k) \sim k^{-\beta}$
40. Comparison of the network properties
Image is adapted from E. Ravasz et al., Science, 2002
$C(k) \sim k^{-\beta}$
$P(k) \sim k^{-\gamma}$
$P(k) \sim \frac{e^{-\lambda}\lambda^{k}}{k!}$
41. Shortest path
• Distance between two nodes is the smallest number of links that
have to be traversed to get from one node to the other.
Shortest path is the path that achieves that distance.
• A small-world network is characterised by a small average path length:
$l = \frac{2}{N(N-1)} \sum_{i<j} l_{ij}$
where $l_{ij}$ is the shortest path length between nodes $i$ and $j$, and $N$ is the number of nodes.
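For an unweighted network, shortest path lengths can be found with breadth-first search, and the average path length follows directly from the formula. The sketch below assumes a connected, undirected toy network (a simple chain of four nodes); a disconnected network would need separate handling.

```python
from collections import deque

# Hypothetical connected toy network: a chain 1 - 2 - 3 - 4.
adjacency = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}


def bfs_distances(adj, source):
    """Shortest path length (number of links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:        # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(adj):
    """l = 2 / (N (N - 1)) * sum of l_ij over all pairs i < j (connected graph assumed)."""
    nodes = sorted(adj)
    n = len(nodes)
    total = 0
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for v in nodes[i + 1:]:
            total += dist[v]         # raises KeyError if v is unreachable
    return 2 * total / (n * (n - 1))


avg_l = average_path_length(adjacency)
print(avg_l)
```

For the chain, the six pairwise distances are 1, 2, 3, 1, 2, 1, which sum to 10, so l = 2·10/(4·3) = 5/3.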
43. Defining important nodes in biological
networks
the most connected?
connects other nodes in the network?
the closest to other nodes?
44. Centrality
Centrality quantifies the topological importance of a node (edge) in a network.
• Degree centrality is defined as the number of edges incident upon a node (finds hubs).
C_D(node) = degree of this node
• Betweenness centrality indicates how much load is on a node (bottleneck).
C_B(node) = the average number of shortest paths that go through this node
• Closeness centrality defines how close a node is to all other nodes in the network.
C_C(node) = inverse of the average of the shortest paths to all other nodes
https://cytoscape.github.io/cytoscape-tutorials/presentations/modules/network-analysis/index.html#/0/6
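The three centralities can be sketched in plain Python for a small unweighted, undirected, connected network. The toy network (two triangles joined by one link) is hypothetical. Betweenness is computed here as the unnormalised count of shortest paths passing through a node, using Brandes' accumulation scheme; tools may apply a different normalisation.

```python
from collections import deque

# Hypothetical toy network: triangle 1-2-3 and triangle 4-5-6 joined by the link 3-4.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)


def degree_centrality(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}


def shortest_paths_from(adj, s):
    """BFS from s: visit order, distances, path counts and predecessors."""
    order, dist = [], {s: 0}
    sigma = {v: 0 for v in adj}      # number of shortest paths from s
    sigma[s] = 1
    preds = {v: [] for v in adj}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return order, dist, sigma, preds


def closeness_centrality(adj):
    """Inverse of the average shortest path length to all other nodes."""
    result = {}
    for s in adj:
        _, dist, _, _ = shortest_paths_from(adj, s)
        result[s] = (len(adj) - 1) / sum(dist.values())
    return result


def betweenness_centrality(adj):
    """Unnormalised count of shortest paths through each node."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        order, _, sigma, preds = shortest_paths_from(adj, s)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):    # accumulate from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return {v: c / 2 for v, c in cb.items()}  # each undirected pair was counted twice


degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
betweenness = betweenness_centrality(adj)
print(degree, closeness, betweenness, sep="\n")
```

Nodes 3 and 4 are the bottlenecks: every shortest path between the two triangles passes through them, so they get the highest betweenness.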
45. How different centralities look
[Figure labels: hub; node that connects two sub-networks; closest node to all other nodes]
Figure is partially adapted with modifications from original https://cytoscape.github.io/cytoscape-tutorials/presentations/modules/network-analysis/index.html#/0/6
46. Biological meaning
Degree centrality
• Nodes with a high degree are also called hub nodes
• Real networks have many nodes with low degree and few nodes with high degree
• Nodes with a high degree tend to be essential nodes
• Regulatory elements like transcription factors often have a high out-degree
Closeness centrality
• Indication for how fast information spreads from a given node to other reachable nodes in the network
• The more central a node is, the smaller its distance to all other nodes and the higher its closeness
Betweenness centrality
• Amount of control that this node has over the interactions of other nodes in the network
• How much information load is on the node
• Describes connectivity of the network
• Nodes that connect two sub-networks
• Can be calculated for edges as well
Material is adapted from BioSB 2015 Network Analysis Course
47. Brain connectivity
• A few regions that link the left and the right half of our brain
• They therefore have a high betweenness
A. S. Pandit et al., Cerebral Cortex (2014) Whole-brain mapping of structural connectivity in infants reveals altered connection strength associated with growth and preterm birth
48. Biological networks
• Scale-free networks (tend to have a power-law degree distribution)
• “Small world” networks (small average path length)
• Hierarchical modularity (a high clustering coefficient independent of network size)
• Robustness (strong resistance to random failures, but vulnerability to targeted attacks)
50. Network motifs
Patterns (sub-networks) that occur more often than in randomised networks
Different types of network show different motifs. Gene regulatory
networks with transcription factors have typical regulation motifs.
51. Motifs in yeast regulatory network
Image is adapted from Lee et al. Transcriptional Regulatory Networks in Saccharomyces cerevisiae, Science 2002
52. Motifs in yeast regulatory network
• consists of a regulator
that binds to the
promoter region of its
own gene
• reduced response
time to environmental
stimuli
• decreased cost of
regulation
• increased stability of
gene expression
53. Motifs in yeast regulatory network
• consists of a
regulatory circuit
whose closure
involves two or more
factors
• provides the capacity
for feedback control
• offers the potential to
produce bistable
systems that can
switch between two
alternative states
54. Motifs in yeast regulatory network
• contains a regulator that
controls a second
regulator and both
regulators bind a common
target gene
• acts as a switch that is
designed to be sensitive
to sustained inputs
• provides control of
expression of target gene
depending on the
accumulation of adequate
levels of the master and
secondary regulators
55. Motifs in yeast regulatory network
• contains a single regulator
that binds a set of genes
under a specific condition
• is responsible for some
particular biological
function
56. Motifs in yeast regulatory network
• set of regulators that bind
together to a set of genes
• coordinates gene
expression across a wide
variety of biological
conditions
• two different regulators
responding to two different
inputs allow coordinate
expression of the set of
genes under two different
conditions
57. Motifs in yeast regulatory network
• consists of chains of three
or more regulators in
which one regulator binds
the promoter for a second
regulator and so on
• simplest ordering of
transcriptional events
• regulators functioning at
one stage of the cell cycle
regulate the expression of
factors required for entry
into the next stage of the
cell cycle
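Two of the motifs described above can be counted directly from a list of regulator-target edges: a regulator that binds the promoter of its own gene (commonly called autoregulation), and a regulator that controls a second regulator while both bind a common target (commonly called a feed-forward loop). The sketch below is illustrative only: the names TF1, TF2, G1, G2 are placeholders, and deciding whether a pattern is a true motif would additionally require comparing these counts with randomised networks, which is not shown.

```python
# Hypothetical regulatory edges as (regulator, target) pairs; names are placeholders.
regulation = [
    ("TF1", "TF1"),   # TF1 binds the promoter of its own gene
    ("TF1", "TF2"),
    ("TF1", "G1"),
    ("TF2", "G1"),
    ("TF2", "G2"),
]


def targets(edge_list):
    """Build {regulator: set of targets} from directed edges."""
    out = {}
    for regulator, target in edge_list:
        out.setdefault(regulator, set()).add(target)
    return out


def autoregulators(out):
    """Regulators that have themselves among their targets."""
    return sorted(r for r, tgts in out.items() if r in tgts)


def feed_forward_loops(out):
    """Triples (x, y, z) where x regulates y, and both x and y regulate z."""
    loops = []
    for x, x_targets in out.items():
        for y in x_targets:
            if y == x or y not in out:
                continue             # y must be a different node that is itself a regulator
            for z in out[y]:
                if z != x and z != y and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)


out_edges = targets(regulation)
print(autoregulators(out_edges))
print(feed_forward_loops(out_edges))
```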
59. Community detection
Figure is adapted from original https://cytoscape.github.io/cytoscape-tutorials/presentations/advanced-automation-2017-mpi.html#/11
Identifying closely-related groups of nodes (modules/clusters)
• Based on topology
• Based on a shared function(s)
62. MCL-based modules
• Flow simulation based method
• Consider a graph with many links within a cluster, and fewer links
between clusters.
• This means if you were to start at a node, and then randomly travel
to a connected node, you’re more likely to stay within a cluster than
travel between.
• By doing random walks in the graph, it may be possible to discover where the flow tends to gather, and therefore, where clusters are.
• Random walks on a graph are calculated using “Markov chains”.
Image is adapted from https://micans.org/mcl/
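The flow idea can be sketched as a small, simplified version of Markov clustering: a column-stochastic matrix of random-walk probabilities is repeatedly squared (expansion, which spreads flow) and raised element-wise to a power and renormalised (inflation, which strengthens strong flows and weakens weak ones). This is not the reference MCL implementation. The inflation value, the fixed number of iterations, the threshold, and reading clusters as connected components of the remaining flow are all assumptions made for illustration.

```python
def normalise_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    for j in range(n):
        column_sum = sum(m[i][j] for i in range(n))
        for i in range(n):
            m[i][j] /= column_sum
    return m


def mcl(edges, n_nodes, inflation=2.0, iterations=50, threshold=1e-3):
    # Adjacency matrix with self-loops, turned into random-walk probabilities.
    m = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        m[i][i] = 1.0
    for u, v in edges:
        m[u][v] = m[v][u] = 1.0
    normalise_columns(m)

    for _ in range(iterations):
        # Expansion: matrix squaring, i.e. random walks of twice the length.
        m = [[sum(m[i][k] * m[k][j] for k in range(n_nodes))
              for j in range(n_nodes)] for i in range(n_nodes)]
        # Inflation: element-wise power followed by column renormalisation.
        m = normalise_columns([[x ** inflation for x in row] for row in m])

    # Simplified read-out: nodes joined by remaining flow end up in one cluster.
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n_nodes):
        for j in range(n_nodes):
            if m[i][j] > threshold:
                parent[find(i)] = find(j)
    clusters = {}
    for node in range(n_nodes):
        clusters.setdefault(find(node), []).append(node)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy network: two fully connected groups of four nodes joined by the link 3-4.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
             (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
             (3, 4)]
modules = mcl(toy_edges, n_nodes=8)
print(modules)
```

Higher inflation values generally give smaller, tighter clusters; the value 2.0 used here is just a common starting point.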
69. Functional characterisation
Identify biological function of the module
• Gene Ontology: Cellular component, Molecular function, Biological process
• Pathways: KEGG, Reactome
• Regulation: miRBase miRNAs, TRANSFAC TF targets
• Extra: Biogrid PPIs, CORUM protein complexes, Human Phenotype Ontology
71. Functional enrichment
Does your gene list include more genes with function x than expected by random chance?
[Figure: overlap between the genes with known function x and your gene list]
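One common way to answer this question is a one-sided hypergeometric test: given a background of N genes of which K have function x, what is the probability that a list of n genes contains at least k of them by chance? The slides do not state which test is used, so the choice of test, the function name, and the numbers below are assumptions for illustration.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_list, n_overlap):
    """P(X >= n_overlap) under the hypergeometric distribution.

    n_background: all genes considered (N)
    n_annotated:  genes with function x (K)
    n_list:       size of your gene list (n)
    n_overlap:    genes in your list that have function x (k)
    """
    total = comb(n_background, n_list)        # all possible gene lists of size n
    upper = min(n_annotated, n_list)
    favourable = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return favourable / total


# Made-up numbers: 10 background genes, 5 with function x, a list of 2 genes, both annotated.
p = enrichment_p_value(n_background=10, n_annotated=5, n_list=2, n_overlap=2)
print(p)
```

When many functions are tested at once, the resulting p-values also need a multiple-testing correction, which is not shown here.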
72. Tool for functional enrichment
http://biit.cs.ut.ee/gprofiler
J. Reimand, M. Kull, H. Peterson, J. Hansen, J. Vilo: g:Profiler - a web-based toolset for functional profiling of gene lists from large-scale experiments. Nucleic Acids Research 2007; 35: W193-W200
J. Reimand, T. Arak, P. Adler, L. Kolberg, S. Reisberg, H. Peterson, J. Vilo: g:Profiler - a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Research 2016; doi: 10.1093/nar/gkw199
73. 2175 modules found
Enrichment results for example module
https://biit.cs.ut.ee/graphweb/
Example of module functional
characterisation