1. Graph Spectra through Network
Complexity Measures
Information Content of Eigenvalues
Hector Zenil
(joint work with Narsis Kiani and Jesper Tegnér)
Unit of Computational Medicine, Karolinska Institutet
@ Department of Mathematics, Stockholm University
Zenil, Kiani, Tegnér (Karolinska Institutet), Information Content of Eigenvalues, May 27, 2015
2. Outline:
1 Estimating Kolmogorov complexity
2 n-dimensional complexity
3 Graph Algorithmic Probability and Kolmogorov complexity of
networks
4 Applications to complex networks and graph spectra
Material mostly drawn from:
1 joint with Soler et al. Computability (2013). [1]
2 joint with Gauvrit et al. Behavior Research Methods (2013). [3]
3 Zenil et al. Physica A (2014). [4]
4 joint with Soler et al. PLoS ONE (2014). [6]
5 Zenil, Kiani and Tegnér, LNCS 9044 (2015). [2]
6 Zenil and Tegnér, Symmetry (forthcoming).
7 Zenil, Kiani and Tegnér, Seminars in Cell and Developmental
Biology (in revision).
3. Main goal
Main goal throughout this talk: To study properties of graphs and
networks with measures from information theory and algorithmic
complexity.
Table : Numerical calculations of (mostly) uncomputable functions:
Busy Beaver problem: lower semi-computable
Kolmogorov-Chaitin complexity: upper semi-computable
Algorithmic Probability (Solomonoff-Levin): lower semi-computable
Bennett's Logical Depth: uncomputable
Upper semi-computable: can be approximated from above.
Lower semi-computable: can be approximated from below.
4. The basic unit in Theoretical Computer Science
The cell (the smallest unit of life) is to Biology what the Turing machine is
to Theoretical Computer Science.
Figure : Finite state diagram of a Turing machine.
[A.M. Turing (1936)]
5. One machine for everything
Computation (Turing-)universality
(a) Turing proves that a machine M with input x can be encoded as an input
M(x) for a universal machine U such that if M(x) = y then U(M(x)) = y, for
any Turing machine M.
You do not need a computer for each different task, only one!
There is no distinction between software/hardware or data/program
Together with Church's thesis:
The Church-Turing thesis
(b) Every effectively computable function is computable by a Turing
machine.
Together, (a) and (b) suggest that:
Anything can be programmed/simulated/emulated by a universal
Turing machine.
6. The undecidability of the Halting Problem
The existence of the universal Turing machine U brings a fundamental
Gödel-type contradiction about the power of U (or of any universal machine):
Let’s say we want to know whether a machine M will halt for input x.
Assumption:
We can program U in such a way that if M(x) halts then U(M(x)) = 0
otherwise U(M(x)) = 1. So U is a (halting) decider.
Contradiction:
Let M(x) = U(x); then U(U(x)) = 0 if and only if U(x) = 1, and
U(U(x)) = 1 if and only if U(x) = 0.
Therefore the assumption that we can decide in general whether a Turing
machine halts is false.
There is also a non-constructive proof using Cantor’s diagonalisation method.
7. Computational irreducibility
(1) Most fundamental irreducibility:
If M halts for input x, you have to run either M(x) or U(M(x)) to know
it; but if M does not halt, running M(x) or U(M(x)) will never tell you
that it does not halt.
Most uncomputability results are of this type: you can know in one
direction but not in the other (e.g. whether a string is random, as we will see).
(2) Secondary irreducibility (corollary):
Simulating M(x) on U can incur only a constant-factor overhead, with no
computational speed-up (connected to time complexity, P = NP time results)
in general, especially for (1). In other words, O(U(M(x))) ∼ O(M(x)), i.e.
O(U(M(x))) = c × O(M(x)), with c a constant.
(2) is believed to be more pervasive than what (1) implies.
8. Complexity and information content of strings
Example (3 strings of length 40)
a: 1111111111111111111111111111111111111111
b: 1100101011001001010111000101010100101011
c: 0101010101010101010101010101010101010101
According to Shannon (1948):
(a) has minimum Entropy (only one micro-state).
(b) has maximum Entropy (two micro-states with same frequency
each).
(c) has also maximum Entropy! (two micro-states with same
frequency each).
Shannon Entropy inherits from classical probability
Shannon Entropy suffers from similar limitations: strings (b) and (c) have
the same Shannon Entropy (same number of 0s and 1s), yet they appear to be
of a very different nature to us.
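The three cases above can be checked directly. A minimal sketch in Python, using the 40-bit strings from the example (entropy here is over the empirical symbol distribution, i.e. block size 1):

```python
from collections import Counter
from math import log2

def shannon_entropy(s):
    """Shannon entropy (bits per symbol) of the empirical symbol distribution."""
    n = len(s)
    return -sum(c / n * log2(c / n) for c in Counter(s).values())

a = "1" * 40
b = "1100101011001001010111000101010100101011"
c = "01" * 20

print(shannon_entropy(a))  # 0.0: only one micro-state
print(shannon_entropy(b))  # 1.0: two micro-states, equal frequency
print(shannon_entropy(c))  # 1.0 as well, despite the obvious regularity
```

String (c) gets maximal entropy because single-symbol statistics are blind to its alternating structure, which is exactly the limitation discussed above.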
9. Statistical v algorithmic
Entropy rate can only capture statistical regularities, not algorithmic correlations:
Thue-Morse sequence: 01101001100101101001011001101001
Segment of π in binary: 0010010000111111011010101000100
Definition
Kolmogorov(-Chaitin) complexity (1965, 1966):
K_U(s) = min{|p| : U(p) = s}
Algorithmic Randomness (also Martin-Löf and Schnorr)
A string s is random if K(s) (in bits) ∼ |s|.
Correlation versus causation
Shannon Entropy is to correlation what Kolmogorov is to causation!
10. Example of an evaluation of K
The string 01010101...01 can be produced by the following program:
Program A:
1: n:= 0
2: Print n
3: n:= n+1 mod 2
4: Goto 2
The length of A (in bits) is an upper bound on K(010101...01) (plus the
halting condition).
Semi-computability of K
Exhibiting a short program for a string is a sufficient test for
non-randomness, but failing to find a short description (program) does not
certify randomness.
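The same upper-bound argument can be sketched in Python; the 8-bits-per-character encoding and the particular one-line generator are illustrative assumptions, not part of the theory:

```python
# A 14-character description of a 1000-bit string, mirroring Program A:
program = 's = "01" * 500'
s = "01" * 500

upper_bound_bits = 8 * len(program)    # 112 bits, up to the interpreter constant
string_bits = len(s)                   # 1000 bits
print(upper_bound_bits < string_bits)  # True: a sufficient test of non-randomness
```

Any such program witnesses K(s) << |s|; the converse search (proving no shorter program exists) is the uncomputable direction.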
11. The founding theorem of K complexity: Invariance to
choice of U
Does it matter whether we measure K with programming language (or universal
Turing machine) U1 or U2?
|K_U1(s) − K_U2(s)| < c_U1,U2
It is not relevant in the limit: the difference is a constant that vanishes
as the strings grow longer.
Rate of convergence of K and the behaviour of c with respect to |s|
The Invariance theorem in practice is a negative result
The constant involved can be arbitrarily large; the theorem says nothing
about the rate of convergence. Any method for estimating K is subject to it.
12. Compression is Entropy rate not K
Actual implementations of lossless compression have 2 main drawbacks
and pitfalls:
Lossless compression as entropy rate estimators
Actual implementations of lossless compression algorithms (e.g.
Lempev-Ziv, BZip2, PNG), seek for statistical regularities, repetitions in a
sliding fixed-length window of size w, hence entropy rate estimators up to
block (micro-state) length w. Their success is only based on one side of
the non-randomness test, i.e. low entropy = low K.
Compressing short strings
The compressor also adds the decompression instructions to the file. Any
string shorter than say 100 bits is impossible to further compress or to get
any meaningful ranking from compressing them (100 bps strings in
structural molecular biology is long).
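Both pitfalls are easy to observe with an off-the-shelf compressor, e.g. Python's zlib; the exact compressed sizes depend on the implementation, so only the qualitative behaviour matters here:

```python
import zlib

# A short string: the compressor's own overhead (header, checksum,
# code tables) dominates, so nothing is gained.
short = b"01" * 5                 # 10 bytes
print(len(zlib.compress(short)))  # not shorter than the input

# A long string with the same regularity compresses very well.
long_ = b"01" * 32768                 # 65536 bytes
print(len(zlib.compress(long_)))      # a tiny fraction of the original
```

The short string is by no means complex, yet compression cannot rank it, which is why short strings need the Coding Theorem method described next.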
13. Alternative to lossless compression algorithms
Figure : (originally Émile Borel's infinite monkey theorem) A monkey at a
computer produces more structure by chance than a monkey at a typewriter.
14. Algorithmic Probability (semi-measure, Levin’s Universal
Distribution)
Definition
The classical probability of production of a bit string s among all 2^n bit
strings of size n (classical monkey theorem):
Pr(s) = 1/2^n    (1)
Definition
Let U be a (prefix-free, by Kraft's inequality) universal Turing machine
and p a program that produces s running on U; then
m(s) = Σ_{p : U(p) = s} 1/2^{|p|} < 1    (2)
15. The algorithmic Coding theorem
Connection to K!
The greatest contributor to m(s) is the shortest program p, i.e. the one of
length K(s).
The algorithmic Coding theorem describes the reverse connection between
K(s) and m(s):
Theorem
K(s) = − log2(m(s)) + O(1)    (3)
Frequency and complexity are related
If a string s is produced by many programs, then there is also a short
program that produces s (Cover & Thomas, 1991).
[Solomonoff (1964); Levin (1974); Chaitin (1976)]
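The Coding theorem can be illustrated with a toy (non-universal, purely hypothetical) "machine" in Python: enumerating all short programs and tallying output frequencies makes regular outputs accumulate more probability mass, hence a lower −log2 m(s):

```python
from collections import defaultdict
from math import log2

def toy_machine(p):
    """Hypothetical toy 'machine': the first 2 bits give a repeat count
    k in 1..4, the rest is a pattern repeated k times, truncated to 8 bits."""
    k = int(p[:2], 2) + 1
    return (p[2:] * k)[:8]

# m(s): each program of length L contributes 2^-L to its output's probability.
m = defaultdict(float)
for L in range(3, 9):
    for i in range(2 ** L):
        p = format(i, f"0{L}b")
        m[toy_machine(p)] += 2.0 ** -L

# Regular outputs accumulate more mass, hence a lower K estimate:
print(-log2(m["00000000"]) < -log2(m["01010101"]))  # True
```

This machine is of course not prefix-free or universal; the real Coding Theorem method does this over exhaustive enumerations of small Turing machines, as the flow chart on the next slide shows.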
16. The Coding Theorem Method (CTM) flow chart
Enumerate and run every Turing machine in the (n, m) space of machines with
n states and m symbols, for increasing n and m (using Busy Beaver values to
determine halting times where known, otherwise an informed runtime cutoff;
see e.g. Calude & Stay, Most Programs Stop Quickly or Never Halt, 2006).
[Soler, Zenil et al, PLoS ONE (2014)]
17. Changes in computational formalism
[H. Zenil and J-P. Delahaye, On the Algorithmic Nature of the World; 2010]
18. Elementary Cellular Automata
An elementary cellular automaton (ECA) is defined by a local function
f : {0, 1}^3 → {0, 1}.
Figure : Space-time evolution of a cellular automaton (ECA rule 30).
f maps the state of a cell and its two immediate neighbours (range = 1)
to the cell's new state: f(r_{-1}, r_0, r_{+1}) = r_0'. Cells are updated
synchronously according to f over all cells in a row.
[Wolfram, (1994)]
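A minimal Python sketch of such an ECA (rule 30, with periodic boundary conditions assumed for simplicity):

```python
def eca_step(row, rule=30):
    """One synchronous update of an elementary CA with wraparound."""
    n = len(row)
    # Wolfram rule number: bit i of `rule` is the output for neighbourhood i.
    table = [(rule >> i) & 1 for i in range(8)]
    return [table[(row[(i - 1) % n] << 2) | (row[i] << 1) | row[(i + 1) % n]]
            for i in range(n)]

width, steps = 31, 15
row = [0] * width
row[width // 2] = 1  # a single black cell, as in the figure
history = [row]
for _ in range(steps):
    row = eca_step(row)
    history.append(row)

for r in history:
    print("".join(".#"[c] for c in r))
```

Running it prints the familiar chaotic rule-30 triangle, the space-time evolution referred to in the figure caption.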
19. Convergence in ECA classification (CTM v Compress)
Scatterplot of ECA classification: CTM (x-axis) versus Compress (y-axis).
[Soler-Toscano et al., Computability; 2013]
20. Part II
GRAPH ENTROPY, GRAPH ALGORITHMIC
PROBABILITY AND GRAPH KOLMOGOROV
COMPLEXITY
21. Graph Entropy definitions are not robust
Several definitions of Graph Entropy (e.g. from molecular biology) have
been proposed, e.g.:
A complete graph has highest entropy H if H is defined over all
possible subgraphs up to the graph size, i.e.
H(G) = − Σ_{i}^{|G|} P(G_i) log2 P(G_i)
where G_i is a subgraph of increasing size i in G. However,
H(Adj(G)) = −P(Adj(G)) log2 P(Adj(G)) = 0 (!)
(and likewise for the adjacency matrices of all the subgraphs, so the sum
would be 0 too!)
Graph Entropy
Complete and disconnected graphs then have maximal and minimal entropy,
respectively. Alternative definitions include, for example, the number of
bifurcations encountered traversing the graph starting from a random node, etc.
22. Graph Kolmogorov complexity (Physica A)
Unlike Graph Entropy, Graph Kolmogorov complexity is very robust:
complete graph: K ∼ log |N|; Erdős-Rényi random graph: K ∼ |E|
M. Gell-Mann (Nobel Prize 1969) thought that any reasonable measure of graph complexity
should assign minimal complexity to both completely disconnected and completely
connected graphs (The Quark and the Jaguar, 1994).
Graph Kolmogorov complexity
Complete and disconnected graphs with |N| nodes have low (algorithmic)
information content. In a random graph, every edge e ∈ E requires some
information to be described. In both cases K(G) ∼ K(Adj(G))!
23. Numerical estimation of K(G)
A labelled graph is uniquely represented by its adjacency matrix. So the
question is: what is the Kolmogorov complexity of an adjacency
matrix?
Figure : Two-dimensional Turing machines, also known as Turmites (Langton, Physica
D, 1986).
We will provide the definition of Kolmogorov complexity for unlabelled
graphs later.
[Zenil et al. Physica A, 2014]
24. An Information-theoretic Divide-and-Conquer Algorithm!
The Block Decomposition method uses the Coding Theorem method.
Formally, we will say that an object c has (2D) Kolmogorov complexity:
K_{2D, d×d}(c) = Σ_{(r_u, n_u) ∈ c_{d×d}} K_{2D}(r_u) + log2(n_u)    (4)
where c_{d×d} represents the set of pairs (r_u, n_u) obtained by
decomposing the object into (overlapping) blocks of d × d with boundary
conditions. In each pair (r_u, n_u), r_u is one such square and n_u its
multiplicity.
[Zenil et al., Two-Dimensional Kolmogorov Complexity and Validation of the
Coding Theorem Method by Compressibility (2012)]
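The decomposition in Eq. (4) can be sketched as follows. The CTM values below are hypothetical placeholders (the real ones come from exhaustively running small Turing machines), and for brevity the sketch uses non-overlapping blocks on matrices whose size is a multiple of d:

```python
from collections import Counter
from math import log2

def blocks(matrix, d):
    """Partition a square binary matrix into non-overlapping d x d blocks
    (the full BDM uses overlapping blocks and boundary conditions)."""
    n = len(matrix)
    for i in range(0, n, d):
        for j in range(0, n, d):
            yield tuple(tuple(row[j:j + d]) for row in matrix[i:i + d])

def bdm(matrix, ctm, d=2):
    """K_BDM = sum over unique blocks r_u of CTM(r_u) + log2(n_u),
    with `ctm` a lookup table of precomputed CTM values."""
    counts = Counter(blocks(matrix, d))
    return sum(ctm[r_u] + log2(n_u) for r_u, n_u in counts.items())

# Hypothetical CTM values for the two 2x2 blocks appearing below:
Z = ((0, 0), (0, 0))
X = ((0, 1), (1, 0))
ctm = {Z: 3.3, X: 5.9}

adj = [[0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 1, 0, 0],
       [1, 0, 0, 0]]
print(bdm(adj, ctm))  # 2 distinct blocks, each with multiplicity 2
```

Repeated blocks contribute only log2 of their multiplicity, which is what keeps BDM closer to K than to plain block entropy.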
25. Classification of ECA by BDM (= Km) and Compress
Representative ECAs sorted by BDM (top row) and Compress (bottom row).
[H. Zenil, F. Soler-Toscano, J.-P. Delahaye and N. Gauvrit, Two-Dimensional
Kolmogorov Complexity and Validation of the Coding Theorem Method by
Compressibility (2012)]
26. Complementary methods for different object lengths
The methods coexist and complement each other for different string
lengths (transitions are also smooth).

method                          | short strings (< 100 bits) | long strings (> 100 bits) | scalability | time   | domain
Lossless compression            | ×                          | ✓                         | ✓           | O(n)   | H
Coding Theorem method (CTM)     | ✓                          | ×                         | ×           | O(exp) | K
CTM + Block Decomposition (BDM) | ✓                          | ✓                         | ✓           | O(n)   | K → H

Table : H stands for Shannon Entropy and K for Kolmogorov complexity. BDM
can therefore be taken as an improvement on (Block) Entropy rate for a fixed
block size. For CTM: http://www.complexitycalculator.com
27. Graph algorithmic probability
Works on directed and undirected graphs.
Torus boundary conditions provide a solution to the boundary problem.
Overlapping submatrices avoid the lack of permutation invariance but lead to
overfitting.
The best option is to recursively divide into square matrices for which exact
complexity estimations are known.
[Zenil et al. Physica A (2014)]
28. K and graph automorphism group (Physica A)
Figure : Left: an adjacency matrix is not a graph invariant, yet isomorphic
graphs have similar K. Right: graphs with a large automorphism group (high
group symmetry) have lower K.
This correlation suggests that the complexity of unlabelled graphs is captured by
the complexity of their adjacency matrix (which is a labelled graph object).
Indeed, in Zenil et al. LNCS we show that the complexity of a labelled graph is a
good approximation to its unlabelled graph complexity.
[Zenil et al. Physica A (2014)]
29. Unlabelled Graph Complexity
The proof sketch that labelled graph complexity ∼ unlabelled graph
complexity uses the fact that there is an algorithm (e.g. brute force) of
finite (small) size that produces any isomorphic graph from any other.
Yet, one can define unlabelled graph Kolmogorov complexity as follows:
Definition
Graph Unlabelled Kolmogorov Complexity: let Adj(G) be the adjacency
matrix of G and Aut(G) its automorphism group; then
K(G) = min{K(Adj(G′)) | Adj(G′) ∈ A(Aut(G))}
where A(Aut(G)) is the set of adjacency matrices of all graphs G′ isomorphic to G.
(The problem is believed to be in NP but not NP-complete.)
[Zenil, Kiani and Tegnér (forthcoming)]
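For tiny graphs the minimization can be brute-forced over all relabellings. In the sketch below, compressed length is a crude zlib stand-in for K (compression is only an entropy-rate proxy, as discussed earlier), purely for illustration:

```python
import zlib
from itertools import permutations

def relabel(adj, perm):
    """Adjacency matrix of the same graph under a vertex relabelling."""
    n = len(adj)
    return [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

def k_estimate(adj):
    """Stand-in for K(Adj(G)): compressed size of the flattened matrix."""
    return len(zlib.compress(bytes(b for row in adj for b in row), 9))

def unlabelled_k(adj):
    """Minimum estimate over all relabellings; feasible only for tiny graphs."""
    n = len(adj)
    return min(k_estimate(relabel(adj, p)) for p in permutations(range(n)))

# A 5-node path graph 0-1-2-3-4:
path = [[0, 1, 0, 0, 0],
        [1, 0, 1, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0]]
print(unlabelled_k(path) <= k_estimate(path))  # True: identity is one relabelling
```

The factorial blow-up in the number of relabellings is exactly why the problem is hard in general, and why the labelled approximation of the previous slide matters.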
30. Graph automorphisms and algorithmic complexity by BDM
Classifying (and clustering) ∼ 250 graphs (with no Aut(G) correction) with
different topological properties by K (BDM):
[Zenil et al. Physica A (2014)]
31. Graph definitions
Definition
Dual graph: A dual graph of a plane graph G is a graph that has a vertex
corresponding to each face of G, and an edge joining two neighboring
faces for each edge in G.
Definition
Graph spectrum: the set of eigenvalues of the adjacency matrix is
called the spectrum of the graph. The spectrum of the Laplacian matrix of a
graph is also sometimes used as the graph's spectrum.
Definition
Cospectral graphs: two graphs are called isospectral or cospectral if they
have the same spectrum.
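As a small illustration of cospectrality, the characteristic polynomial of the adjacency matrix can be computed exactly in pure Python via the Faddeev-LeVerrier recurrence; the star K_{1,4} and the union of C_4 with an isolated vertex are a classical cospectral pair:

```python
from fractions import Fraction

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients [1, c1, ..., cn] of det(xI - A), Faddeev-LeVerrier."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    M = [[Fraction(1 if i == j else 0) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        M = matmul(A, M)
        c = -sum(M[i][i] for i in range(n)) / k
        coeffs.append(c)
        for i in range(n):
            M[i][i] += c
    return coeffs

def adjacency(n, edges):
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

# Smallest cospectral pair: the star K_{1,4} and C_4 plus an isolated vertex.
star = adjacency(5, [(0, 1), (0, 2), (0, 3), (0, 4)])
c4_k1 = adjacency(5, [(0, 1), (1, 2), (2, 3), (3, 0)])

print(char_poly(star) == char_poly(c4_k1))  # True: same spectrum, x^5 - 4x^3
```

Equal characteristic polynomials mean equal spectra (here ±2 and 0 with multiplicity 3), even though the two graphs are plainly non-isomorphic, which is what makes cospectral pairs an interesting test case for the complexity measures below.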
32. Testing compression and BDM on dual graphs
[Zenil et al. Physica A (2014)]
33. H, compression and BDM on cospectral graphs
34. Quantifying Loss of Information in Network-based
Dimensionality Reduction Techniques
Figure : Flowchart of Quantifying Loss of Information in Network-based
Dimensionality Reduction Techniques.
35. Methods of (Algorithmic) Information Theory in network
dimensionality reduction
Figure : Information content of graph spectra and graph motif analysis.
Information content of 16 graphs of different types and the information content
of their graph spectra approximated by Bzip2, Compress and BDM.
36. Methods of (Algorithmic) Information Theory in network
dimensionality reduction
Figure : Information content progression of sparsification. Information loss
after keeping from 20 to 80% of the graph edges (100% corresponds to the
information content of the original graph).
37. Methods of (Algorithmic) Information Theory in network
dimensionality reduction
Figure : Plot comparing all methods as applied to 4 artificial networks. The
information content measured as normalized complexity with two different lossless
compression algorithms was used to assess the sparsification, graph spectra and
graph motif methods. The 6 networks from the Mendes DB are of the same size
and each method displays different phenomena.
38. Eigenvalue information weight effect on graph spectra
In graph spectra, either only the largest eigenvalue (λ1) is considered, or all
eigenvalues (λ1 . . . λn) are given the same weight. Yet eigenvalues capture
different properties and are sensitive to graph specificity; e.g. in a complete
graph, λ1 gives the graph size.
Figure : Graph spectra can be plotted in an n-dimensional space, where n is the
number of graph nodes (and of eigenvalues). When a graph G evolves into G′, its
spectrum changes from Spec1(G) to Spec2(G′), as in the figure; but if not all
eigenvalues are equally important, the distance d(Spec1(G), Spec2(G′)) lies
on a manifold rather than in Euclidean space.
39. Eigenvalues in Graph Spectra are not all the same
Nor is their magnitude alone of any special relevance (e.g. taking only the largest one):
Figure : Statistics (ρ) and p-value plots between graph complexity (BDM) and
the largest, second-largest and smallest eigenvalues of 204 different graph
classes comprising 4913 graphs. Clearly, graph class complexity correlates in
different ways with different eigenvalues.
[Source: Zenil, Kiani and Tegnér, LNCS (2015)]
40. Eigenvalues of evolving networks
The most informative eigenvalues for characterizing a family of networks, and
individuals within such a family:
Figure : The complexity of the graph versus the complexity of the list of
eigenvalues per position (rows) provides information about the amount and kind
of information stored in each eigenvalue; the maximum entropy over rows also
identifies the eigenvalue that best characterizes the changes in an evolving
network that otherwise displays very little topological change.
41. Entropy and complexity of Eigenvalue families
Let n be the number of data points of an evolving graph (or of a family of
graphs to study), H the Shannon Entropy, K Kolmogorov complexity and
KS the Kolmogorov-Sinai Entropy (∼ interval Shannon Entropy); then we
are interested in:
H(Spec(G_i)), K(Spec(G_i)), KS(Spec(G_i))
where i ∈ {1, . . . , n}, to study the eigenvalue behavior with respect to
K_BDM(G_i), and in the row-wise quantities
KS(λ_1^1, λ_1^2, . . . , λ_1^n)
KS(λ_2^1, λ_2^2, . . . , λ_2^n)
. . .
(where λ_j^i is the j-th eigenvalue at time i), maximizing the differences
between the G_i and hence characterizing G in time.
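A toy sketch of this row-wise selection in Python; the eigenvalue trajectories below are hypothetical, and the discretization by rounding is an arbitrary illustrative choice:

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy of a discretized sequence of eigenvalues."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

# Hypothetical spectra of an evolving graph: one row per time point,
# one column per eigenvalue position (lambda_1 ... lambda_4).
spectra = [
    [3.0, 1.0, -1.0, -1.0],
    [3.1, 1.0, -1.0, -1.0],
    [3.5, 1.0, -1.0, -1.0],
    [2.8, 1.0, -1.0, -1.0],
]
trajectories = list(zip(*spectra))  # each eigenvalue position over time
H = [entropy([round(x, 1) for x in traj]) for traj in trajectories]
print(H.index(max(H)))  # 0: lambda_1 varies most, so it best tracks the change
```

Only the first eigenvalue position carries any variation here, so the maximum-entropy row singles it out as the informative one, mirroring the selection criterion above.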
42. Part 2 Summary
1 We have a sound, robust and native 2-dimensional complexity
measure applicable to graphs and networks.
2 The method is scalable, e.g. to 3 dimensions, which I call CTM3D, the
"3D-printing complexity measure": it only requires the Turing machine to
operate on a 3D grid, effectively the probability that a random computer
program prints a 3D object!
3 The graph complexity measure defined here captures algebraic and topological
(and, in forthcoming work, even physical) properties of graphs and networks.
4 There is a potential for applications in network and synthetic biology.
5 The method may prove to be very effective at giving proper weight to
eigenvalues and even shedding light on their meaning and information
content.
F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, Correspondence
and Independence of Numerical Evaluations of Algorithmic Information
Measures, Computability, vol. 2, no. 2, pp. 125-140, 2013.
H. Zenil, N.A. Kiani and J. Tegnér, Numerical Investigation of Graph Spectra
and Information Interpretability of Eigenvalues, IWBBIO 2015, LNCS 9044,
pp. 395-405, Springer, 2015.
N. Gauvrit, H. Zenil, F. Soler-Toscano and J.-P. Delahaye, Algorithmic
complexity for short binary strings applied to psychology: a primer, Behavior
Research Methods, vol. 46, no. 3, pp. 732-744, 2013.
H. Zenil, F. Soler-Toscano, K. Dingle and A. Louis, Correlation of
Automorphism Group Size and Topological Properties with Program-size
Complexity Evaluations of Graphs and Complex Networks, Physica A:
Statistical Mechanics and its Applications, vol. 404, pp. 341–358, 2014.
J.-P. Delahaye and H. Zenil, Numerical Evaluation of the Complexity of
Short Strings, Applied Mathematics and Computation, 2011.
F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, Calculating
Kolmogorov Complexity from the Output Frequency Distributions of Small
Turing Machines, PLoS ONE, 9(5): e96223, 2014.
J.-P. Delahaye and H. Zenil, On the Kolmogorov-Chaitin complexity for short
sequences, in C.S. Calude (ed.), Randomness and Complexity: From
Leibniz to Chaitin, World Scientific, 2007.
G.J. Chaitin, A Theory of Program Size Formally Identical to Information
Theory, J. Assoc. Comput. Mach., vol. 22, pp. 329-340, 1975.
R. Cilibrasi and P. Vitányi, Clustering by compression, IEEE Transactions on
Information Theory, 51(4), 2005.
A.N. Kolmogorov, Three approaches to the quantitative definition of
information, Problems of Information and Transmission, 1(1):1-7, 1965.
L. Levin, Laws of information conservation (non-growth) and aspects of the
foundation of probability theory, Problems of Information Transmission,
10(3):206–210, 1974.
R.J. Solomonoff. A formal theory of inductive inference: Parts 1 and 2,
Information and Control, 7:1–22 and 224–254, 1964.
S. Wolfram, A New Kind of Science, Wolfram Media, 2002.