1. Clustering Relational Data using the Infinite Relational Model
Ana Daglis
Supervised by: Matthew Ludkin
September 4, 2015
Ana Daglis Clustering Data using the Infinite Relational Model September 4, 2015 1 / 29
2. Outline
1 Clustering
2 Model
3 Gibbs Sampling
Methodology
Results
4 Split-Merge Algorithm
Methodology
Results
5 Future Work
3. Clustering
Clustering
Cluster Analysis: given unlabelled data, we want algorithms that
automatically group the data points into coherent subsets (clusters).
Applications:
recommendation engines (Netflix, iTunes, Quora,...)
image compression
targeted marketing
Google News
4. Model
Infinite Relational Model
The Infinite Relational Model (IRM) is a model in which each node is
assigned to a cluster. The number of clusters is not known initially and is
learned from the data as part of the statistical inference.
The IRM is represented by the following parameters:
zi - the cluster containing node i, for i = 1, ..., n.
φk,l - the probability of an edge between clusters k and l.
5. Model
Assumptions
Given the adjacency matrix of the graph, X, as our data, we assume
that Xi,j ∼ Bernoulli(φzi,zj).
Since z and φ are not known, Chinese restaurant process and beta priors
respectively are imposed:
z ∼ CRP(A)
φk,l ∼ Beta(a, b).
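As an illustrative sketch (not part of the talk), the generative process above can be simulated directly. The function name is my own, and I treat X as a directed, unsymmetrised adjacency matrix:

```python
import random

def simulate_irm(n, A, a, b, seed=None):
    """Generate (z, X) from the IRM: z ~ CRP(A), phi_kl ~ Beta(a, b),
    X_ij ~ Bernoulli(phi[z_i][z_j])."""
    rng = random.Random(seed)
    # CRP(A) prior on cluster assignments: block k with weight |block k|,
    # a brand-new block with weight A.
    z = []
    K = 0
    for i in range(n):
        counts = [z.count(k) for k in range(K)]
        k = rng.choices(range(K + 1), weights=counts + [A])[0]
        if k == K:
            K += 1
        z.append(k)
    # Beta(a, b) prior on between-cluster edge probabilities.
    phi = [[rng.betavariate(a, b) for _ in range(K)] for _ in range(K)]
    # Bernoulli likelihood for each entry of the adjacency matrix.
    X = [[1 if rng.random() < phi[z[i]][z[j]] else 0 for j in range(n)]
         for i in range(n)]
    return z, X
```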
6. Model
Chinese Restaurant Process (CRP(A))
The Chinese restaurant process is a discrete-time process whose value at
time n is a partition of {1, 2, ..., n}. At time n = 1, we have the trivial
partition {{1}}. At time n + 1, element n + 1 either:
1 joins an existing block b with probability |b|/(n + A), where |b| is
the size of the block, or
2 creates a completely new block with probability A/(n + A).
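A minimal simulation of this process (function and variable names are my own choices):

```python
import random

def sample_crp(n, A, seed=None):
    """Draw a partition of {0, ..., n-1} from a Chinese restaurant
    process with concentration parameter A."""
    rng = random.Random(seed)
    blocks = []  # blocks[k] holds the elements currently in block k
    for i in range(n):
        # Existing block b is chosen with probability |b| / (i + A),
        # a new block with probability A / (i + A).
        weights = [len(b) for b in blocks] + [A]
        k = rng.choices(range(len(blocks) + 1), weights=weights)[0]
        if k == len(blocks):
            blocks.append([i])   # start a new block
        else:
            blocks[k].append(i)  # join an existing block
    return blocks
```

Larger A tends to produce more, smaller blocks, since the new-block weight A competes with the block sizes.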
7. Model
Chinese Restaurant Process (CRP(A))
[Diagram: the first element starts a new block with probability 1.]
8. Model
Chinese Restaurant Process (CRP(A))
[Diagram: element 2 joins the existing block with probability 1/(1+A), or starts a new block with probability A/(1+A).]
9. Model
Chinese Restaurant Process (CRP(A))
[Diagram: element 3 joins either of the two singleton blocks with probability 1/(2+A) each, or starts a new block with probability A/(2+A).]
10. Model
Chinese Restaurant Process (CRP(A))
[Diagram: element 4 joins the blocks with probabilities 1/(3+A) and 2/(3+A), or starts a new block with probability A/(3+A).]
11. Gibbs Sampling Methodology
Gibbs Sampling
Want: a sample from a multivariate distribution of θ = (θ1, θ2, ..., θd).
Algorithm:
1 Initialise with θ^(0) = (θ1^(0), θ2^(0), ..., θd^(0)).
2 For i = 1, 2, ..., n:
Simulate θ1^(i) from the conditional θ1 | (θ2^(i−1), ..., θd^(i−1))
Simulate θ2^(i) from the conditional θ2 | (θ1^(i), θ3^(i−1), ..., θd^(i−1))
...
Simulate θd^(i) from the conditional θd | (θ1^(i), θ2^(i), ..., θd−1^(i)).
3 Discard the first k iterations (burn-in) and estimate the posterior distribution
using θ^(k+1), ..., θ^(n).
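The generic algorithm can be illustrated on a toy target (my own example, not from the talk): a standard bivariate normal with correlation ρ, whose full conditionals are θ1 | θ2 ∼ N(ρθ2, 1 − ρ²) and θ2 | θ1 ∼ N(ρθ1, 1 − ρ²):

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iter, burn_in, seed=None):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each conditional is theta_j | theta_other ~ N(rho * theta_other, 1 - rho^2)."""
    rng = random.Random(seed)
    sd = math.sqrt(1 - rho ** 2)
    t1, t2 = 0.0, 0.0            # step 1: initialise theta^(0)
    samples = []
    for _ in range(n_iter):      # step 2: sweep through the conditionals
        t1 = rng.gauss(rho * t2, sd)   # theta1^(i) | theta2^(i-1)
        t2 = rng.gauss(rho * t1, sd)   # theta2^(i) | theta1^(i)
        samples.append((t1, t2))
    return samples[burn_in:]     # step 3: discard the first k iterations
```

With enough iterations, the empirical correlation of the retained samples should be close to ρ.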
12. Gibbs Sampling Methodology
Gibbs Sampling
We use Gibbs sampling to infer the posterior distribution of z.
The cluster assignments, zi, are iteratively sampled from their
conditional distribution,
P(zi = k | z−i, X) ∝ P(X | z) P(zi = k | z−i),
where z−i denotes all cluster assignments except zi.
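The prior factor P(zi = k | z−i) follows directly from the CRP: existing blocks are weighted by their size, and a new block by A. A small sketch (the helper name and list representation are my own; it assumes block labels in z−i are contiguous from 0):

```python
def crp_prior_weights(z_minus_i, A):
    """P(z_i = k | z_{-i}) under CRP(A): proportional to the size of
    block k for existing blocks, and to A for a brand-new block.
    Returns one weight per existing block plus one for a new block."""
    K = max(z_minus_i) + 1
    counts = [z_minus_i.count(k) for k in range(K)]
    total = len(z_minus_i) + A
    return [c / total for c in counts] + [A / total]
```

Multiplying these weights by the likelihood term P(X | z) for each candidate k gives the (unnormalised) conditional used in the Gibbs sweep.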
13. Gibbs Sampling Methodology
Simulated Data
We applied the Gibbs sampling algorithm to a simulated network with
the following parameters:
96 nodes split into 6 blocks
φk,k = 0.85, for k = 1, ..., 6
φk,l = 0.05, for k ≠ l
a = b = 1 for uniform prior
A = 1.
[Figure (a): simulated network]
14. Gibbs Sampling Results
Simulated Data
[Figure (b): supplied network]
15. Gibbs Sampling Results
Block structure obtained
16. Gibbs Sampling Results
Trace-plot of the number of blocks
[Trace plot: number of blocks (1–6) against iteration (0–10,000).]
17. Gibbs Sampling Results
Gibbs Sampling Summary
The algorithm fails to split the data into 6 clusters within 10,000
iterations and remains stuck in a five-cluster configuration for a long time.
The main problem with the Gibbs sampler is that it converges slowly
and often becomes trapped in a local mode (5 blocks in this case).
A possible improvement is the split-merge algorithm, which updates
a group of nodes simultaneously and avoids these problems.
18. Split-Merge Algorithm Methodology
Split-Merge Algorithm
Algorithm:
1 Select two distinct nodes, i and j, uniformly at random.
2 If i and j belong to the same cluster, split that cluster into two by
assigning each of its elements to one of the two new clusters
independently with equal probability.
3 If i and j belong to different clusters, merge those clusters.
4 Evaluate the Metropolis–Hastings acceptance probability. If the
proposal is accepted, the new cluster assignment becomes the next state
of the chain; otherwise, the current cluster assignment remains as the
next state.
a(z∗, z) = min[1, q(z|z∗)P(z∗)L(X|z∗) / q(z∗|z)P(z)L(X|z)],
where q is the proposal probability, P(z) the prior, and L(X|z) the likelihood.
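A sketch of the proposal step above (a naive random-split variant in the spirit of steps 1–3; the function name, label handling, and the returned proposal ratio q(z|z∗)/q(z∗|z) are my own choices):

```python
import random

def propose_split_merge(z, rng):
    """One naive split-merge proposal on cluster labels z.
    Returns the proposed assignment and the ratio q(z|z*) / q(z*|z)
    needed in the Metropolis-Hastings acceptance probability."""
    n = len(z)
    i, j = rng.sample(range(n), 2)  # step 1: two distinct nodes
    z_new = list(z)
    if z[i] == z[j]:
        # step 2: split; other members go to either cluster with prob 1/2,
        # j is moved to the new cluster so i and j end up apart.
        new_label = max(z) + 1
        members = [m for m in range(n) if z[m] == z[i] and m not in (i, j)]
        for m in members:
            if rng.random() < 0.5:
                z_new[m] = new_label
        z_new[j] = new_label
        # forward prob (1/2)^|members|; reverse merge is deterministic.
        q_ratio = 2.0 ** len(members)
    else:
        # step 3: merge j's cluster into i's cluster (deterministic).
        z_new = [z[i] if lab == z[j] else lab for lab in z]
        # reverse move is a split with prob (1/2)^(merged size - 2).
        merged = sum(1 for lab in z if lab in (z[i], z[j]))
        q_ratio = 0.5 ** (merged - 2)
    return z_new, q_ratio
```

Step 4 would then accept z_new with probability min[1, q_ratio × P(z∗)L(X|z∗) / (P(z)L(X|z))].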
25. Split-Merge Algorithm Results
Gibbs Sampler + Split-Merge
We applied the Gibbs sampler together with the split-merge algorithm
to the earlier network: for every nine full Gibbs sampling scans, one
split-merge step was used.
The algorithm correctly splits the data into six clusters, has a short
burn-in and mixes well.
26. Split-Merge Algorithm Results
Block structure obtained
27. Split-Merge Algorithm Results
Trace-plot of the number of blocks
[Trace plot: number of blocks (1–7) against iteration (0–1,000).]
28. Future Work
Future Work
Assess the performance of the algorithms when the blocks
significantly vary in size.
Evaluate the computational complexity of the algorithms.
Explore more advanced algorithms (such as the Restricted Gibbs
Sampling Split-Merge).
29. Future Work
References
Schmidt, M. N. and Mørup, M. (2013). Non-parametric Bayesian modeling of
complex networks. IEEE Signal Processing Magazine, 30:110–128.
Jain, S. and Neal, R. M. (2004). A split-merge Markov chain Monte Carlo
procedure for the Dirichlet process mixture model. Journal of Computational
and Graphical Statistics, 13:158–182.