3. Overview
• Network structure is ubiquitous
• Relationships between components in the network are inherently multifaceted
(Figure: examples of a social network, a transportation network, and a brain network)
4. Overview
• Application
• Link Prediction (Recommendation), Cascade Prediction, Community
Detection, Ranking, Classification, etc.
• Challenge
• independent layer topology: a node's centrality, clustering, etc.
• coupled dynamics across layers: diffusion, synchronization, etc.
• Traditional Methods
• topology: modularity, spectral, etc.
• dynamics: DeepWalk, Hawkes, etc.
5. Overview
• coarse-grained comparison of the two papers:

Paper Id | Application         | Subject          | Network Properties
1        | Community Detection | Complex Networks | Dynamics
2        | Ranking             | Data Mining      | Topology
6. PHYSICAL REVIEW X 5, 011027 (2015)
Identifying Modular Flows on Multilayer Networks Reveals Highly
Overlapping Organization in Interconnected Systems
7. Author
• Manlio De Domenico
• https://comunelab.fbk.eu/manlio/index.php
• the Head of the Complex Multilayer Networks (CoMuNe)
Research Unit at the Center for Information Technology of
Fondazione Bruno Kessler.
• Representative work:
• Mathematical Formulation of Multi-Layer Networks (PRX2013)
• Random Walks on Multiplex Networks (PNAS 2014)
• Diffusion Geometry Unravels the Emergence of Functional
Clusters in Collective Phenomena (PRL 2017)
8. Author
• Martin Rosvall
• http://www.tp.umu.se/~rosvall/index.html
• Associate Professor at Umeå University
• Representative work:
• Maps of random walks on complex networks reveal community
structure (PNAS 2008, citations 2148)
9. Problem
• conventional methods assume a single type of static link, weighted
and directed at best
• aggregating multiple types of relationships into a single network
distorts both the topology and the dynamics on the network
• How do we define communities in a multilayer network?
Solution: extend the Map Equation to multilayer networks
10. An Illustrative Example
A multilayer network with
4 nodes and 3 layers
A random walk moving
between nodes across layers
11. An Illustrative Example
4 nodes represented in
one layer
trajectory of the random walk reveals
two overlapping communities
Intuitive definition: a module is a group of nodes in which the random walker stays for a relatively long time
12. Map Equation
• describe the random walk path on the network efficiently
• efficiently means the shortest description code: assign frequently
visited nodes shorter codewords (Huffman coding)
[Maps of random walks on complex networks reveal community structure, PNAS 2008]
13. Map Equation
• describe the random walk path on the network efficiently
[Maps of random walks on complex networks reveal community structure, PNAS 2008]
14. Map Equation
• describe the random walk path by two levels of description:
community -> ‘province’
node -> ‘city’
15. Map Equation
Problem Formalization:
find a module partition M that minimizes the following objective function (the map equation):

L(M) = q↷ H(Q) + Σ_{i=1}^{m} p↻^i H(P^i)

• L(M): average number of bits per step to describe a random walk
• q↷ H(Q): per-step average description length of movements between modules
• Σ_i p↻^i H(P^i): per-step average description length of movements within modules
16. Map Equation
Problem Formalization:
find a module partition M that minimizes the following objective function:
• q↷ = Σ_i q_{i↷}: per-step probability that the random walk switches modules
• H(Q): lower limit of the average length of a codeword to name a module
17. Map Equation
Problem Formalization:
find a module partition M that minimizes the following objective function:
• teleportation probability: the restart probability that makes the walk ergodic
• q_{i↷}: probability that the random walk jumps outside of module i
18. Map Equation
Problem Formalization:
find a module partition M that minimizes the following objective function:
• p↻^i: per-step probability that the random walk stays within module i
• H(P^i): lower limit of the average length of a codeword to name a node in module i
19. Flow Dynamics on Multilayer Networks
The random walker switches from node i in layer α to node j in layer β with the following
transition probabilities:
a relax rate lets the random walker follow links in any layer, not only its current one
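A simplified sketch of such a relaxed walk (the paper's transition rule weights links by strength; here all choices are uniform and the relax rate r is the only parameter):

```python
import random

def step(node, layer, layers, r=0.15):
    """One move of a relaxed random walker on a multiplex network (sketch).

    layers: dict layer -> dict node -> list of neighbor nodes
    With probability 1 - r the walker follows a link in its current layer;
    with probability r it relaxes and may follow a link from any layer.
    """
    if random.random() < 1 - r:
        candidates = [(nbr, layer) for nbr in layers[layer].get(node, [])]
    else:
        candidates = [(nbr, beta)
                      for beta, adj in layers.items()
                      for nbr in adj.get(node, [])]
    if not candidates:                     # dangling node: restart uniformly
        all_nodes = {n for adj in layers.values() for n in adj}
        return random.choice(sorted(all_nodes)), layer
    return random.choice(candidates)
```

At r = 0 the layers are fully independent walks; at r = 1 the walk behaves as on the aggregated network, matching the trend discussed on the relax-rate slide below.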
20. Multiplex Map Equation
per-step average description length of the trajectory of an ergodic random walker
on a multiplex network
• state nodes of a physical node can be assigned to different multiplex modules
• if they are assigned to the same module, they share a common codeword
21. Synthetic Experiment (1/3)
• synthetic data (LFR benchmark networks)
• generate T independent LFR benchmark networks for different modes
• sample L network layers from each of the mode networks
• the ground truth then contains T communities
• Metric (Normalized Mutual Information, NMI)
• measures the similarity between the resulting partition and the ground truth
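NMI can be sketched in a few lines from its definition (arithmetic-mean normalization assumed; library implementations differ only in the normalization choice):

```python
from collections import Counter
from math import log

def nmi(a, b):
    """Normalized mutual information between two labelings (sketch,
    arithmetic-mean normalization)."""
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    pab = Counter(zip(a, b))
    mi = sum(c / n * log((c / n) / ((pa[x] / n) * (pb[y] / n)))
             for (x, y), c in pab.items())
    ha = -sum(c / n * log(c / n) for c in pa.values())
    hb = -sum(c / n * log(c / n) for c in pb.values())
    return mi / ((ha + hb) / 2) if ha + hb > 0 else 1.0

# identical partitions up to relabeling give NMI = 1
print(nmi([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]))
```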
22. Synthetic Result (2/3)
Infomap applied to each layer separately
Infomap on the supra-adjacency matrix of the multiplex network
23. Synthetic Result (3/3)
use r = 0.15 throughout the analysis; modularity performance as a function of DX
Multiplex: measured as the NMI over state nodes
Average: NMI averaged across network layers
24. Collaboration Networks
Pierre Auger Collaboration of physicists: 512 nodes, 12,964 edges
arXiv collaboration of researchers working on networks: 14,488 nodes, 70,350 edges
25. Layer Overlapping
the heatmap measures the fraction of state nodes in different network layers
that are assigned to the same communities
26. Relax rate effect versus Aggregation
increasing relax rate -> random walker jumps more freely -> results approach those of aggregation -> overlap decreases
29. Author
• Hanghang Tong
• http://tonghanghang.org/
• Assistant Professor at School of Computing,
Informatics and Decision Systems Engineering, Arizona
State University
• Representative work:
Fast random walk with restart and its applications
(ICDM 2006, citations 672)
30. Background: Ranking
• Ranking without query
• Rank all nodes based on certain measure,
e.g., PageRank, HITS
• Who are most popular users?
• Ranking with query
• Find top-k most “similar” nodes for a query node based
on certain measure, e.g., Personalized PageRank, SimRank
• Who are potential friends of Jon?
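Ranking with a query can be sketched as random walk with restart (personalized PageRank); a minimal power-iteration version over a hypothetical toy graph:

```python
def personalized_pagerank(adj, query, c=0.85, iters=100):
    """Random walk with restart: rank nodes w.r.t. a query node (sketch).

    adj:   dict node -> list of out-neighbors (no dangling nodes assumed)
    query: the restart node; c is the walk-continuation probability
    """
    nodes = sorted(adj)
    r = {v: 1.0 if v == query else 0.0 for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - c) if v == query else 0.0 for v in nodes}
        for u in nodes:
            out = adj[u]
            for v in out:                 # spread rank mass along out-links
                nxt[v] += c * r[u] / len(out)
        r = nxt
    return sorted(r, key=r.get, reverse=True)   # most relevant first

# hypothetical friendship graph around a user "jon"
adj = {'jon': ['a', 'b'], 'a': ['jon'], 'b': ['a']}
print(personalized_pagerank(adj, 'jon'))
```

Ranking without a query corresponds to replacing the restart vector with a uniform one, which recovers plain PageRank.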
31. Background: NoN(Network of Networks)
Examples: co-author networks; a disease network of protein interaction networks
32. Problem: Ranking on NoN
• How to identify the importance of Jon
in Data Mining by considering his
overall contributions in related areas?
33. Problem: Query on NoN
• Which Bioinformatics researchers are
most likely to collaborate with Data
Mining researcher Jon?
37. CrossRank
Problem 1: CrossRank
Given: (1) an NoN, and (2) the query
vectors e_i (i = 1, …, g);
g: # of main nodes in the NoN
e_i: an n_i × 1 nonnegative vector
n_i: # of nodes in the i-th domain-specific network
Find: ranking vectors r_i for the nodes in
the domain-specific networks A_i
38. Regularized Optimization Problem
• r_i is the ranking vector of the i-th domain-specific network
• d_i is the degree of main node i in the main network
• Ã_i is the symmetric normalized adjacency matrix of A_i
• the set of common nodes between two domain-specific networks A_i and A_j couples their ranking vectors
39. Regularized Optimization Problem
(1) Within-network Smoothness
for two adjacent nodes x and y, minimize the difference between their ranking scores
(2) Query Preference
keep the ranking vector r_i close to the query vector e_i
(3) Cross-network Consistency
minimize the difference between r_i and r_j on their common nodes
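The interplay of the three terms can be sketched as a simple alternating scheme (an illustrative simplification, not the paper's closed-form solution; the update rule, the parameter lam, and the averaging of common nodes are assumptions):

```python
import numpy as np

def cross_rank_sketch(A_list, e_list, common, lam=0.2, iters=200):
    """Sketch of smoothness + query preference + cross-network consistency.

    A_list: list of symmetric adjacency matrices (numpy)
    e_list: list of query preference vectors
    common: list of (net_i, node_i, net_j, node_j) shared-node pairs
    """
    norm = []
    for A in A_list:                   # symmetric normalization D^-1/2 A D^-1/2
        d = np.maximum(A.sum(1), 1e-12)
        Dinv = np.diag(1.0 / np.sqrt(d))
        norm.append(Dinv @ A @ Dinv)
    r = [e.astype(float).copy() for e in e_list]
    for _ in range(iters):
        # smoothness pulls toward neighbors, lam pulls toward the query
        r = [(1 - lam) * At @ ri + lam * ei
             for At, ri, ei in zip(norm, r, e_list)]
        for ni, vi, nj, vj in common:  # consistency: common nodes agree
            avg = (r[ni][vi] + r[nj][vj]) / 2
            r[ni][vi] = r[nj][vj] = avg
    return r
```

Since the smoothed operator has spectral radius at most 1 - lam, the iteration converges for any lam in (0, 1].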
41. CrossQuery
Problem 2: CrossQuery
Given: (1) an NoN, (2) a query node from a
source domain-specific network A_s, (3) a
target domain-specific network A_d, and (4)
an integer k;
Find: the top-k most relevant nodes from
the target domain-specific network A_d
w.r.t. the query node
42. CrossQuery
• CrossQuery-Basic
CrossQuery is a special case of CrossRank obtained by
(1) requiring the query node to come from the source domain network
(2) restricting the query results to a target domain network
(3) returning the k most relevant nodes from the target domain network
43. CrossQuery-Fast
• Idea: given the main nodes s and d of the source and target domain
networks, prune less relevant main nodes, then apply CrossQuery-
Basic on the pruned NoN
• the ranking score of a node v w.r.t. a query node q can be recast as a
sum over random walk paths weighted by their transition probabilities;
paths with low probability are pruned
45. CrossQuery Effectiveness
Experiment Setting
• for a given query author, predict future
collaborators for this author in a relevant area
• partition the DBLP dataset into two parts:
T1 from 2001 to 2005, T2 from 2006 to 2010
• focus on the DB researchers who have no
DM publications during T1 but collaborate with
DM researchers during T2
47. Protein Interaction NoN
similarity between 5,080 diseases is
retrieved from the OMIM database
the tissue-specific protein interaction
networks include 9,998 proteins for 60
human tissues, built using gene expression data
54. Author
• Yuxiao Dong
• Research Interests:
• social networks, data mining, and computational social science, with an
emphasis on applying computational models to address problems in
large-scale networked systems, such as the Microsoft Academic Graph (MAG),
online social media, and mobile communication.
57. Word Embedding in NLP
• Input: a text corpus D
• Output: X ∈ R^{|W| × d}, d ≪ |W|, i.e., a d-dim vector X_w for each word w ∈ W
geographically close words (a word and its context words in a sentence or document)
exhibit interrelations in human natural language
58. Network Embedding
• Input: a network G = (V, E)
• Output: X ∈ R^{|V| × d}, d ≪ |V|, i.e., a d-dim vector X_v for each node v ∈ V
59. Heterogeneous Network Embedding: Problem
• Input: a heterogeneous information network G = (V, E, T)
• Output: X ∈ R^{|V| × d}, d ≪ |V|, i.e., a d-dim vector X_v for each node v ∈ V
60. Heterogeneous Network Embedding: Challenges
• How do we effectively preserve the concept of “node-context” among
multiple types of nodes, e.g., authors, papers, & venues in academic
heterogeneous networks?
• Can we directly apply homogeneous network embedding architectures
(skip-gram) to heterogeneous networks?
• It is also difficult for conventional meta-path-based methods to model
similarities between nodes that are not connected by a meta-path
for example, if one author publishes 10 papers all in NIPS and another has 10 publications all
in ICML, their “APCPA”-based PathSim similarity is zero
63. metapath2vec: Meta-Path-Based Random Walks
• Goal: to generate paths that are able to capture
both the semantic and structural correlations
between different types of nodes, facilitating the
transformation of heterogeneous network
structures into skip-gram
64. metapath2vec: Meta-Path-Based Random Walks
• Given a meta-path scheme P: V_1 → V_2 → … → V_t → V_{t+1} → … → V_l
• The transition probability at step i is defined as

p(v^{i+1} | v_t^i, P) = 1 / |N_{t+1}(v_t^i)| if (v^{i+1}, v_t^i) ∈ E and φ(v^{i+1}) = t + 1, and 0 otherwise

• Recursive guidance for random walkers, i.e., p(v^{i+1} | v_t^i) = p(v^{i+1} | v_1^i) if t = l
• N_{t+1}(v_t^i) denotes the V_{t+1}-type neighborhood of node v_t^i
65. metapath2vec: Meta-Path-Based Random Walks
• Given a meta-path scheme (Example)
OAPVPAO
• In a traditional random walk procedure, in the toy example, the
next step of a walker on node a4, transitioned from node CMU,
can be any type of node surrounding it: a2, a3, a5, p2, p3, and
CMU.
• Under the meta-path scheme ‘OAPVPAO’, for example, the
walker is biased towards paper nodes (P) given its previous step
on an organization node CMU (O), following the semantics of
this meta-path.
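The biased walk can be sketched as follows (a simplification of metapath2vec: the neighbor choice is uniform within the required type, and the scheme is assumed symmetric, first type equal to last, so it can repeat recursively):

```python
import random

def metapath_walk(graph, node_type, scheme, start, length):
    """Meta-path-guided random walk (sketch of the metapath2vec idea).

    graph:     dict node -> list of neighbors
    node_type: dict node -> type label, e.g. 'A', 'P', 'V', 'O'
    scheme:    meta-path string such as 'OAPVPAO'; its first character
               must match the start node's type
    """
    walk, pos = [start], 0
    for _ in range(length - 1):
        # next required type; wrap with period len-1 so the symmetric
        # scheme repeats (recursive guidance)
        want = scheme[(pos + 1) % (len(scheme) - 1)]
        candidates = [n for n in graph[walk[-1]] if node_type[n] == want]
        if not candidates:
            break                        # no valid continuation
        walk.append(random.choice(candidates))
        pos += 1
    return walk
```

On the toy example above, a walker starting at the organization node CMU is forced through types O, A, P, V, P, … rather than wandering to any neighbor.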
79. Author
• Yang Yang http://yangy.org/
• Assistant Professor, Zhejiang University
“I am an assistant professor at College of Computer Science and Technology,
Zhejiang University. My research focuses on mining deep knowledge from large-
scale social and information networks. I obtained my Ph.D. degree from
Tsinghua University in 2016, advised by Jie Tang and Juanzi Li. During my Ph.D.
career, I have been visiting Cornell University (working with John Hopcroft) in
2012, and University of Leuven (working with Marie-Francine Moens) in 2013. I
also fortunately have Yizhou Sun from UCLA as my external advisor.”
80. Introduction
• recent network representation learning methods mainly focus on
preserving the microscopic structure of networks
• the scale-free property, one of the most fundamental macroscopic
properties of networks, is largely ignored
• the majority of vertexes connected to a high-degree vertex are of low
degree and are unlikely to connect to each other
• incorporating the scale-free property in network embedding can reflect
and preserve the sparsity of real-world networks
• intuitively, the problem is caused by the insufficient volume of the latent
space for placing nodes while preserving the scale-free property
82. Introduction
• overestimating the high degrees will
• distort the most fundamental properties of real-world networks
• reconstruct a denser network than the true topology
• hurt various network mining tasks, such as vertex classification and link
prediction
83. Preliminaries
• Network embedding
• Given an undirected graph G = (V, E), the problem of graph embedding
aims to represent each vertex v_i ∈ V in a low-dimensional space R^k, i.e.,
learning a function f: V → U, where U ∈ R^{n × k} is the embedding matrix,
k ≪ n, and network structures are preserved in U.
• Network reconstruction
• reconstruct the network edges based on distances between vertexes in the latent
space R^k; the probability of an edge between v_i and v_j is defined based on their distance
• in practice, a threshold is chosen and an edge is created when the probability exceeds it
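The reconstruction step can be sketched with a hard distance threshold (the threshold eps stands in for the probability threshold described above):

```python
import numpy as np

def reconstruct(U, eps):
    """Reconstruct edges from an embedding by thresholding Euclidean
    distances (sketch of the procedure described above).

    U: n x k embedding matrix; returns a symmetric 0/1 adjacency matrix
    with an empty diagonal.
    """
    diff = U[:, None, :] - U[None, :, :]       # all pairwise differences
    dist = np.sqrt((diff ** 2).sum(-1))        # Euclidean distance matrix
    A = (dist <= eps).astype(int)
    np.fill_diagonal(A, 0)                     # no self-loops
    return A
```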
84. Reconstructing Scale-free Networks
• Intuition
• denote by B the set of all points falling in the closed ball of radius r centered at a given point
• when the center point is a high-degree vertex, there will be many points in B
• intuitively, the distances between these points are then more likely to fall below
the reconstruction threshold
85. Reconstructing Scale-free Networks
• Theorem 1 (Sphere Packing)
• Remark
Theorem 1 converts the problem of reconstructing a scale-free network into the
Sphere Packing Problem, which seeks the optimal packing of spheres in
high-dimensional spaces
86. Reconstructing Scale-free Networks
• Theorem 2 (Upper and lower bounds for the packing density)
• Packing Density. The packing density is the fraction of space filled by the
spheres making up the packing; the quantity of interest is the optimal sphere
packing density in R^k
• The best known bounds are from [Cohn and Zhao 2014]
87. Reconstructing Scale-free Networks
• Theorem 3
• Remark
theorem 3 is the most important contribution of this paper: it gives a
theoretical lower bound on the dimension of the latent space needed to embed
the network while preserving the scale-free property
For instance, with the parameter values given in the paper, the resulting bound is
small enough to preserve the scale-free property for most real-world networks.
88. Proposed approach
• General idea (degree penalty)
• while preserving first- and second-order proximity, the proximity between
vertexes that have high degrees shall be penalized
• for example, a celebrity may be neither familiar nor similar to her followers
• two followers of the same celebrity may not know each other at all and can
be totally dissimilar
• an ordinary user is more likely to know, and to be similar to, her followers
89. ModelⅠ:DP-Spectral
• DP-Spectral (Degree Penalty based Spectral Embedding)
• define a matrix to indicate the common neighbors of any two vertexes
• define another matrix to incorporate the first-order proximity
• further extend it with the degree penalty:
proportional to the proximity between two nodes,
inversely proportional to their vertex degrees
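A minimal sketch of this idea (the matrices here, C = A + A² for links plus common neighbors and a (d_i · d_j)^-β penalty, are illustrative assumptions, not the paper's exact construction):

```python
import numpy as np

def dp_spectral(A, beta=0.5, k=2):
    """Degree-penalized spectral embedding (sketch of the DP-Spectral idea)."""
    d = np.maximum(A.sum(1), 1.0)
    C = A + A @ A                      # first-order links + common neighbors
    W = C / np.outer(d, d) ** beta     # penalize proximity of high-degree pairs
    vals, vecs = np.linalg.eigh(W)     # W is symmetric, so spectrum is real
    return vecs[:, np.argsort(vals)[::-1][:k]]   # top-k eigenvectors
```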
91. ModelⅡ:DP-Walker
• Define the probability of the random walk jumping from v_i to v_j
• node v_j has a greater chance to be sampled when it shares more
common neighbors with v_i and has a lower degree
• feed the generated random walk paths into the skip-gram model
to learn effective vertex representations for G
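One DP-Walker step can be sketched as follows (the exact sampling weight is an assumption: common-neighbor count, plus one for positivity, divided by the target's degree raised to beta):

```python
import random

def dp_walker_step(A, i, beta=0.5):
    """Sample the next node of a degree-penalized walk (sketch of DP-Walker).

    A: adjacency as dict node -> set of neighbors
    """
    nbrs = list(A[i])
    # weight grows with shared neighbors, shrinks with the target's degree
    w = [(len(A[i] & A[j]) + 1) / len(A[j]) ** beta for j in nbrs]
    return random.choices(nbrs, weights=w, k=1)[0]
```

Paths sampled this way favor low-degree, well-embedded neighbors, which is then fed to skip-gram as described above.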
92. Experiments
Datasets
• Synthetic: generate a synthetic dataset by the Preferential Attachment model
• Facebook: a subnet of Facebook
• Twitter: a subnet of Twitter
• Coauthor: scientific collaborations between authors; an undirected edge exists
between two authors if they have coauthored at least one paper
• Citation: an academic network within which edges indicate citations
• Mobile: a mobile network provided by PPDai within which an edge
indicates that one of the users has called the other
94. Experiments
Tasks
• Network Reconstruction
• evaluate the performance of different algorithms by the correlation
coefficients between the reconstructed degrees and the degrees in the given
network.
• Link Prediction
• feed the learned vectors to a linear classifier and determine
whether an edge exists between two nodes
• Vertex Classification
• given a vertex i, use its learned representation as the feature vector and
train a linear classifier to predict its label
96. Link Prediction
• set the parameters to the values
optimized in the previous experiment
• in most cases, DP-Spectral
obtains the best performance
97. Vertex Classification
• DP-Spectral achieves the best result for 5 out of 7 labels
• its stable performance across labels can also be observed from the table
98. Conclusion
• analyze the difficulty and feasibility of reconstructing a scale-free
network based on learned vertex representations in the Euclidean
space, by converting the problem to the Sphere Packing problem.
• propose the degree penalty principle and two implementations to
preserve the scale-free property of networks and improve the
effectiveness of vertex representations.
• validate our proposed principle by conducting extensive experiments
and find that our approach achieves a significant improvement on six
datasets and three tasks compared to several state-of-the-art
baselines.
100. Author
• Xiao Wang(王啸) https://sites.google.com/site/wangxiaotjucs/
• Currently a postdoc in the Department of Computer Science and
Technology at Tsinghua University working with Professor Shiqiang
Yang.
• current research interests include data mining, machine learning, and the
analysis of complex networks; in particular, network embedding and how to
effectively detect and analyze the communities of complex networks.
101. Author
• Peng Cui(崔鹏) http://media.cs.tsinghua.edu.cn/~multimedia/cuipeng/
• Associate Professor (Tenured), Lab of Media and Network, Department of
Computer Science and Technology, Tsinghua University
• research interests include social network analysis and social multimedia
computing. In social network analysis, I focus more on computational
modeling of complex user behaviors, including individual user behavior
prediction, social group analysis and information propagation modeling.
In social multimedia computing, I am keen to promote the convergence of
user behavioral modeling and multimedia content analysis, with the
ultimate goal to bridge the gap between multimedia data and user needs.
102. Introduction
• Recent network embedding methods focus on the microscopic
structure of networks, i.e., the pairwise similarity between nodes.
• Some researchers further extend to capture high-order proximity.
• the community structure, one important mesoscopic description of
network structure, is largely ignored.
• It is well recognized that clustering is one of the most prominent
features of networks
103. Introduction
• The representations of nodes within the same community should be more
similar than those of nodes in different communities, even if the
within-community nodes are only weakly connected.
• Pairwise similarities are also strengthened by the community structure
constraint, which helps alleviate the data sparsity issue.
104. M-NMF Model
General Idea
• Modularized Nonnegative-Matrix-Factorization Model
• preserves both the microscopic structure (first and second-order
proximities) and mesoscopic community structure
• these two terms are connected by exploiting the consensus
relationship between the representations of nodes and community
structure of network with an auxiliary community representation
matrix
105. M-NMF Model
1. Modeling community structure
• a modularity-maximization-based community detection method is adopted to
model the community structure [Newman 2006a]
• specifically, given a network A with two communities, the modularity is
defined as follows [Newman 2006b]

Q = (1/4e) Σ_{ij} (A_ij - k_i k_j / 2e) h_i h_j

• e: # of edges in the network
• h_i = 1 if node i belongs to the first community, otherwise h_i = -1
• k_i k_j / 2e is the expected # of edges between nodes i and j if edges are placed at random
• intuitively, Q measures the difference between the # of edges falling within communities and the
expected # in an equivalent network with edges placed at random
106. M-NMF Model
1. Extend to k communities
• define the modularity matrix B, whose element B_ij = A_ij - k_i k_j / 2e
• the modularity can then be reformulated as Q = (1/4e) h^T B h,
where h is the community membership indicator vector
• generalize the community membership indicator to a matrix H with n rows
and k columns, one column per community
• the modularity can then be reformulated as Q = tr(H^T B H), s.t. tr(H^T H) = n
107. M-NMF Model
2. Modeling microscopic structure
• First-order Proximity (the adjacency matrix A)
• the most direct expression of a network
• but the adjacency matrix is usually sparse: for two nodes with no edge,
this does not imply that the two nodes have no similarity
• Second-order Proximity
• this paper considers the cosine similarity between the first-order proximity
vectors, where the i-th vector records the first-order proximity between node i and all other nodes
• the final similarity matrix combines the two proximities, S = S^(1) + η S^(2)
108. M-NMF Model
• The unified network embedding model
• introduce the nonnegative basis matrix M and the representation matrix U;
the first objective is to approximate the similarity matrix S by MU^T
• introduce the community representation matrix C;
the second objective is to approximate the community indicator matrix H by UC^T
• the third objective is to maximize the modularity function tr(H^T B H)
109. M-NMF Model
• Updating rules
• only the mixed proximity matrix S is given; M, U, C, and H need to be
updated iteratively to optimize the overall objective
• the objective function is separated into four subproblems that are
optimized alternately
• convergence of the updating rules is shown in the Optimization section of the paper
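The first subproblem can be sketched with standard Lee-Seung multiplicative updates (a sketch of one term only, min ||S - MU^T||_F^2 with M, U >= 0; the full model alternates over M, U, C, H as described above):

```python
import numpy as np

def nmf_term(S, k=2, iters=300, eps=1e-9):
    """Multiplicative updates for min ||S - M U^T||_F^2, M, U >= 0 (sketch)."""
    rng = np.random.default_rng(0)
    n = S.shape[0]
    M = rng.random((n, k))
    U = rng.random((n, k))
    for _ in range(iters):
        M *= (S @ U) / (M @ U.T @ U + eps)     # Lee-Seung style updates:
        U *= (S.T @ M) / (U @ M.T @ M + eps)   # the objective never increases
    return M, U
```

Multiplicative updates keep M and U nonnegative by construction, which is why NMF-style factorization fits this model.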
110. Experimental Evaluations
• dataset
• The WebKB network consists of 4 subnetworks with 877 webpages and 1,608
edges; each subnetwork is divided into 5 communities
• The political blog network is composed of 1,222 blogs about US politics and the
16,715 web links between them; the blogs are divided into 2 communities
• The Facebook subnetworks include 4 communities corresponding to four
universities
• Baseline
• DeepWalk
• LINE
• GraRep
• Node2Vec
• M-NMF0
111. Experimental Evaluations
• Task 1: Node clustering
• applied K-means to the learned representations of nodes and adopted
accuracy to assess the quality of the node clustering results
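The clustering accuracy can be sketched as best label matching over permutations (fine for a handful of clusters; larger label sets would use the Hungarian algorithm instead):

```python
from itertools import permutations

def clustering_accuracy(true, pred):
    """Accuracy of a clustering vs. ground truth under the best relabeling
    (sketch; assumes both labelings use the same number of clusters)."""
    pred_labels = sorted(set(pred))
    best = 0.0
    for perm in permutations(sorted(set(true))):
        mapping = dict(zip(pred_labels, perm))   # try this relabeling
        acc = sum(mapping[p] == t for p, t in zip(pred, true)) / len(true)
        best = max(best, acc)
    return best

# same grouping with swapped labels still scores 1.0
print(clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0]))
```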
112. Experimental Evaluations
• Task 2: Node classification
• The learned representations of nodes were used to classify the nodes into a
set of labels. The LIBLINEAR package is used to train the classifiers, with a
randomly selected 80% of the nodes for training and the rest for testing
114. Conclusion
• proposed a novel Modularized Nonnegative Matrix Factorization (M-
NMF) model for network embedding, which preserves both the
microscopic structure (first and second-order proximities) and
mesoscopic community structure.
• derived efficient updating rules to learn the parameters of M-NMF,
and provided the theoretical analysis on their correctness and
convergence.
• M-NMF was extensively evaluated on nine real networks and two
network analysis tasks, which demonstrated its effectiveness and
robustness to the model parameters.
115. Question
• is Euclidean space appropriate for network embedding?
• for example, intuitively, increasing the dimension of the embedding vectors could
damage the clustering structure
• is machine learning a principled method for network embedding?
• for example, should the scale-free property and clustering be understood as
emergent properties or as learned properties?