Data Mining over Multilayer Networks
community detection and ranking
孙佩源
2018.02.02
PART 1
Outline
• Overview
• Community Detection over Multilayer Networks
• Ranking over Multilayer Networks
• Conclusion
Overview
• Network structure is ubiquitous (everywhere)
• Relationships between components in the network are inherently multifaceted
Examples: Social Network, Transportation Network, Brain Network
Overview
• Application
• Link Prediction (Recommendation), Cascade Prediction, Community
Detection, Ranking, Classification, etc.
• Challenge
• independent layer topology: node’s centrality, clustering, etc.
• coupling dynamics of each layer: diffusion, synchronization, etc.
• Traditional Methods
• topology: modularity, spectral, etc.
• dynamics: DeepWalk, Hawkes, etc.
Overview
• coarse-grained comparison of the two papers:
Paper Id | Application | Subject | Network Properties
1 | Community Detection | Complex Network | Dynamics
2 | Ranking | Data Mining | Topology
PHYSICAL REVIEW X 5, 011027 (2015)
Identifying Modular Flows on Multilayer Networks Reveals Highly
Overlapping Organization in Interconnected Systems

Author
• Manlio De Domenico
• https://comunelab.fbk.eu/manlio/index.php
• the Head of the Complex Multilayer Networks (CoMuNe)
Research Unit at the Center for Information Technology of
Fondazione Bruno Kessler.
• Representative work:
• Mathematical Formulation of Multi-Layer Networks (PRX2013)
• Random Walks on Multiplex Networks (PNAS 2014)
• Diffusion Geometry Unravels the Emergence of Functional
Clusters in Collective Phenomena (PRL 2017)

Author
• Martin Rosvall
• http://www.tp.umu.se/~rosvall/index.html
• Associate professor at Umeå University
• Representative work:
• Maps of random walks on complex networks reveal community
structure (PNAS 2008, citations 2148)

Problem
• conventional methods assume a single type of static link, weighted
and directed at best
• aggregating multiple types of relationships into a single network
distorts both the topology and the dynamics on the network
• How should a community be defined in a multilayer network?
Solution: extend the Map Equation to multilayer networks

An Illustrative Example
A multilayer network with
4 nodes and 3 layers
A random walk moving
between nodes across layers

An Illustrative Example
4 nodes represented in
one layer
trajectory of random walk reveals
two overlapping communities
Intuitive Definition: a module is a set of nodes within which a random walk stays for a relatively long time

Map Equation
• describe the random walk path on the network efficiently
• "efficiently" means using the shortest description code
• assign shorter codewords to frequently visited nodes (Huffman coding)
[Maps of random walks on complex networks reveal community structure, PNAS 2008]
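The idea that frequently visited nodes receive shorter codewords can be illustrated with a small Huffman-coding sketch. The visit frequencies below are made-up values rather than numbers from the talk, and `huffman_code_lengths` is a hypothetical helper name used only for illustration.

```python
import heapq

def huffman_code_lengths(freqs):
    """Return {symbol: codeword length in bits} of a Huffman code for freqs."""
    # Each heap entry is (total frequency, tie-breaker, {symbol: depth so far}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Made-up visit frequencies of four nodes (illustrative only).
visit_freq = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(visit_freq)
average_bits = sum(visit_freq[s] * lengths[s] for s in visit_freq)
print(lengths, average_bits)
```

With these frequencies the most visited node gets a 1-bit codeword and the two rarest nodes get 3-bit codewords, so the average cost per step is lower than with a fixed 2-bit code.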

Map Equation
• describe the random walk path by two levels of description:
community -> ‘province’
node -> ‘city’

Map Equation
Problem Formalization:
find a module partition M that minimizes the following objective function:
$L(M) = q_{\curvearrowright} H(\mathcal{Q}) + \sum_{i=1}^{m} p_{\circlearrowright}^{i} H(\mathcal{P}^{i})$
• $L(M)$: average number of bits per step to describe a random walk
• $q_{\curvearrowright} H(\mathcal{Q})$: per-step average description length of movements between modules
• $\sum_{i=1}^{m} p_{\circlearrowright}^{i} H(\mathcal{P}^{i})$: per-step average description length of movements within modules
• $q_{\curvearrowright}$: per-step prob. that the random walk switches modules
• $H(\mathcal{Q})$: lower limit of the average length of a code to describe a module
• the exit prob. of a module combines the teleportation prob. and the prob. that the random walk jumps to outside of the module
• $p_{\circlearrowright}^{i}$: per-step prob. that the random walk stays within module i
• $H(\mathcal{P}^{i})$: lower limit of the average length of a code to name a node in module i
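The two-level description length can be evaluated directly on a toy graph. The sketch below is a simplified illustration and not the Infomap implementation: it assumes an undirected, unweighted graph without teleportation, so the stationary visit rate of a node is its degree divided by twice the number of edges, and the exit rate of a module is its number of cut edges divided by the same quantity. The function names and the toy graph are hypothetical.

```python
import math

def entropy(rates):
    """Shannon entropy in bits of the distribution obtained by normalizing rates."""
    total = sum(rates)
    return -sum((x / total) * math.log2(x / total) for x in rates if x > 0)

def map_equation(edges, partition):
    """Two-level description length for an undirected, unweighted graph.

    edges: list of (u, v) pairs; partition: {node: module id}.
    """
    two_m = 2 * len(edges)
    degree, cut = {}, {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if partition[u] != partition[v]:
            cut[partition[u]] = cut.get(partition[u], 0) + 1
            cut[partition[v]] = cut.get(partition[v], 0) + 1
    visit = {n: d / two_m for n, d in degree.items()}         # node visit rates
    modules = set(partition.values())
    exit_rate = {m: cut.get(m, 0) / two_m for m in modules}   # module exit rates
    q = sum(exit_rate.values())
    # Cost of naming modules when the walker switches between them.
    length = q * entropy(list(exit_rate.values())) if q > 0 else 0.0
    # Cost of naming nodes (and the exit) inside each module.
    for m in modules:
        inside = [visit[n] for n in visit if partition[n] == m]
        use_rate = exit_rate[m] + sum(inside)
        length += use_rate * entropy([exit_rate[m]] + inside)
    return length

# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_module = map_equation(edges, {n: 0 for n in range(6)})
two_modules = map_equation(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1})
print(one_module, two_modules)
```

On this graph the partition into the two triangles yields a shorter description than putting all nodes into one module, which matches the intuition that a random walk stays inside each triangle for a relatively long time.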

Flow Dynamics on Multilayer Networks
A random walker moves from node i in layer α to node j in layer β according to the following
transition probabilities:
the layer constraint is relaxed so that the random walker can jump to any node in any layer
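A relaxed random walk of this kind can be sketched as follows. The rule assumed here is $P_{ij}^{\alpha\beta}(r) = (1-r)\,\delta_{\alpha\beta}\,W_{ij}^{\beta}/s_i^{\beta} + r\,W_{ij}^{\beta}/S_i$, where $s_i^{\beta}$ is the strength of node i in layer β and $S_i$ is its total strength over all layers. This form is recalled from the multiplex map equation paper and should be checked against it; the data layout and the function name are hypothetical.

```python
def relaxed_transition(layers, i, alpha, r):
    """Transition probabilities of a walker at node i in layer alpha.

    layers: list of dicts, layers[b][i] = {neighbor: weight} in layer b.
    Returns {(j, beta): probability}. Assumes node i has links in layer alpha.
    """
    strength = [sum(layer.get(i, {}).values()) for layer in layers]
    total = sum(strength)
    probs = {}
    for beta, layer in enumerate(layers):
        for j, w in layer.get(i, {}).items():
            # With probability r the layer constraint is relaxed.
            p = r * w / total
            if beta == alpha:
                # With probability 1 - r the walker follows a link in its own layer.
                p += (1 - r) * w / strength[alpha]
            probs[(j, beta)] = p
    return probs

# Node 0 has one link in layer 0 and two links in layer 1 (toy data).
layers = [
    {0: {1: 1.0}},
    {0: {2: 1.0, 3: 1.0}},
]
probs = relaxed_transition(layers, i=0, alpha=0, r=0.15)
print(probs)
```

With r = 0 the walker never leaves its layer; as r grows, the links of the other layers are followed more often and the dynamics approach those of the aggregated network.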

Multiplex Map Equation
per step average description length of the trajectory of an ergodic random walker
on a multiplex network
• state nodes of a physical node can be assigned to multiplex modules
• if they are assigned to the same module, they are assigned a common code

Synthetic Experiment (1/3)
• synthetic data (LFR benchmark network)
• generate T independent LFR benchmark networks for different modes
• sample L network layers from each of the mode networks
• then the ground truth includes T communities
• Metric (Normalized Mutual Information)
• measure similarity between the result partition and the ground truth
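Normalized Mutual Information between a detected partition and the ground truth can be computed as in the following sketch. Several normalizations exist; this version divides by the mean of the two entropies, which is an assumption, since the slide does not say which variant was used.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information of two labelings of the same items."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    mi = sum(
        c / n * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    if h_a + h_b == 0:
        return 1.0  # both labelings put everything in one group
    return 2 * mi / (h_a + h_b)

truth = [0, 0, 1, 1]
same_up_to_renaming = nmi(truth, [1, 1, 0, 0])
unrelated = nmi(truth, [0, 1, 0, 1])
print(same_up_to_renaming, unrelated)
```

The score does not depend on how communities are named, so a partition that matches the ground truth up to relabeling scores 1, while an independent partition scores 0.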

Synthetic Result (2/3)
infomap applied on each
layer separately
infomap on the supra-
adjacency of the multiplex
network
Synthetic Result (3/3)
use r = 0.15 throughout the analysis
Modularity performance as a function of DX
Multiplex: measured as the NMI of state nodes
Average: averaged across network layers

Collaboration Networks
• Pierre Auger Collaboration of physicists: 512 nodes with 12,964 edges
• arXiv collaboration of researchers working on networks: 14,488 nodes with 70,350 edges

Layer Overlapping
heat is measured as the fraction of state nodes in different network layers
that are assigned to the same communities
Relax rate effect versus Aggregation
increasing relax rate
random walker jumps more freely
results close to aggregation
overlap decreased

Conclusion
• straightforwardly generalizes the map equation
• favors smaller modules with more overlap
• applicable to weighted and directed networks


KDD 2014
Inside the Atoms: Ranking on a Network of Networks
Author
• Hanghang Tong
• http://tonghanghang.org/
• Assistant Professor at School of Computing,
Informatics and Decision Systems Engineering, Arizona
State University
• Representative work:
Fast random walk with restart and its applications
(ICDM 2006, citation 672)

Background: Ranking
• Ranking without query
• Rank all nodes based on certain measure,
e.g., PageRank, HITS
• Who are most popular users?
• Ranking with query
• Find top-k most “similar” nodes for a query node based
on certain measure, e.g., Personalized PageRank, SimRank
• Who are potential friends of Jon?
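Ranking with a query can be sketched with a random walk with restart, one common way to compute Personalized PageRank scores. The graph, the names other than Jon, and the restart parameter below are illustrative assumptions rather than values from the talk.

```python
def random_walk_with_restart(adj, query, c=0.85, iters=100):
    """Power iteration for r = c * P r + (1 - c) * e on an undirected graph.

    adj: {node: [neighbors]}, every node needs at least one neighbor.
    query: {node: weight} with weights summing to 1 (the vector e).
    """
    nodes = list(adj)
    r = {v: query.get(v, 0.0) for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - c) * query.get(v, 0.0) for v in nodes}
        for u in nodes:
            share = c * r[u] / len(adj[u])  # u spreads its score evenly
            for v in adj[u]:
                nxt[v] += share
        r = nxt
    return r

# Toy friendship chain: jon - ann - bob - eve.
adj = {"jon": ["ann"], "ann": ["jon", "bob"], "bob": ["ann", "eve"], "eve": ["bob"]}
scores = random_walk_with_restart(adj, {"jon": 1.0})
top2 = sorted((v for v in scores if v != "jon"), key=scores.get, reverse=True)[:2]
print(scores, top2)
```

Replacing the query vector by a uniform vector over all nodes turns the same iteration into a query-independent ranking in the spirit of PageRank.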

Background: NoN(Network of Networks)
Co-author Networks
Disease Network of Protein
Interaction Networks

Problem: Ranking on NoN
• How to identify the importance of Jon
in Data Mining by considering his
overall contributions in related areas ?

Problem: Query on NoN
• Which Bioinformatics researchers are
most likely to collaborate with the Data
Mining researcher Jon?

Problem Definitions

CrossRank
Problem 1: Cross Rank
Given: (1) an NoN, and (2) the query
vectors ei (i = 1,…,g);
g: # of main nodes in the NoN
ei : an ni×1 nonnegative vector
ni : # of nodes in domain specific network
Find: ranking vectors ri for the nodes in
the domain specific networks Ai

Regularized Optimization Problem
• $r_i$ is the ranking vector of the domain specific network $A_i$
• is the degree of main node i in the main network
• is the symmetric normalized adjacency matrix
• is the set of common nodes between $A_i$ and $A_j$

Regularized Optimization Problem
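(1) Within-network Smoothness
for two adjacent nodes x and y, minimize the difference between their ranking scores
(2) Query Preference
(3) Cross-network Consistency
minimize the difference between the ranking scores of common nodes in different domain-specific networks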

Regularized Optimization Problem

CrossQuery
Problem 2: Cross Query
Given: (1) an NoN, and (2) a query from a
source domain-specific network As ,(3) a
target domain-specific network Ad , and (4)
an integer k;
Find: the top-k most relevant nodes from
the target domain-specific network Ad
w.r.t the query node

CrossQuery
• CrossQuery-Basic
CrossQuery is a special case of CrossRank by
(1) requiring the query node to come from the source domain network
(2) restricting the query results to a target domain network
(3) searching for a set of k most relevant nodes from the target domain network

CrossQuery-Fast
• Idea: given source and target domain network of main nodes s and d
respectively, prune less relevant main nodes. Then apply CrossQuery-
Basic on the pruned NoN
ranking score of node v w.r.t a query node q can be recast to:
transition prob. of a
random walk path
prune low prob. path

CrossRank Effectiveness
• Co-Author NoN
• CrossRank Effectiveness

CrossQuery Effectiveness
Experiment Setting
• for a given query author, predict future
collaborators for this author in a relevant area
• partition the DBLP dataset into two parts
T1 from 2001 to 2005, T2 from 2006 to 2010
• interested in the DB researchers who have no
DM publications during T1 but collaborate with
DM researchers during T2

CrossQuery Efficiency

Protein Interaction NoN
similarity between 5080 diseases is
retrieved from the OMIM database
include 9998 proteins for 60
human tissues using gene
expression

Candidate Gene Prediction

Conclusion
• Proposed a new data model: NoN
• New Ranking Algorithm: CrossRank
• Efficient Query Algorithm: CrossQuery

Thank you!
Network Embedding:
statistical property preservation
sunpeiyuan
2018.03.02
PART 2
metapath2vec: Scalable Representation Learning for
Heterogeneous Networks
KDD 2017
Author
• Yuxiao Dong
• Research Interests:
• social networks, data mining, and computational social science, with an
emphasis on applying computational models to addressing problems in large-
scale networked systems, such as Microsoft Academic Graph(MAG), online
social media, and mobile communication.
Conventional Network Mining and Learning
Network Embedding for Mining and Learning
Word Embedding in NLP
• Input: a text corpus $D = \{W\}$
• Output: $X \in \mathbb{R}^{|W| \times d}$, $d \ll |W|$, a d-dim vector $X_w$ for each word $w$
geographically close words (a word and its context words) in a sentence or document
exhibit interrelations in human natural language
Network Embedding
• Input: a network $G = (V, E)$
• Output: $X \in \mathbb{R}^{|V| \times d}$, $d \ll |V|$, a d-dim vector $X_v$ for each node $v$
Heterogeneous Network Embedding: Problem
• Input: a heterogeneous information network $G = (V, E, T)$
• Output: $X \in \mathbb{R}^{|V| \times d}$, $d \ll |V|$, a d-dim vector $X_v$ for each node $v$
Heterogeneous Network Embedding: Challenges
• How do we effectively preserve the concept of “node-context” among
multiple types of nodes, e.g., authors, papers, & venues in academic
heterogeneous networks?
• Can we directly apply homogeneous network embedding architectures
(skip-gram) to heterogeneous networks?
• It is also difficult for conventional meta-path based methods to model
similarities between nodes without connected meta-paths
for example, if one author publishes 10 papers all in NIPS and another has 10 publications all
in ICML, their “APCPA”-based PathSim similarity would be zero
Heterogeneous Network Embedding: Solutions
metapath2vec
metapath2vec: Meta-Path-Based Random Walks
• Goal: to generate paths that are able to capture
both the semantic and structural correlations
between different types of nodes, facilitating the
transformation of heterogeneous network
structures into skip-gram
metapath2vec: Meta-Path-Based Random Walks
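• Given a meta-path scheme $\mathcal{P}: V_1 \xrightarrow{R_1} V_2 \xrightarrow{R_2} \cdots V_t \xrightarrow{R_t} V_{t+1} \cdots \xrightarrow{R_{l-1}} V_l$
• The transition probability at step i is defined as
$p(v^{i+1} \mid v_t^i, \mathcal{P}) = \frac{1}{|N_{t+1}(v_t^i)|}$ if $(v^{i+1}, v_t^i) \in E$ and $\phi(v^{i+1}) = t+1$, and $0$ otherwise
where $N_{t+1}(v_t^i)$ denotes the $V_{t+1}$ type of neighborhood of node $v_t^i$
• Recursive guidance for random walkers, i.e., $p(v^{i+1} \mid v_t^i) = p(v^{i+1} \mid v_1^i)$ if $t = l$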
metapath2vec: Meta-Path-Based Random Walks
• Given a meta-path scheme (Example)
OAPVPAO
• In a traditional random walk procedure, in the toy example, the
next step of a walker on node a4 transitioned from node CMU
can be all types of nodes surrounding it—a2, a3, a5, p2, p3, and
CMU.
• Under the meta-path scheme ‘OAPVPAO’, for example, the
walker is biased towards paper nodes (P) given its previous step
on an organization node CMU (O), following the semantics of
this meta-path.
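A meta-path-guided walk can be sketched in a few lines. The toy academic graph, the scheme "APVPA", and the function name are illustrative assumptions; the walker picks uniformly among the neighbors whose type matches the next type in the scheme, and the scheme is reused cyclically because its first and last types coincide.

```python
import random

def metapath_walk(neighbors, node_type, start, scheme, walk_length, rng=random):
    """Random walk that only steps to neighbors of the type the scheme asks for.

    scheme: string of type letters whose first and last letters are equal,
    e.g. "APVPA"; start must have type scheme[0].
    """
    walk = [start]
    cycle = len(scheme) - 1
    step = 0
    while len(walk) < walk_length:
        next_type = scheme[(step % cycle) + 1]
        candidates = [v for v in neighbors[walk[-1]] if node_type[v] == next_type]
        if not candidates:
            break  # no neighbor of the required type, stop early
        walk.append(rng.choice(candidates))
        step += 1
    return walk

# Toy graph with authors (A), papers (P) and one venue (V).
neighbors = {
    "a1": ["p1"], "a2": ["p1", "p2"],
    "p1": ["a1", "a2", "v1"], "p2": ["a2", "v1"],
    "v1": ["p1", "p2"],
}
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}
walk = metapath_walk(neighbors, node_type, "a1", "APVPA", 9, rng=random.Random(0))
print(walk)
```

The resulting node sequences can then be fed to a skip-gram model in the same way sentences are.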
metapath2vec
metapath2vec++
metapath2vec++: Heterogeneous Skip-Gram
• softmax in metapath2vec
• softmax in metapath2vec++
• objective function
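The difference between the two softmax functions can be sketched as follows, assuming (as recalled from the metapath2vec paper) that metapath2vec normalizes $e^{X_{c_t} \cdot X_v}$ over all nodes, while metapath2vec++ normalizes only over nodes of the same type as the context node $c_t$. The embeddings below are made-up numbers and the function name is hypothetical.

```python
import math

def softmax_prob(context, center, candidates, emb):
    """P(context | center) with the softmax normalized over candidates only."""
    def score(u):
        return math.exp(sum(a * b for a, b in zip(emb[u], emb[center])))
    return score(context) / sum(score(u) for u in candidates)

# Made-up 2-dim embeddings for authors (A), a paper (P) and venues (V).
emb = {
    "a1": [0.2, 0.1], "a2": [0.0, 0.3], "p1": [0.1, 0.1],
    "v1": [0.3, -0.2], "v2": [-0.1, 0.4],
}
node_type = {"a1": "A", "a2": "A", "p1": "P", "v1": "V", "v2": "V"}

all_nodes = list(emb)
venues = [u for u in emb if node_type[u] == "V"]

p_plain = softmax_prob("v1", "a1", all_nodes, emb)  # normalized over every node
p_typed = softmax_prob("v1", "a1", venues, emb)     # normalized over venues only
typed_total = sum(softmax_prob(v, "a1", venues, emb) for v in venues)
print(p_plain, p_typed, typed_total)
```

Normalizing per type gives one multinomial distribution for each node type in the output layer, so a venue context only competes with other venues.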
metapath2vec++
• every sub-procedure is easy to parallelize
• 24-32X speedup by using 40 cores
Network Mining and Learning Paradigm
Experiments
• Heterogeneous Data
• AMiner Academic Network
• 1.7 million authors
• 3 million papers
• 3800+ venues
• 8 research areas
• Baselines
• DeepWalk [KDD’14]
• node2vec [KDD’16]
• LINE [WWW’15]
• PTE [KDD’15]
• Parameters
• #walks: 1000
• walk-length: 100
• #dimensions: 128
• neighborhood size: 7
• Mining Tasks
• node classification
• logistic regression
• node clustering
• k-means
• similarity search
• cosine similarity
Application 1: Multi-Class Node Classification
Application 2: Node Clustering
Application 3: Similarity Search
Visualization
Conclusion
• Problem: Heterogeneous Network Embedding
• Models: metapath2vec & metapath2vec++
• The automatic discovery of internal semantic
relationships between different types of nodes in
heterogeneous networks
• Applications: classification, clustering, &
similarity search
Representation Learning for Scale-free Networks
AAAI 2018
Author
• Yang Yang http://yangy.org/
• Assistant Professor, Zhejiang University
“I am an assistant professor at College of Computer Science and Technology,
Zhejiang University. My research focuses on mining deep knowledge from large-
scale social and information networks. I obtained my Ph.D. degree from
Tsinghua University in 2016, advised by Jie Tang and Juanzi Li. During my Ph.D.
career, I have been visiting Cornell University (working with John Hopcroft) in
2012, and University of Leuven (working with Marie-Francine Moens) in 2013. I
also fortunately have Yizhou Sun from UCLA as my external advisor.”
Introduction
• recent network representation learning methods mainly focus on
preserving microscopic structure of network
• scale-free property, one of the most fundamental macroscopic
properties of networks, is largely ignored
• the majority of vertexes connected to a high-degree vertex is of low
degree, and not likely connected to each other
• incorporating scale-free property in network embedding can reflect
and preserve the sparsity of real-world network
• intuitively, the problem is caused by the insufficient volume of the latent
space in which nodes must be placed to preserve the scale-free property
Introduction
most traditional network embedding algorithms will
overestimate the number of high-degree vertexes
Introduction
• overestimating high degrees will
• distort the most fundamental properties of real world network
• reconstruct a denser network than its true topology
• influence various network mining tasks, such as vertex classification and link
prediction
Preliminaries
• Network embedding
• Given an undirected graph $G = (V, E)$, the problem of graph embedding
aims to represent each vertex $v_i \in V$ in a low-dimensional space $\mathbb{R}^k$, i.e.,
learning a function $f: V \to U$, where $U \in \mathbb{R}^{n \times k}$ is the embedding matrix, $k \ll n$,
and network structures can be preserved in $U$.
• Network reconstruction
• reconstruct the network edges based on distances between vertexes in the latent
space $\mathbb{R}^k$. The probability of an edge between $v_i$ and $v_j$ is defined as a function of the distance between their embeddings
• in practice, a threshold $\epsilon$ is chosen and an edge will be created if the distance between the two embeddings is at most $\epsilon$
• this method is denoted as $\epsilon$-NN
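Threshold-based reconstruction can be sketched as below. The 2-dimensional embeddings and the threshold value are made-up, and the helper name is hypothetical; an edge is created whenever two embedded points lie within the threshold distance of each other.

```python
import math

def reconstruct_edges(embedding, eps):
    """Create an edge between every pair of points at distance <= eps."""
    nodes = sorted(embedding)
    edges = []
    for idx, u in enumerate(nodes):
        for v in nodes[idx + 1:]:
            if math.dist(embedding[u], embedding[v]) <= eps:
                edges.append((u, v))
    return edges

# A hub with three followers spread around it: only hub links are rebuilt.
spread = {"hub": (0.0, 0.0), "f1": (1.0, 0.0), "f2": (0.0, 1.0), "f3": (-1.0, 0.0)}
# The same hub with followers crowded on one side: followers get linked too.
crowded = {"hub": (0.0, 0.0), "f1": (1.0, 0.0), "f2": (0.9, 0.3), "f3": (0.8, -0.3)}

star_edges = reconstruct_edges(spread, eps=1.0)
dense_edges = reconstruct_edges(crowded, eps=1.0)
print(star_edges, dense_edges)
```

The second embedding shows how low-degree followers placed close to a hub end up connected to each other, which inflates their reconstructed degrees.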
Reconstructing Scale-free Networks
• Intuition
• denote the set of all points that fall in the closed ball of radius $\epsilon$ centered at $v_i$ by $B(v_i, \epsilon)$
• when the center point is a high-degree vertex, there will be many points in $B(v_i, \epsilon)$
• intuitively, it will be more likely that the distances between these points are less
than $\epsilon$
Reconstructing Scale-free Networks
• Theorem 1 (Sphere Packing)
• Remark
Theorem 1 converts the problem of reconstructing a scale-free network to the
Sphere Packing Problem, which seeks to find the optimal packing of spheres in
high dimensional spaces
Reconstructing Scale-free Networks
• Theorem 2 (Upper and lower bounds for packing density)
• Packing Density. The packing density is the fraction of the space filled by the
spheres making up the packing. Denote the optimal sphere packing density in
as
• The best bound [Cohn and Zhao 2014]
Reconstructing Scale-free Networks
• Theorem 3
• Remark
Theorem 3 is the most important contribution of this paper. It gives a
theoretical lower bound on the dimension of the latent space needed to embed
the network while preserving the scale-free property
For instance, when , we can get that , which is
enough to preserve the scale-free property for most real-world networks.
Proposed approach
• General idea (degree penalty)
• while preserving first- and second-order proximity, the proximity between
vertexes that have high degrees shall be penalized.
• for example, a celebrity may not be familiar with or similar to her followers.
• Two followers of the same celebrity may not know each other at all and can
be totally dissimilar
• A mediocre user is more likely to know and to be similar to her followers.
ModelⅠ:DP-Spectral
• DP-Spectral (Degree Penalty based Spectral Embedding)
• define a matrix to indicate the common neighbors of any two vertexes
• define another matrix to incorporate the first-order proximity
• further extend to consider degree penalty
proportional to proximity between nodes and
inversely proportional to vertex degrees
ModelⅡ:DP-Walker
• Define the probability of the random walk jumping from $v_i$ to $v_j$
• node j will have a greater chance to be sampled when it has more
common neighbors with node i and has a lower degree
• feed the paths generated by the random walk into the skip-gram model
to learn effective vertex representations for G
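The qualitative behaviour of such a degree-penalized walk can be sketched as follows. The weighting used here, (1 + number of common neighbors) divided by the degree of the candidate raised to a power `beta`, is an illustrative assumption and not the exact formula of DP-Walker; the graph and the function name are also hypothetical.

```python
def dp_transition(adj, i, beta=1.0):
    """Transition probabilities from node i that favor low-degree neighbors
    sharing many common neighbors with i (illustrative weighting only)."""
    weights = {}
    for j in adj[i]:
        common = len(adj[i] & adj[j])
        weights[j] = (1 + common) / (len(adj[j]) ** beta)
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

# Node "u" follows a hub and also knows "f", who follows the same hub.
adj = {
    "u": {"hub", "f"},
    "f": {"u", "hub"},
    "hub": {"u", "f", "x1", "x2", "x3"},
    "x1": {"hub"}, "x2": {"hub"}, "x3": {"hub"},
}
probs = dp_transition(adj, "u")
print(probs)
```

Both neighbors share one common neighbor with "u", so the lower-degree neighbor "f" is sampled more often than the hub.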
Experiments
Datasets
• Synthetic: generate a synthetic dataset by the Preferential Attachment model
• Facebook: a subnet of Facebook
• Twitter: a subnet of Twitter
• Coauthor: scientific collaborations between authors. An undirected edge exists
between two authors if they have coauthored at least one paper
• Citation: an academic network within which edges indicate citations
• Mobile: a mobile network provided by PPDai(拍拍贷) within which edges
indicate one of the users has called the other
Experiments
Baseline Methods
• Laplacian Eigenmap(LE) [Belkin and Niyogi 2003]
• DeepWalk [Perozzi, Al-Rfou, and Skiena 2014]
• DP-Spectral
• DP-Walker
Experiments
Tasks
• Network Reconstruction
• evaluate the performance of different algorithms by the correlation
coefficients between the reconstructed degrees and the degrees in the given
network.
• Link Prediction
• feed the learned vector to a linear regression classifier and determine
whether there exists an edge between two nodes
• Vertex Classification
• given a vertex i, define its feature vector as learned representation, and train
a linear regression classifier to determine its label
Network Reconstruction
Link Prediction
• set as optimized value in
previous result
• in most cases, DP-Spectral
obtains the best performance

Vertex Classification
• DP-Spectral achieves the best result for 5 out of 7 labels
• Its stability of the performance can also be observed from the table
Conclusion
• analyze the difficulty and feasibility of reconstructing a scale-free
network based on learned vertex representations in the Euclidean
space, by converting the problem to the Sphere Packing problem.
• propose the degree penalty principle and two implementations to
preserve the scale-free property of networks and improve the
effectiveness of vertex representations.
• validate our proposed principle by conducting extensive experiments
and find that our approach achieves a significant improvement on six
datasets and three tasks compared to several state-of-the-art
baselines.
Community Preserving Network Embedding
AAAI 2017
Author
• Xiao Wang(王啸) https://sites.google.com/site/wangxiaotjucs/
• Currently a postdoc in the Department of Computer Science and
Technology at Tsinghua University working with Professor Shiqiang
Yang.
• current research interests include data mining, machine learning, and
analysis of complex networks. In particular, I am interested in the
network embedding and how to effectively detect and analyze the
communities of complex networks.
Author
• Peng Cui(崔鹏) http://media.cs.tsinghua.edu.cn/~multimedia/cuipeng/
• Associate Professor (Tenured), Lab of Media and Network, Department of
Computer Science and Technology, Tsinghua University
• research interests include social network analysis and social multimedia
computing. In social network analysis, I focus more on computational
modeling of complex user behaviors, including individual user behavior
prediction, social group analysis and information propagation modeling.
In social multimedia computing, I am keen to promote the convergence of
user behavioral modeling and multimedia content analysis, with the
ultimate goal to bridge the gap between multimedia data and user needs.
Introduction
• Recent network embedding methods focus on the microscopic
structure of network, i.e., the pairwise similarity between nodes.
• Some researchers further extend to capture high-order proximity.
• the community structure, one important mesoscopic description of
network structure, is largely ignored.
• It is well recognized that clustering is one of the most prominent
features of networks
Introduction
• The representations of nodes within a community should be more
similar than those belonging to different communities even if they
have weak relationship.
• Pairwise similarities will also be strengthened by the community
structure constraint which can be used to solve data sparsity issue.
M-NMF Model
General Idea
• Modularized Nonnegative-Matrix-Factorization Model
• preserves both the microscopic structure (first and second-order
proximities) and mesoscopic community structure
• these two terms are connected by exploiting the consensus
relationship between the representations of nodes and community
structure of network with an auxiliary community representation
matrix
M-NMF Model
1. Modeling community structure
• modularity maximization based community detection method is adopted to
model the community structure. [Newman 2006a]
• specifically, given a network A with two communities, the modularity is
defined as follows [Newman 2006b]
$Q = \frac{1}{4e} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2e} \right) h_i h_j$
• $e$: # of edges in the network
• $h_i = 1$ if node i belongs to the first community, otherwise $h_i = -1$
• $\frac{k_i k_j}{2e}$ is the expected # of edges between nodes i and j if edges are placed at random, where $k_i$ is the degree of node i
• intuitively, $Q$ measures the difference between the # of edges falling within communities and the
expected # in an equivalent network with edges placed at random
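The two-community modularity can be checked numerically on a small graph. The sketch assumes the standard form $Q = \frac{1}{4e} \sum_{ij} (A_{ij} - \frac{k_i k_j}{2e}) h_i h_j$ from Newman (2006), with $k_i$ the degree of node i; the toy graph and the function name are illustrative.

```python
def modularity_two_communities(edges, h):
    """Modularity of a split of an undirected graph into two communities.

    edges: list of (u, v) pairs; h: {node: +1 or -1} community indicator.
    """
    e = len(edges)
    degree = {n: 0 for n in h}
    adjacent = set()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    q = 0.0
    for i in h:
        for j in h:
            a_ij = 1 if (i, j) in adjacent else 0
            expected = degree[i] * degree[j] / (2 * e)  # edges placed at random
            q += (a_ij - expected) * h[i] * h[j]
    return q / (4 * e)

# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = modularity_two_communities(edges, {0: 1, 1: 1, 2: 1, 3: -1, 4: -1, 5: -1})
bad = modularity_two_communities(edges, {0: 1, 1: -1, 2: -1, 3: 1, 4: -1, 5: -1})
print(good, bad)
```

Splitting the graph into its two triangles gives a clearly positive value, while a split that cuts through both triangles gives a negative one.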
M-NMF Model
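1. Extend to $k > 2$ communities
• define the modularity matrix $B \in \mathbb{R}^{n \times n}$ whose element $B_{ij} = A_{ij} - \frac{k_i k_j}{2e}$ ($k_i$ is the degree of node i)
• then the modularity function can be reformulated as $Q = \frac{1}{4e} h^T B h$,
where $h = [h_i] \in \mathbb{R}^n$ is the community membership indicator
• generalize the community membership indicator as $H \in \mathbb{R}^{n \times k}$, with one column per community
• then the modularity function can be reformulated as $Q = \mathrm{tr}(H^T B H)$, s.t. $\mathrm{tr}(H^T H) = n$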
M-NMF Model
2. Modeling microscopic structure
• First-order Proximity (the adjacency matrix A)
• the most direct expression of network
• but the adjacency matrix is usually sparse.
• For two nodes with no edge, it does not imply these two nodes have no similarity
• Second-Order Proximity
• this paper considers the cosine similarity between the first-order proximity vector
• where denotes the first-order proximity between node i and other nodes
• The final similarity matrix
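The combination of the two proximities can be sketched as follows, assuming the final similarity has the form $S = S^{(1)} + \eta S^{(2)}$, where $S^{(1)}$ is the adjacency matrix and $S^{(2)}$ holds the cosine similarities between its rows. The mixing weight `eta` and its value are assumptions made for illustration.

```python
import math

def similarity_matrix(adj_matrix, eta=5.0):
    """First-order proximity plus eta times cosine-based second-order proximity."""
    n = len(adj_matrix)

    def cosine(x, y):
        norm_x = math.sqrt(sum(a * a for a in x))
        norm_y = math.sqrt(sum(b * b for b in y))
        if norm_x == 0 or norm_y == 0:
            return 0.0  # isolated node: no second-order similarity
        return sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)

    second = [[cosine(adj_matrix[i], adj_matrix[j]) for j in range(n)] for i in range(n)]
    return [[adj_matrix[i][j] + eta * second[i][j] for j in range(n)] for i in range(n)]

# Path graph 0 - 1 - 2: nodes 0 and 2 share no edge but have the same neighbor.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
S = similarity_matrix(A)
print(S[0][1], S[0][2])
```

Nodes 0 and 2 have no edge between them, yet they receive a high similarity because their neighborhoods are identical, which is the case the second-order term is meant to cover.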
M-NMF Model
• The unified network embedding model
• introduce the nonnegative basis matrix M and representation matrix U
• the first objective is to approximate the similarity matrix S
• introduce the community representation matrix C
• the second objective is to approximate the community indicator matrix H
• the third objective is to maximize the modularity function
M-NMF Model
• Updating rule
• only the mixing proximity matrix S is given. M, U, C, H need to be iteratively
updated to optimize the overall objective
• separate the objective function to four subproblems and iteratively optimize
them respectively
• convergence of the updating rule is given in section Optimization in the paper
Experimental Evaluations
• dataset
• The WebKB network consists of 4 subnetworks with 877 webpages and 1608
edges. Each subnetwork is divided into 5 communities
• Political blog network composed of blogs (1222) about US politics and the
web links (16715) between them. The blogs are divided into 2 communities
• Facebook subnetworks which includes 4 communities corresponding to four
universities
• Baseline
• DeepWalk
• LINE
• GraRep
• Node2Vec
• M-NMF0
Experimental Evaluations
• Task 1: Node clustering
• applied K-means to the learned representations of nodes and adopted
accuracy to assess the quality of the node clustering results
Experimental Evaluations
• Task 2: Node classification
• The learned representations of nodes were used to classify these nodes into a
set of labels. LIBLINEAR package is used to train the classifiers with randomly
selected 80% nodes as the training nodes and the rest as the testing nodes
Experimental Evaluations
• Parameter analysis
• test the effect of parameters and of M-NMF on the real networks 
Conclusion
• proposed a novel Modularized Nonnegative Matrix Factorization (M-
NMF) model for network embedding, which preserves both the
microscopic structure (first and second-order proximities) and
mesoscopic community structure.
• derived efficient updating rules to learn the parameters of M-NMF,
and provided the theoretical analysis on their correctness and
convergence.
• M-NMF was extensively evaluated on nine real networks and two
network analysis tasks, which demonstrated its effectiveness and
robustness to the model parameters.
Question
• is Euclidean space proper for network embedding ?
• for example, intuitively, increasing the dimension of embedding vectors could
damage the clustering
• is Machine learning a principled method for network embedding ?
• for example, should the scale-free property and clustering be understood as emergent
or learned properties?
Thanks !
Q&A

Mais conteúdo relacionado

Mais procurados

Online opportunistic routing using Reinforcement learning
Online opportunistic routing using Reinforcement learningOnline opportunistic routing using Reinforcement learning
Online opportunistic routing using Reinforcement learningHarshal Solao
 
An efficient hybrid peer to-peersystemfordistributeddatasharing
An efficient hybrid peer to-peersystemfordistributeddatasharingAn efficient hybrid peer to-peersystemfordistributeddatasharing
An efficient hybrid peer to-peersystemfordistributeddatasharingambitlick
 
Exploiting Wireless Networks, through creation of Opportunity Network – Wirel...
Exploiting Wireless Networks, through creation of Opportunity Network – Wirel...Exploiting Wireless Networks, through creation of Opportunity Network – Wirel...
Exploiting Wireless Networks, through creation of Opportunity Network – Wirel...ijasuc
 
Least Cost Influence in Multiplex Social Networks
Least Cost Influence in Multiplex Social NetworksLeast Cost Influence in Multiplex Social Networks
Least Cost Influence in Multiplex Social NetworksNatasha Mandal
 
Advanced delay reduction algorithm based on GPS with Load Balancing
Advanced delay reduction algorithm based on GPS with Load BalancingAdvanced delay reduction algorithm based on GPS with Load Balancing
Advanced delay reduction algorithm based on GPS with Load Balancingijdpsjournal
 
An Efficient Parallel Algorithm for Secured Data Communication Using RSA Publ...
An Efficient Parallel Algorithm for Secured Data Communication Using RSA Publ...An Efficient Parallel Algorithm for Secured Data Communication Using RSA Publ...
An Efficient Parallel Algorithm for Secured Data Communication Using RSA Publ...Harshal Solao
 
Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors ijbbjournal
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceinventy
 
An Improved Leader Election Algorithm for Distributed Systems
An Improved Leader Election Algorithm for Distributed SystemsAn Improved Leader Election Algorithm for Distributed Systems
An Improved Leader Election Algorithm for Distributed Systemsijngnjournal
 
Machine Learning for Efficient Neighbor Selection in ...
Machine Learning for Efficient Neighbor Selection in ...Machine Learning for Efficient Neighbor Selection in ...
Machine Learning for Efficient Neighbor Selection in ...butest
 
COMPARATIVE STUDY OF CAN, PASTRY, KADEMLIA AND CHORD DHTS
COMPARATIVE STUDY OF CAN, PASTRY, KADEMLIA  AND CHORD DHTS COMPARATIVE STUDY OF CAN, PASTRY, KADEMLIA  AND CHORD DHTS
COMPARATIVE STUDY OF CAN, PASTRY, KADEMLIA AND CHORD DHTS ijp2p
 
Packet Loss and Overlay Size Aware Broadcast in the Kademlia P2P System
Packet Loss and Overlay Size Aware Broadcast in the Kademlia P2P SystemPacket Loss and Overlay Size Aware Broadcast in the Kademlia P2P System
Packet Loss and Overlay Size Aware Broadcast in the Kademlia P2P SystemIDES Editor
 
2013 KDD conference presentation--"Multi-Label Relational Neighbor Classifica...
2013 KDD conference presentation--"Multi-Label Relational Neighbor Classifica...2013 KDD conference presentation--"Multi-Label Relational Neighbor Classifica...
2013 KDD conference presentation--"Multi-Label Relational Neighbor Classifica...Xi Wang
 
Java Abs Peer To Peer Design & Implementation Of A Tuple Space
Java Abs   Peer To Peer Design & Implementation Of A Tuple SpaceJava Abs   Peer To Peer Design & Implementation Of A Tuple Space
Java Abs Peer To Peer Design & Implementation Of A Tuple Spacencct
 
Toward Personalized Peer-to-Peer Top-k Processing
Toward Personalized Peer-to-Peer Top-k ProcessingToward Personalized Peer-to-Peer Top-k Processing
Toward Personalized Peer-to-Peer Top-k Processingasapteam
 
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...IJNSA Journal
 

network mining and representation learning

  • 1. Data Mining over Multilayer Networks community detection and ranking 孙佩源 2018.02.02 PART 1
  • 2. Outline • Overview • Community Detection over Multilayer Networks • Ranking over Multilayer Networks • Conclusion
  • 3. Overview • Network structure is ubiquitous (everywhere) • Relationships between components in the network are inherently multifaceted Social Network Transportation Network Brain Network
  • 4. Overview • Application • Link Prediction (Recommendation), Cascade Prediction, Community Detection, Ranking, Classification, etc. • Challenge • independent layer topology: node’s centrality, clustering, etc. • coupling dynamics of each layer: diffusion, synchronization, etc. • Traditional Methods • topology: modularity, spectral, etc. • dynamics: DeepWalk, Hawkes, etc.
  • 5. Overview • coarse-grained comparison of the two papers: Paper Id | Application | Subject | Network Properties; 1 | Community Detection | Complex Network | Dynamics; 2 | Ranking | Data Mining | Topology
  • 6. PHYSICAL REVIEW X 5, 011027 (2015) Identifying Modular Flows on Multilayer Networks Reveals Highly Overlapping Organization in Interconnected Systems
  • 7. Author • Manlio De Domenico • https://comunelab.fbk.eu/manlio/index.php • the Head of the Complex Multilayer Networks (CoMuNe) Research Unit at the Center for Information Technology of Fondazione Bruno Kessler. • Representative work: • Mathematical Formulation of Multi-Layer Networks (PRX 2013) • Random Walks on Multiplex Networks (PNAS 2014) • Diffusion Geometry Unravels the Emergence of Functional Clusters in Collective Phenomena (PRL 2017)
  • 8. Author • Martin Rosvall • http://www.tp.umu.se/~rosvall/index.html • Associate professor at Umeå University (于默奥大学) • Representative work: • Maps of random walks on complex networks reveal community structure (PNAS 2008, citations 2148)
  • 9. Problem • conventional methods assume a single type of static link, weighted and directed at best • aggregating multiple types of relationships into a single network will distort both the topology and the dynamics on the network • How to define a community in a multilayer network? Solution: extend the Map Equation to multilayer networks
  • 10. An Illustrative Example A multilayer network with 4 nodes and 3 layers A random walk moving between nodes across layers
  • 11. An Illustrative Example 4 nodes represented in one layer; the trajectory of the random walk reveals two overlapping communities. Intuitive Definition: a module is a set of nodes in which the random walk stays for a relatively long time
  • 12. Map Equation • describe the random walk path on the network efficiently • [Maps of random walks on complex networks reveal community structure, PNAS 2008] • “efficiently” means using the shortest description code: assign shorter codewords to frequently visited nodes (Huffman coding)
  • 13. Map Equation • describe the random walk path on the network efficiently • [Maps of random walks on complex networks reveal community structure PNAS2008]
  • 14. Map Equation • describe the random walk path by two levels of description: community -> ‘province’, node -> ‘city’
  • 15. Map Equation Problem Formalization: find a module partition M that minimizes the following objective function, the average number of bits per step needed to describe a random walk: • first term: per-step average description length of movements between modules • second term: per-step average description length of movements within modules
  • 16. Map Equation Problem Formalization: find a module partition M that minimizes the following objective function: • per-step probability that the random walk switches modules • lower limit of the average length of a codeword that names a module
  • 17. Map Equation Problem Formalization: find a module partition M that minimizes the following objective function: • teleportation probability • probability that the random walk jumps out of the module
  • 18. Map Equation Problem Formalization: find a module partition M that minimizes the following objective function: • per-step probability that the random walk stays within module i • lower limit of the average length of a codeword that names a node in module i
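
The objective annotated on slides 15 to 18 is usually written as $L(M) = q_{\curvearrowright} H(Q) + \sum_i p_{\circlearrowright}^i H(P^i)$, where $q_{\curvearrowright}$ is the per-step probability of switching modules, $H(Q)$ is the entropy of the module names, $p_{\circlearrowright}^i$ is the per-step probability of using the codebook of module $i$, and $H(P^i)$ is the entropy of that codebook. The sketch below evaluates it for a given partition. It assumes an undirected, unweighted network without teleportation, so a node's visit rate is simply degree / 2m; the function name `map_equation` and the toy graph are hypothetical and not taken from the slides.

```python
import math
from collections import defaultdict


def plogp(p):
    """p * log2(p), with the convention 0 * log2(0) = 0."""
    return p * math.log2(p) if p > 0 else 0.0


def map_equation(edges, module_of):
    """Two-level description length L(M) in bits per step.

    Assumption: undirected, unweighted edges and no teleportation, so the
    stationary visit rate of a node is degree / (2 * number_of_edges).
    """
    degree = defaultdict(int)
    exit_links = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if module_of[u] != module_of[v]:
            exit_links[module_of[u]] += 1
            exit_links[module_of[v]] += 1

    total = 2 * len(edges)
    modules = set(module_of.values())
    q = {m: exit_links[m] / total for m in modules}  # exit rate per module
    q_total = sum(q.values())

    # Index codebook: q_total * H(Q)
    index_term = 0.0
    if q_total > 0:
        index_term = -q_total * sum(plogp(q[m] / q_total) for m in modules)

    # Module codebooks: sum_i p_i * H(P^i)
    module_term = 0.0
    for m in modules:
        rates = [degree[n] / total for n in degree if module_of[n] == m]
        p_use = q[m] + sum(rates)  # rate at which codebook m is used
        entropy = -plogp(q[m] / p_use) - sum(plogp(r / p_use) for r in rates)
        module_term += p_use * entropy

    return index_term + module_term


# Toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {n: "a" for n in range(6)}
print(map_equation(edges, two_modules), map_equation(edges, one_module))
```

On this toy graph the two-triangle partition yields a shorter description than the single-module partition, which is the sense in which minimizing $L(M)$ finds modules.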
  • 19. Flow Dynamics on Multilayer Networks The random walker switches from node i in layer α to node j in layer β with the following transition probabilities: the random walker is relaxed so that it can jump to any node in any layer
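
The transition rule referred to on slide 19 did not survive in the transcript. As we understand the cited PRX paper, it has the form $P_{ij}^{\alpha\beta}(r) = (1-r)\,\delta_{\alpha\beta}\,W_{ij}^{\beta}/s_i^{\beta} + r\,W_{ij}^{\beta}/S_i$, with $s_i^{\beta} = \sum_j W_{ij}^{\beta}$ and $S_i = \sum_{\beta} s_i^{\beta}$: with probability $1-r$ the walker follows a link in its current layer, and with probability $r$ it follows a link of the same physical node in any layer. The following is a minimal sketch of that rule; the function name, the nested-list input format, and the toy layers are assumptions made for illustration, and every state node is assumed to have at least one link in its own layer.

```python
def relaxed_transitions(layers, r):
    """Transition probabilities of a relaxed random walk on a multiplex network.

    layers[a][i][j] is the weight of the link i -> j in layer a.
    Returns P with P[(i, a)][(j, b)] = probability of moving from node i in
    layer a to node j in layer b.
    Assumption: every node has at least one outgoing link in every layer.
    """
    n_layers = len(layers)
    n = len(layers[0])
    # s[a][i]: out-strength of node i in layer a; S[i]: summed over layers
    s = [[sum(layers[a][i]) for i in range(n)] for a in range(n_layers)]
    S = [sum(s[a][i] for a in range(n_layers)) for i in range(n)]

    P = {}
    for a in range(n_layers):
        for i in range(n):
            row = {}
            for b in range(n_layers):
                for j in range(n):
                    w = layers[b][i][j]
                    if w == 0:
                        continue
                    p = r * w / S[i]  # relaxed move: any layer
                    if a == b:
                        p += (1 - r) * w / s[a][i]  # stay in the current layer
                    row[(j, b)] = p
            P[(i, a)] = row
    return P


# Toy multiplex: layer 0 is a path 0-1-2, layer 1 is a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
P = relaxed_transitions([path, triangle], r=0.15)
print(P[(0, 0)])
```

Setting `r = 0` keeps the walker inside its starting layer, while `r = 1` makes the move independent of the current layer, which matches the aggregation limit discussed on slide 26.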
  • 20. Multiplex Map Equation per-step average description length of the trajectory of an ergodic random walker on a multiplex network • state nodes of a physical node can be assigned to multiplex modules • if they are assigned to the same module, they are assigned a common code
  • 21. Synthetic Experiment (1/3) • synthetic data (LFR benchmark network) • generate T independent LFR benchmark networks for different modes • sample L network layers from each of the mode networks • then the ground truth includes T communities • Metric (Normalized Mutual Information) • measures the similarity between the resulting partition and the ground truth
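
For reference, the NMI metric named on slide 21 can be computed from two label lists as sketched below. The slides do not say which normalization is used; this sketch assumes the common choice $2I(A;B)/(H(A)+H(B))$, and the function name `nmi` is hypothetical.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions.

    Assumption: normalized as 2 * I(A; B) / (H(A) + H(B)); other
    normalizations exist and give different values for partial agreement.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    mutual_info = sum(
        (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )
    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both partitions put everything in one module
    return 2 * mutual_info / (h_a + h_b)


truth = [0, 0, 0, 1, 1, 1]
found = ["x", "x", "x", "y", "y", "y"]  # same partition, different names
print(nmi(truth, found))
```

The score depends only on how nodes are grouped, not on the label names, so a relabeled copy of the ground truth scores 1 and a statistically independent partition scores 0.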
  • 22. Synthetic Result (2/3) Infomap applied on each layer separately; Infomap on the supra-adjacency of the multiplex network
  • 23. Synthetic Result (3/3) use r=0.15 throughout the analysis Modularity performance as the function of DX Multiplex: measured as the NMI of state nodes Average: averaged across network layers
  • 24. Collaboration Networks • Pierre Auger Collaboration of physicists: 512 nodes with 12,964 edges • arXiv collaboration of researchers working on networks: 14,488 nodes with 70,350 edges
  • 25. Layer Overlapping: heat is measured as the fraction of state nodes in different network layers that are assigned to the same communities
  • 26. Relax rate effect versus Aggregation: with an increasing relax rate, the random walker jumps more freely, the results move closer to those of aggregation, and the overlap decreases
  • 27. Conclusion • generalizes the map equation straightforwardly • favors smaller modules with more overlap • applicable to weighted, directed networks
  • 28. KDD 2014 Inside the Atoms: Ranking on a Network of Networks
  • 29. Author • Hanghang Tong • http://tonghanghang.org/ • Assistant Professor at School of Computing, Informatics and Decision Systems Engineering, Arizona State University • Representative work: Fast random walk with restart and its applications (ICDM 2006, citation 672)
  • 30. Background: Ranking • Ranking without query • Rank all nodes based on a certain measure, e.g., PageRank, HITS • Who are the most popular users? • Ranking with query • Find the top-k most “similar” nodes for a query node based on a certain measure, e.g., Personalized PageRank, SimRank • Who are potential friends of Jon?
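
As an illustration of "ranking with query", the sketch below runs a random walk with restart (personalized PageRank) by power iteration: at each step the walker follows a link with probability `c` and returns to the query node with probability `1 - c`. This is a generic textbook version, not the CrossRank algorithm; the function name, the value `c = 0.85`, and the toy graph are assumptions.

```python
def random_walk_with_restart(adj, query, c=0.85, iters=100):
    """Relevance scores of all nodes with respect to a query node.

    adj[u][v] is the weight of the link u -> v. Each iteration applies
    r <- c * (row-normalized adj)^T r + (1 - c) * e_query.
    Assumption: a walker on a node without outgoing links returns to the query.
    """
    n = len(adj)
    out = [sum(row) for row in adj]
    r = [0.0] * n
    r[query] = 1.0
    for _ in range(iters):
        nxt = [0.0] * n
        nxt[query] = 1 - c  # restart mass
        for u in range(n):
            if out[u] == 0:
                nxt[query] += c * r[u]
                continue
            for v in range(n):
                if adj[u][v]:
                    nxt[v] += c * r[u] * adj[u][v] / out[u]
        r = nxt
    return r


# Toy path graph 0 - 1 - 2 - 3, queried from node 0.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adj, query=0)
top_k = sorted(range(len(scores)), key=lambda v: -scores[v])[:2]
print(scores, top_k)
```

Sorting the scores gives the top-k list for the query; replacing the one-hot restart vector with a uniform one recovers query-free PageRank.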
  • 31. Background: NoN (Network of Networks) Co-author Networks Disease Network of Protein Interaction Networks
  • 32. Problem: Ranking on NoN • How to identify the importance of Jon in Data Mining by considering his overall contributions in related areas?
  • 33. Problem: Query on NoN • Which Bioinformatics researchers are most likely to collaborate with the Data Mining researcher Jon?
  • 37. CrossRank Problem 1: CrossRank Given: (1) an NoN, and (2) the query vectors $e_i$ ($i = 1,…,g$); $g$: # of main nodes in the NoN; $e_i$: an $n_i \times 1$ nonnegative vector; $n_i$: # of nodes in the domain-specific network $A_i$. Find: ranking vectors $r_i$ for the nodes in the domain-specific networks $A_i$
  • 38. Regularized Optimization Problem • is the ranking vector of the domain specific network • is the degree of main node i in the main network • is the symmetric normalized adjacency matrix • is the set of common nodes between and
  • 39. Regularized Optimization Problem (1) Within-network Smoothness: for two adjacent nodes x and y, minimize the difference between their ranking scores (2) Query Preference (3) Cross-network Consistency: minimize the difference between the ranking scores of common nodes in different domain-specific networks
  • 41. CrossQuery Problem 2: CrossQuery Given: (1) an NoN, (2) a query node from a source domain-specific network $A_s$, (3) a target domain-specific network $A_d$, and (4) an integer k; Find: the top-k most relevant nodes from the target domain-specific network $A_d$ w.r.t. the query node
  • 42. CrossQuery • CrossQuery-Basic: CrossQuery is a special case of CrossRank, obtained by (1) requiring the query node to be from the source domain network, (2) restricting the query results to a target domain network, and (3) searching for a set of the k most relevant nodes in the target domain network
  • 43. CrossQuery-Fast • Idea: given the source and target domain networks of main nodes s and d respectively, prune less relevant main nodes. Then apply CrossQuery-Basic on the pruned NoN • the ranking score of node v w.r.t. a query node q can be recast as the transition probability of random walk paths • prune low-probability paths
  • 44. CrossRank Effectiveness • Co-Author NoN • CrossRank Effectiveness
  • 45. CrossQuery Effectiveness Experiment Setting • for a given query author, predict future collaborators for this author in a relevant area • partition the DBLP dataset into two parts: T1 from 2001 to 2005, T2 from 2006 to 2010 • interested in the DB researchers who have no DM publications during T1 but collaborate with DM researchers during T2
  • 47. Protein Interaction NoN • similarity between 5080 diseases is retrieved from the OMIM database • includes 9998 proteins for 60 human tissues, using gene expression
  • 50. Conclusion • Proposed a new data model: NoN • New Ranking Algorithm: CrossRank • Efficient Query Algorithm: CrossQuery
  • 52. Network Embedding: statistical property preservation sunpeiyuan 2018.03.02 PART 2
  • 53. metapath2vec: Scalable Representation Learning for Heterogeneous Networks KDD 2017
  • 54. Author • Yuxiao Dong • Research Interests: • social networks, data mining, and computational social science, with an emphasis on applying computational models to addressing problems in large-scale networked systems, such as Microsoft Academic Graph (MAG), online social media, and mobile communication.
  • 56. Network Embedding for Mining and Learning
  • 57. Word Embedding in NLP • Input: a text corpus $D = \{W\}$ • Output: $X \in \mathbb{R}^{|W| \times d}$, $d \ll |W|$, a d-dim vector $X_w$ for each word $w$ • geographically close words (a word and its context words) in a sentence or document exhibit interrelations in human natural language
  • 58. Network Embedding • Input: a network $G = (V, E)$ • Output: $X \in \mathbb{R}^{|V| \times d}$, $d \ll |V|$, a d-dim vector $X_v$ for each node $v$
  • 59. Heterogeneous Network Embedding: Problem • Input: a heterogeneous information network $G = (V, E, T)$ • Output: $X \in \mathbb{R}^{|V| \times d}$, $d \ll |V|$, a d-dim vector $X_v$ for each node $v$
  • 60. Heterogeneous Network Embedding: Challenges • How do we effectively preserve the concept of “node-context” among multiple types of nodes, e.g., authors, papers, & venues in academic heterogeneous networks? • Can we directly apply homogeneous network embedding architectures (skip-gram) to heterogeneous networks? • It is also difficult for conventional meta-path based methods to model similarities between nodes without connected meta-paths for example, one publishes 10 papers all in NIPS and the other has 10 publications all in ICML; their “APCPA”-based PathSim similarity would be zero
  • 63. metapath2vec: Meta-Path-Based Random Walks • Goal: to generate paths that are able to capture both the semantic and structural correlations between different types of nodes, facilitating the transformation of heterogeneous network structures into skip-gram
  • 64. metapath2vec: Meta-Path-Based Random Walks • Given a meta-path scheme • The transition probability at step i is defined as • Recursive guidance for random walkers, i.e., • $N_{t+1}(v_t^i)$ denotes the $V_{t+1}$ type of neighborhood of node $v_t^i$
  • 65. metapath2vec: Meta-Path-Based Random Walks • Given a meta-path scheme (Example) OAPVPAO • In a traditional random walk procedure, in the toy example, the next step of a walker on node a4 transitioned from node CMU can be all types of nodes surrounding it—a2, a3, a5, p2, p3, and CMU. • Under the meta-path scheme ‘OAPVPAO’, for example, the walker is biased towards paper nodes (P) given its previous step on an organization node CMU (O), following the semantics of this meta-path.
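
The meta-path-guided walk described on slides 63 to 65 can be sketched as follows: at each step the walker may only move to neighbors whose type is the next one in the scheme, chosen uniformly at random, and a symmetric scheme such as `APVPA` is repeated cyclically. The function name, the dictionary-based graph format, and the toy academic network are assumptions made for illustration.

```python
import random


def metapath_walk(neighbors, node_type, start, scheme, length, rng=random):
    """Generate one random walk that follows a symmetric meta-path scheme.

    neighbors[v] lists the neighbors of v; node_type[v] is a one-letter type.
    The walk stops early if no neighbor of the required type exists.
    """
    if scheme[0] != scheme[-1]:
        raise ValueError("scheme must start and end with the same type")
    if node_type[start] != scheme[0]:
        raise ValueError("start node must have the first type of the scheme")
    cycle = scheme[:-1]  # 'APVPA' repeats as A, P, V, P, A, P, V, P, ...
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [u for u in neighbors[walk[-1]] if node_type[u] == wanted]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


# Toy academic network: authors (A), papers (P), one venue (V).
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}
neighbors = {
    "a1": ["p1"],
    "a2": ["p2"],
    "p1": ["a1", "v1"],
    "p2": ["a2", "v1"],
    "v1": ["p1", "p2"],
}
walk = metapath_walk(neighbors, node_type, "a1", "APVPA", 9, random.Random(0))
print(walk)
```

The resulting walks play the role of sentences for skip-gram: a window over `walk` yields (node, context) pairs whose types are constrained by the scheme.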
  • 68. metapath2vec++: Heterogeneous Skip-Gram • softmax in metapath2vec • softmax in metapath2vec++ • objective function
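
The formulas on slide 68 are missing from the transcript. As we recall them from the metapath2vec paper (the notation here is a reconstruction and may differ in detail from the slide), the two models differ only in the normalization set of the softmax: metapath2vec normalizes over all nodes $V$, while metapath2vec++ normalizes over the nodes $V_t$ that share the type $t$ of the context node $c_t$. The last line is the negative-sampling objective, with $M$ negative samples drawn from a type-specific distribution $P_t$.

```latex
% softmax in metapath2vec: normalize over all nodes
p(c_t \mid v; \theta) = \frac{\exp(X_{c_t} \cdot X_v)}{\sum_{u \in V} \exp(X_u \cdot X_v)}

% softmax in metapath2vec++: normalize over nodes of type t only
p(c_t \mid v; \theta) = \frac{\exp(X_{c_t} \cdot X_v)}{\sum_{u_t \in V_t} \exp(X_{u_t} \cdot X_v)}

% objective with M negative samples of the same type as the context node
O(X) = \log \sigma(X_{c_t} \cdot X_v)
     + \sum_{m=1}^{M} \mathbb{E}_{u_t^m \sim P_t(u_t)} \left[ \log \sigma(-X_{u_t^m} \cdot X_v) \right]
```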
  • 69. metapath2vec++ • every sub-procedure is easy to parallelize • 24-32X speedup by using 40 cores
  • 70. Network Mining and Learning Paradigm
  • 71. Experiments • Heterogeneous Data • AMiner Academic Network • 1.7 million authors • 3 million papers • 3800+ venues • 8 research areas • Baselines • DeepWalk [KDD’14] • node2vec [KDD’16] • LINE [WWW’15] • PTE [KDD’15] • Parameters • #walks: 1000 • walk-length: 100 • #dimensions: 128 • neighborhood size: 7 • Mining Tasks • node classification • logistic regression • node clustering • k-means • similarity search • cosine similarity
  • 72. Application 1: Multi-Class Node Classification
  • 73. Application 1: Multi-Class Node Classification
  • 74. Application 2: Node Clustering
  • 77. Conclusion • Problem: Heterogeneous Network Embedding • Models: metapath2vec & metapath2vec++ • The automatic discovery of internal semantic relationships between different types of nodes in heterogeneous networks • Applications: classification, clustering, & similarity search
  • 78. Representation Learning for Scale-free Networks AAAI 2018
  • 79. Author • Yang Yang http://yangy.org/ • Assistant Professor, Zhejiang University “I am an assistant professor at College of Computer Science and Technology, Zhejiang University. My research focuses on mining deep knowledge from large-scale social and information networks. I obtained my Ph.D. degree from Tsinghua University in 2016, advised by Jie Tang and Juanzi Li. During my Ph.D. career, I have been visiting Cornell University (working with John Hopcroft) in 2012, and University of Leuven (working with Marie-Francine Moens) in 2013. I also fortunately have Yizhou Sun from UCLA as my external advisor.”
  • 80. Introduction • recent network representation learning methods mainly focus on preserving the microscopic structure of a network • the scale-free property, one of the most fundamental macroscopic properties of networks, is largely ignored • most of the vertexes connected to a high-degree vertex have low degree and are unlikely to be connected to each other • incorporating the scale-free property in network embedding can reflect and preserve the sparsity of real-world networks • intuitively, the problem is caused by the latent space having insufficient volume to place nodes so that the scale-free property is preserved
  • 81. Introduction most traditional network embedding algorithms overestimate the number of high-degree vertexes
  • 82. Introduction • overestimating the high degrees will • distort the most fundamental properties of real-world networks • reconstruct a denser network than its true topology • influence various network mining tasks, such as vertex classification and link prediction
  • 83. Preliminaries • Network embedding • Given an undirected graph $G = (V, E)$, the problem of graph embedding aims to represent each vertex $v_i \in V$ in a low-dimensional space $\mathbb{R}^k$, i.e., learning a function $f: V \to U$, where $U \in \mathbb{R}^{n \times k}$ is the embedding matrix, $k \ll n$, and network structures can be preserved in $U$. • Network reconstruction • reconstruct the network edges based on distances between vertexes in the latent space $\mathbb{R}^k$. The probability of an edge between $v_i$ and $v_j$ is defined as a function of the distance between their embeddings • in practice, a threshold $\epsilon$ is chosen and an edge will be created if the distance is at most $\epsilon$ • this method is denoted as $\epsilon$-NN
  • 84. Reconstructing Scale-free Networks • Intuition • consider all points that fall in the closed ball of radius $\epsilon$ centered at a vertex • when the center point is a high-degree vertex, there will be many points in this ball • intuitively, it will be more likely that the distances between these points are less than $\epsilon$
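
The intuition on slide 84 can be checked numerically with a small threshold-based reconstruction: connect two vertexes whenever their embeddings are within distance `eps`. The sketch below embeds a star (one hub, eight leaves) in the plane and shows that the reconstruction adds leaf-to-leaf edges that the original star does not have. The function names, the 2-D layout, and the radius 0.9 are assumptions chosen for the demonstration.

```python
import math
from itertools import combinations


def threshold_reconstruct(points, eps):
    """Create an edge (i, j) whenever the Euclidean distance is at most eps."""
    return [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if math.dist(points[i], points[j]) <= eps
    ]


def degrees(n, edges):
    """Degree of each of the n vertexes in an undirected edge list."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


# A star with 8 leaves: the hub at the origin, leaves on a circle of radius 0.9.
n_leaves = 8
points = [(0.0, 0.0)] + [
    (0.9 * math.cos(2 * math.pi * k / n_leaves),
     0.9 * math.sin(2 * math.pi * k / n_leaves))
    for k in range(n_leaves)
]
edges = threshold_reconstruct(points, eps=1.0)
print(len(edges), degrees(len(points), edges))
```

The true star has 8 edges and leaf degree 1, but the reconstruction yields 16 edges and leaf degree 3, because neighboring leaves are forced close together inside the hub's ball. This is the packing pressure that the sphere-packing argument on the following slides quantifies.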
  • 85. Reconstructing Scale-free Networks • Theorem 1 (Sphere Packing) • Remark Theorem 1 converts the problem of reconstructing a scale-free network to the Sphere Packing Problem, which seeks to find the optimal packing of spheres in high dimensional spaces
  • 86. Reconstructing Scale-free Networks • Theorem 2 (Upper and lower bounds for packing density) • Packing Density. The packing density is the fraction of the space filled by the spheres making up the packing. Denote the optimal sphere packing density in as • The best bound [Cohn and Zhao 2014]
  • 87. Reconstructing Scale-free Networks • Theorem 3 • Remark theorem 3 is the most important contribution of this paper. It gives a theoretical lower bound of the dimension of latent space to sufficiently embed the network to preserve the scale-free property For instance, when , we can get that , which is enough to keep scale-free holds for most real-world networks.
  • 88. Proposed approach • General idea (degree penalty) • while preserving first- and second-order proximity, the proximity between vertexes that have high degrees shall be penalized. • for example, a celebrity may not be familiar with or similar to her followers. • Two followers of the same celebrity may not know each other at all and can be totally dissimilar • A mediocre user is more likely to know and to be similar to her followers.
  • 89. ModelⅠ:DP-Spectral • DP-Spectral (Degree Penalty based Spectral Embedding) • define a matrix to indicate the common neighbors of any two vertexes • define another matrix to incorporate the first-order proximity • further extend to consider degree penalty proportional to proximity between nodes and inversely proportional to vertex degrees
  • 90. ModelⅠ:DP-Spectral • DP-Spectral (Degree Penalty based Spectral Embedding)
  • 91. Model Ⅱ: DP-Walker • Define the probability of the random walk jumping from $v_i$ to $v_j$ • node $v_j$ will have a greater chance to be sampled when it has more common neighbors with node $v_i$ and has a lower degree • feed the paths generated by the random walks into skip-gram to learn effective vertex representations for $G$
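
The exact transition probability of DP-Walker is not preserved in the transcript. The sketch below only illustrates the stated principle with a hypothetical weight, $(1 + C_{ij}) / d_j^{\beta}$, where $C_{ij}$ is the number of common neighbors of $v_i$ and $v_j$, $d_j$ is the degree of $v_j$, and $\beta$ controls the strength of the degree penalty. This weight, the parameter `beta`, and the function name are assumptions, not the paper's definition.

```python
def degree_penalized_probs(adj, i, beta=1.0):
    """Transition probabilities from node i under a degree penalty.

    adj[v] is the set of neighbors of v.
    Hypothetical weight (not the paper's formula):
        (1 + common_neighbors(i, j)) / degree(j) ** beta
    so more common neighbors raise the chance, a higher degree lowers it.
    """
    weights = {}
    for j in adj[i]:
        common = len(adj[i] & adj[j])
        weights[j] = (1 + common) / (len(adj[j]) ** beta)
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Node 0 links to a low-degree node (1) and a hub (2).
adj = {
    0: {1, 2},
    1: {0},
    2: {0, 3, 4, 5},
    3: {2},
    4: {2},
    5: {2},
}
print(degree_penalized_probs(adj, 0))
```

With `beta = 0` the penalty vanishes and only common neighbors matter; larger values of `beta` steer the walker away from hubs, which is the effect the degree penalty principle aims for.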
  • 92. Experiments Datasets • Synthetic: generate a synthetic dataset by the Preferential Attachment model • Facebook: a subnet of Facebook • Twitter: a subnet of Twitter • Coauthor: scientific collaborations between authors. An undirected edge exists between two authors if they have coauthored at least on paper • Citation: an academic network within which edges indicate citations • Mobile: a mobile network provided by PPDai(拍拍贷) within which edges indicate one of the users has called the other
  • 93. Experiments Baseline Methods • Laplacian Eigenmap (LE) [Belkin and Niyogi 2003] • DeepWalk [Perozzi, Al-Rfou, and Skiena 2014] • DP-Spectral • DP-Walker
  • 94. Experiments Tasks • Network Reconstruction • evaluate the performance of different algorithms by the correlation coefficients between the reconstructed degrees and the degrees in the given network. • Link Prediction • feed the learned vector to a linear regression classifier and determine whether there exists an edge between two nodes • Vertex Classification • given a vertex i, define its feature vector as learned representation, and train a linear regression classifier to determine its label
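The network reconstruction score can be sketched as below. Two things are assumed here and are not stated on the slide: the network is reconstructed by linking every pair of vertices whose embedding distance is at most a threshold `eps` (a hypothetical parameter), and the correlation coefficient is Pearson's.

```python
import math

def reconstruct_degrees(emb, eps):
    """Degree of each vertex when pairs within distance eps are linked."""
    n = len(emb)
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(emb[i], emb[j]) <= eps:
                deg[i] += 1
                deg[j] += 1
    return deg

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy star graph: the hub sits at the origin, four leaves at unit distance.
emb = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
true_deg = [4, 1, 1, 1, 1]
rec_deg = reconstruct_degrees(emb, eps=1.1)
score = pearson(true_deg, rec_deg)
```

In this toy layout the leaves are farther than `eps` from one another, so the reconstructed degrees match the true ones and the correlation is 1.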
  • 96. Link Prediction • set as optimized value in previous result • in most cases, DP-Spectral obtains the best performance 
  • 97. Vertex Classification • DP-Spectral achieves the best result for 5 out of 7 labels • The stability of its performance can also be observed from the table
  • 98. Conclusion • analyze the difficulty and feasibility of reconstructing a scale-free network based on learned vertex representations in the Euclidean space, by converting the problem to the Sphere Packing problem. • propose the degree penalty principle and two implementations to preserve the scale-free property of networks and improve the effectiveness of vertex representations. • validate our proposed principle by conducting extensive experiments and find that our approach achieves a significant improvement on six datasets and three tasks compared to several state-of-the-art baselines.
  • 99. Community Preserving Network Embedding AAAI 2017
  • 100. Author • Xiao Wang(王啸) https://sites.google.com/site/wangxiaotjucs/ • Currently a postdoc in the Department of Computer Science and Technology at Tsinghua University, working with Professor Shiqiang Yang. • Current research interests include data mining, machine learning, and the analysis of complex networks, in particular network embedding and how to effectively detect and analyze the communities of complex networks.
  • 101. Author • Peng Cui(崔鹏) http://media.cs.tsinghua.edu.cn/~multimedia/cuipeng/ • Associate Professor (Tenured), Lab of Media and Network, Department of Computer Science and Technology, Tsinghua University • Research interests include social network analysis and social multimedia computing. In social network analysis, he focuses on computational modeling of complex user behaviors, including individual user behavior prediction, social group analysis, and information propagation modeling. In social multimedia computing, he aims to promote the convergence of user behavioral modeling and multimedia content analysis, with the ultimate goal of bridging the gap between multimedia data and user needs.
  • 102. Introduction • Recent network embedding methods focus on the microscopic structure of network, i.e., the pairwise similarity between nodes. • Some researchers further extend to capture high-order proximity. • the community structure, one important mesoscopic description of network structure, is largely ignored. • It is well recognized that clustering is one of the most prominent features of networks
  • 103. Introduction • The representations of nodes within a community should be more similar than those of nodes belonging to different communities, even if the nodes have only a weak relationship. • Pairwise similarities are also strengthened by the community structure constraint, which can be used to address the data sparsity issue.
  • 104. M-NMF Model General Idea • Modularized Nonnegative-Matrix-Factorization Model • preserves both the microscopic structure (first and second-order proximities) and mesoscopic community structure • these two terms are connected by exploiting the consensus relationship between the representations of nodes and community structure of network with an auxiliary community representation matrix
  • 105. M-NMF Model 1. Modeling community structure • a modularity-maximization-based community detection method is adopted to model the community structure [Newman 2006a] • specifically, given a network with adjacency matrix A and two communities, the modularity is defined as follows [Newman 2006b]: $Q = \frac{1}{4e} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2e} \right) h_i h_j$ • $k_i$: degree of node i • e: # of edges in the network • $h_i = 1$ if node i belongs to the first community, otherwise $h_i = -1$ • $\frac{k_i k_j}{2e}$ is the expected # of edges between nodes i and j if edges are placed at random • intuitively, Q measures the difference between the # of edges falling within communities and the expected # in an equivalent network with edges placed at random
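  • 106. M-NMF Model 1. Extend to $k > 2$ communities • define the modularity matrix $B \in \mathbb{R}^{n \times n}$ whose element is $B_{ij} = A_{ij} - \frac{k_i k_j}{2e}$, with $k_i$ the degree of node i • then the modularity function can be reformulated as $Q = \frac{1}{4e} h^T B h$, where $h = [h_i] \in \mathbb{R}^n$ is the community membership indicator • generalize the community membership indicator as $H \in \mathbb{R}^{n \times k}$, with one column per community • then the modularity function can be reformulated, up to a constant factor, as $Q = \mathrm{tr}(H^T B H)$ subject to $\mathrm{tr}(H^T H) = n$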
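A small numerical check of the two formulations may help. The sketch below assumes the standard Newman definitions: modularity matrix entries B_ij = A_ij − k_i k_j / (2e), the two-community form (1/4e) hᵀBh with h_i = ±1, and the trace form tr(HᵀBH) with a one-hot indicator H. The function names and the toy graph are illustrative only.

```python
def modularity_matrix(A):
    """B_ij = A_ij - k_i * k_j / (2e) for an undirected adjacency matrix A."""
    n = len(A)
    k = [sum(row) for row in A]        # node degrees
    two_e = sum(k)                     # twice the number of edges
    return [[A[i][j] - k[i] * k[j] / two_e for j in range(n)]
            for i in range(n)]

def modularity_two(A, h):
    """Two-community modularity (1 / 4e) * h^T B h with h_i in {+1, -1}."""
    B = modularity_matrix(A)
    n = len(A)
    e = sum(map(sum, A)) / 2
    quad = sum(B[i][j] * h[i] * h[j] for i in range(n) for j in range(n))
    return quad / (4 * e)

def trace_form(A, H):
    """tr(H^T B H) for an n x k community indicator matrix H."""
    B = modularity_matrix(A)
    n, k = len(H), len(H[0])
    return sum(H[i][c] * B[i][j] * H[j][c]
               for c in range(k) for i in range(n) for j in range(n))

# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3, so e = 7.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

h = [1, 1, 1, -1, -1, -1]
H = [[1, 0]] * 3 + [[0, 1]] * 3        # one-hot version of h
q_two = modularity_two(A, h)
q_trace = trace_form(A, H) / (2 * len(edges))
```

Because every row of B sums to zero, hᵀBh equals 2·tr(HᵀBH) for a two-way split, so both expressions give the same value once the trace is divided by 2e. For this toy graph that value is 5/14.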
  • 107. M-NMF Model 2. Modeling microscopic structure • First-order proximity $S^{(1)}$ (the adjacency matrix A) • the most direct expression of the network • but the adjacency matrix is usually sparse • the absence of an edge between two nodes does not imply that they have no similarity • Second-order proximity $S^{(2)}$ • this paper uses the cosine similarity between first-order proximity vectors: $S^{(2)}_{ij} = \frac{N_i \cdot N_j}{\|N_i\| \, \|N_j\|}$ • where $N_i = (S^{(1)}_{i1}, \ldots, S^{(1)}_{in})$ denotes the first-order proximity between node i and the other nodes • The final similarity matrix: $S = S^{(1)} + \eta S^{(2)}$, where $\eta > 0$ weights the second-order proximity
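The mixed proximity can be illustrated with a short sketch. It assumes that the final similarity is the adjacency matrix plus a weighted cosine similarity between adjacency rows; the weight `eta` and its default value are assumptions made for this example, not values confirmed by the slide.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_matrix(A, eta=5.0):
    """S = A + eta * cos(A_i, A_j): first- plus weighted second-order proximity."""
    n = len(A)
    return [[A[i][j] + eta * cosine(A[i], A[j]) for j in range(n)]
            for i in range(n)]

# Path graph 0 - 1 - 2: nodes 0 and 2 share no edge but share neighbor 1.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
S = similarity_matrix(A)
```

Nodes 0 and 2 have identical neighborhoods, so their entry in S is nonzero even though A[0][2] is 0. This is the sparsity problem the second-order term is meant to address.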
  • 108. M-NMF Model • The unified network embedding model • introduce the nonnegative basis matrix M and representation matrix U • the first objective is to approximate the similarity matrix S • introduce the community representation matrix C • the second objective is to approximate the community indicator matrix H • the third objective is to maximize the modularity function
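How the three objectives could be combined into one value is sketched below. The slide text does not give the equation, so the form used here is an assumption: ‖S − MUᵀ‖²_F + alpha·‖H − UCᵀ‖²_F − beta·tr(HᵀBH), where B is a modularity matrix and `alpha`, `beta` are hypothetical trade-off weights. The sketch only evaluates the value and uses NumPy; it does not perform the optimization.

```python
import numpy as np

def mnmf_objective(S, B, M, U, C, H, alpha=1.0, beta=1.0):
    """Value of the assumed combined objective (lower is better).

    Shapes: S, B are n x n; M, U are n x m; C is k x m; H is n x k.
    """
    fit_similarity = np.linalg.norm(S - M @ U.T, "fro") ** 2
    fit_community = np.linalg.norm(H - U @ C.T, "fro") ** 2
    modularity = np.trace(H.T @ B @ H)
    # Modularity is maximized, so it enters with a minus sign.
    return fit_similarity + alpha * fit_community - beta * modularity

# Synthetic check: build S and H so that both reconstruction terms vanish.
rng = np.random.default_rng(0)
n, m, k = 6, 3, 2
M = rng.random((n, m))
U = rng.random((n, m))
C = rng.random((k, m))
S = M @ U.T
H = U @ C.T
B = rng.random((n, n))
B = (B + B.T) / 2                      # any symmetric matrix works here
value = mnmf_objective(S, B, M, U, C, H, alpha=1.0, beta=0.5)
```

With exact factorizations the two fitting terms are zero, so the value reduces to −beta·tr(HᵀBH). This makes a convenient sanity check before running any update rule.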
  • 109. M-NMF Model • Updating rule • only the mixed proximity matrix S is given; M, U, C, and H need to be updated iteratively to optimize the overall objective • separate the objective function into four subproblems and optimize them in turn • the convergence of the updating rules is shown in the Optimization section of the paper
  • 110. Experimental Evaluations • Datasets • The WebKB network consists of 4 subnetworks with 877 webpages and 1608 edges. Each subnetwork is divided into 5 communities • The political blog network is composed of blogs (1222) about US politics and the web links (16715) between them. The blogs are divided into 2 communities • Facebook subnetworks, which include 4 communities corresponding to four universities • Baselines • DeepWalk • LINE • GraRep • Node2Vec • M-NMF0
  • 111. Experimental Evaluations • Task 1: Node clustering • applied K-means to the learned node representations and used accuracy to assess the quality of the node clustering results
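Clustering accuracy needs a mapping between cluster ids and ground-truth labels, because K-means ids are arbitrary. The sketch below assumes the usual convention of scoring under the best one-to-one mapping, which the slide does not spell out. It uses a brute-force search over permutations, which is only practical for a handful of communities, and it takes the cluster assignments as given rather than running K-means.

```python
from itertools import permutations

def clustering_accuracy(true_labels, cluster_ids):
    """Best-case fraction of nodes whose mapped cluster id equals its label."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes is not handled here")
    best_hits = 0
    # Try every injective assignment of cluster ids to class labels.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)

true_labels = [0, 0, 0, 1, 1, 1]
cluster_ids = [1, 1, 0, 0, 0, 0]       # ids swapped, one node misplaced
acc = clustering_accuracy(true_labels, cluster_ids)
```

Here the best mapping swaps the two ids, which leaves a single misplaced node and an accuracy of 5/6. For many communities an assignment solver such as the Hungarian algorithm would replace the permutation loop.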
  • 112. Experimental Evaluations • Task 2: Node classification • The learned node representations were used to classify the nodes into a set of labels. The LIBLINEAR package was used to train the classifiers, with a randomly selected 80% of the nodes used for training and the rest for testing
  • 113. Experimental Evaluations • Parameter analysis • test the effect of parameters and of M-NMF on the real networks 
  • 114. Conclusion • proposed a novel Modularized Nonnegative Matrix Factorization (M-NMF) model for network embedding, which preserves both the microscopic structure (first- and second-order proximities) and the mesoscopic community structure. • derived efficient updating rules to learn the parameters of M-NMF, and provided a theoretical analysis of their correctness and convergence. • M-NMF was extensively evaluated on nine real networks and two network analysis tasks, which demonstrated its effectiveness and its robustness to the model parameters.
  • 115. Question • Is Euclidean space appropriate for network embedding? • for example, intuitively, increasing the dimension of the embedding vectors could damage the clustering • Is machine learning a principled method for network embedding? • for example, should the scale-free property and clustering be understood as emergent properties or as learned properties?