Multi-label Relational Neighbor Classification
using Social Context Features
Xi Wang and Gita Sukthankar
Department of EECS
University of Central Florida

Motivation
• The conventional relational classification model focuses on the single-label classification problem.
• Real-world relational datasets often contain instances associated with multiple labels.
• Connections between instances in multi-label networks are driven by various causal reasons.
Figure: Example scientific collaboration network; research areas shown: Machine Learning, Data Mining, Artificial Intelligence.

Problem Formulation
• Node classification in multi-relational networks
• Input:
  - Network structure (i.e., connectivity information)
  - Labels of some actors in the network
• Output:
  - Labels of the other actors

Classification in Networked Data
• Homophily: nodes with similar labels are more likely to be connected.
• Markov assumption:
  - The label of a node depends only on the labels of its immediate neighbors in the graph.
• Relational models are built based on the labels of neighbors.
• Predictions are made using collective inference.
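Whether homophily actually holds can be checked on labeled data before relying on it. The sketch below is an illustrative diagnostic, not part of SCRN: assuming an undirected edge list and a dict mapping each labeled node to its set of labels (both hypothetical inputs), it computes the fraction of edges whose endpoints share at least one label, a simple multi-label variant of edge homophily.

```python
from typing import Dict, List, Set, Tuple


def shared_label_edge_fraction(
    edges: List[Tuple[str, str]],
    labels: Dict[str, Set[str]],
) -> float:
    """Fraction of edges whose two endpoints share at least one label.

    Illustrative multi-label homophily measure (not defined in the slides).
    Edges with an unlabeled endpoint are skipped.
    """
    considered = 0
    shared = 0
    for u, v in edges:
        if u not in labels or v not in labels:
            continue
        considered += 1
        if labels[u] & labels[v]:  # non-empty intersection of label sets
            shared += 1
    return shared / considered if considered else 0.0


edges = [("a", "b"), ("b", "c"), ("c", "d")]
labels = {"a": {"ML"}, "b": {"ML", "DM"}, "c": {"DM"}, "d": {"AI"}}
print(shared_label_edge_fraction(edges, labels))  # 2 of 3 edges share a label
```

A value well above what random label assignment would give suggests that neighbor-based classifiers such as RN have signal to exploit.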

Contribution
• A new multi-label iterative relational neighbor classifier (SCRN)
• Extracts social context features using edge clustering to represent a node's potential group memberships
• Using social features boosts classification performance over the benchmark methods on several real-world collaboration network datasets

Relational Neighbor Classifier
• The Relational Neighbor (RN) classifier, proposed by Macskassy and Provost (MRDM'03), is a simple relational probabilistic model that makes predictions for a given node based solely on the class labels of its neighbors.
Figure: Training graph, followed by RN predictions at iteration 1 and iteration 2.

Relational Neighbor Classifier
• The weighted-vote relational neighbor classifier (wvRN) estimates the prediction probability as:
$$P(L_i = c \mid N_i) = \frac{1}{Z} \sum_{v_j \in N_i} w(v_i, v_j) \, P(L_j = c \mid N_j)$$
Here $Z$ is the usual normalization factor, $N_i$ is the set of neighbors of node $v_i$, and $w(v_i, v_j)$ is the weight of the link between nodes $v_i$ and $v_j$.
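The wvRN estimate for a single node can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes the graph is stored as a dict mapping each node to {neighbor: link weight} and current class estimates as {class: probability} dicts, and it falls back to a uniform distribution when $Z = 0$ (an assumption; the slides do not cover that case).

```python
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]   # node -> {neighbor: w(v_i, v_j)}
Probs = Dict[str, Dict[str, float]]   # node -> {class c: P(L = c | N)}


def wvrn_estimate(node: str, graph: Graph, probs: Probs, classes: List[str]) -> Dict[str, float]:
    """P(L_i = c | N_i) = (1/Z) * sum_{v_j in N_i} w(v_i, v_j) * P(L_j = c | N_j)."""
    scores = {c: 0.0 for c in classes}
    for nbr, w in graph[node].items():
        for c in classes:
            scores[c] += w * probs[nbr].get(c, 0.0)   # weighted vote of each neighbor
    z = sum(scores.values())                          # normalization factor Z
    if z == 0:                                        # no informative neighbors (assumed fallback)
        return {c: 1.0 / len(classes) for c in classes}
    return {c: s / z for c, s in scores.items()}


graph = {"x": {"a": 2.0, "b": 1.0}, "a": {"x": 2.0}, "b": {"x": 1.0}}
probs = {"a": {"red": 1.0, "green": 0.0}, "b": {"red": 0.0, "green": 1.0}}
print(wvrn_estimate("x", graph, probs, ["red", "green"]))  # red ~ 0.667, green ~ 0.333
```

The heavier link to `a` gives its label twice the vote of `b`'s, which is the only thing wvRN uses: no node attributes enter the estimate.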

Apply RN in Multi-relational Network
Figure: Ground truth for a small multi-label network. Legend: nodes with both labels (red and green); nodes with the green label only; nodes with the red label only.

Edge-Based Social Feature Extraction
• Connections in human networks are mainly affiliation-driven.
• Since each connection can often be regarded as resulting principally from one affiliation, links are strongly correlated with a single affiliation class.
• Edge class information is not readily available in most social media datasets, but an unsupervised clustering algorithm can be applied to partition the edges into disjoint sets (KDD'09, CIKM'09).

Cluster edges using K-Means
• Scalable edge clustering method proposed by Tang and Liu (CIKM'09).
• Each edge is represented in a feature-based format, in which the edge is characterized by its two adjacent nodes.
• K-means clustering separates the edges into groups, and the social feature (SF) vector is constructed from the edge cluster IDs.
Figure: Original network; Step 1: edge representations; Step 2: construct social features.
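The two steps can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the scalable EdgeCluster implementation of Tang and Liu: each edge (u, v) is the sparse indicator vector {u: 1, v: 1}, the edges are grouped with ordinary Lloyd's k-means using a deterministic initialization (the first k edges), and a node's social feature vector is assumed to count its incident edges in each cluster. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def cluster_edges(edges: List[Edge], k: int, iters: int = 20) -> List[int]:
    """Plain (Lloyd's) k-means over edges; edge (u, v) is the sparse vector {u: 1, v: 1}."""
    # Deterministic initialization (an assumption): the first k edges are the initial centroids.
    centroids = [{u: 1.0, v: 1.0} for u, v in edges[:k]]
    assign: List[int] = []
    for _ in range(iters):
        # ||c - e||^2 = ||c||^2 - 2 * (c[u] + c[v]) + 2; the constant 2 is dropped.
        norms = [sum(x * x for x in c.values()) for c in centroids]
        new_assign = [
            min(range(len(centroids)),
                key=lambda j: norms[j] - 2 * (centroids[j].get(u, 0.0) + centroids[j].get(v, 0.0)))
            for u, v in edges
        ]
        if new_assign == assign:
            break  # assignments are stable: converged
        assign = new_assign
        # Move each centroid to the mean of the edge vectors assigned to it.
        sums = [defaultdict(float) for _ in centroids]
        counts = [0] * len(centroids)
        for (u, v), j in zip(edges, assign):
            sums[j][u] += 1.0
            sums[j][v] += 1.0
            counts[j] += 1
        centroids = [
            {n: s / counts[j] for n, s in sums[j].items()} if counts[j] else centroids[j]
            for j in range(len(centroids))
        ]
    return assign


def social_features(edges: List[Edge], assign: List[int], k: int) -> Dict[str, List[int]]:
    """SF(v): how many of v's incident edges fall into each of the k clusters (assumed construction)."""
    sf: Dict[str, List[int]] = defaultdict(lambda: [0] * k)
    for (u, v), j in zip(edges, assign):
        sf[u][j] += 1
        sf[v][j] += 1
    return dict(sf)


edges = [("a", "b"), ("d", "e"), ("a", "c"), ("b", "c"), ("d", "f"), ("e", "f")]
assign = cluster_edges(edges, k=2)
sf = social_features(edges, assign, k=2)
print(assign, sf["a"], sf["f"])
```

On two disjoint triangles the edges of each triangle land in their own cluster, so every node's SF vector concentrates on a single cluster, which is the "potential group membership" signal SCRN builds on.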

Edge-Clustering Visualization
Figure: A subset of DBLP with 95 instances. Edges are clustered into 10
groups, with each shown in a different color.

Proposed Method: SCRN
• The initial reference feature vector for class c is defined as the weighted average of the social feature vectors of the nodes known to be in class c (the set $V_c^K$):
$$RV(c) = \frac{1}{|V_c^K|} \sum_{v_i \in V_c^K} P(l_i^c = 1) \times SF(v_i)$$
• Node $v_i$'s class-propagation probability for class c, conditioned on its social features, is then:
$$P_{CP}(l_i^c \mid SF(v_i)) = sim(SF(v_i), RV(c))$$
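Both formulas translate almost line for line into code. This is a minimal sketch, assuming social feature vectors are equal-length lists of non-negative numbers and taking plain histogram intersection (the sum of element-wise minima) as sim(·,·); the experiments use a generalized histogram intersection kernel, and any rescaling of sim into [0, 1] is not specified on this slide and is omitted here. Function names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def reference_vector(members: Dict[str, float], sf: Dict[str, Vector]) -> Vector:
    """RV(c) = (1 / |V_c^K|) * sum_{v_i in V_c^K} P(l_i^c = 1) * SF(v_i).

    `members` maps each node in V_c^K to P(l_i^c = 1) (1.0 for a known label).
    Assumes `sf` is non-empty so the feature dimension can be read off.
    """
    dim = len(next(iter(sf.values())))
    rv = [0.0] * dim
    for node, p in members.items():
        for k, x in enumerate(sf[node]):
            rv[k] += p * x
    n = len(members)
    return [x / n for x in rv] if n else rv


def histogram_intersection(x: Vector, y: Vector) -> float:
    """sim(x, y) = sum_k min(x_k, y_k) for non-negative vectors."""
    return sum(min(a, b) for a, b in zip(x, y))


def class_propagation_prob(node: str, rv_c: Vector, sf: Dict[str, Vector]) -> float:
    """P_CP(l_i^c | SF(v_i)) = sim(SF(v_i), RV(c))."""
    return histogram_intersection(sf[node], rv_c)


sf = {"a": [2.0, 0.0], "c": [2.0, 0.0], "f": [0.0, 2.0]}
rv_red = reference_vector({"a": 1.0, "c": 1.0}, sf)   # nodes known to be "red"
print(rv_red, class_propagation_prob("a", rv_red, sf), class_propagation_prob("f", rv_red, sf))
```

A node whose edges fall in the same clusters as the known members of class c gets a high $P_{CP}$; a node with disjoint cluster usage gets zero, regardless of who its neighbors are.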

SCRN
• SCRN estimates the class-membership probability of node $v_i$ belonging to class c using the following equation:
$$P(l_i^c \mid N_i, SF(v_i)) = \frac{1}{Z} \sum_{v_j \in N_i} P_{CP}(l_i^c \mid SF(v_i)) \times w(v_i, v_j) \times P(l_j^c \mid N_j)$$
  - $P_{CP}(l_i^c \mid SF(v_i))$: class-propagation probability
  - $w(v_i, v_j)$: similarity between connected nodes (link weight)
  - $P(l_j^c \mid N_j)$: class probability of its neighbors
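A single SCRN update for one node can be sketched as below. This is a minimal sketch, not the released implementation: the dict-based data layout is an assumption, the $P_{CP}$ values are taken as precomputed inputs, and the uniform fallback when $Z = 0$ is an assumed convention. Because $P_{CP}(l_i^c \mid SF(v_i))$ does not depend on $v_j$, it factors out of the sum over neighbors.

```python
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]   # node -> {neighbor: w(v_i, v_j)}
Probs = Dict[str, Dict[str, float]]   # node -> {class c: P(l^c | N)}


def scrn_estimate(node: str, graph: Graph, probs: Probs,
                  pcp: Dict[str, Dict[str, float]], classes: List[str]) -> Dict[str, float]:
    """P(l_i^c | N_i, SF(v_i)) = (1/Z) sum_j P_CP(l_i^c | SF(v_i)) * w(v_i, v_j) * P(l_j^c | N_j)."""
    scores = {}
    for c in classes:
        neighbor_vote = sum(w * probs[nbr].get(c, 0.0) for nbr, w in graph[node].items())
        scores[c] = pcp[node][c] * neighbor_vote   # P_CP is constant in j, so it factors out
    z = sum(scores.values())                       # normalization factor Z
    if z == 0:                                     # assumed fallback: uniform distribution
        return {c: 1.0 / len(classes) for c in classes}
    return {c: s / z for c, s in scores.items()}


graph = {"x": {"a": 2.0, "b": 1.0}}
probs = {"a": {"red": 1.0, "green": 0.0}, "b": {"red": 0.0, "green": 1.0}}
pcp = {"x": {"red": 0.5, "green": 2.0}}            # hypothetical class-propagation values
print(scrn_estimate("x", graph, probs, pcp, ["red", "green"]))  # red ~ 0.333, green ~ 0.667
```

Compared with plain wvRN on the same neighbors (which would favor "red" 2:1), the social-feature term reverses the ranking here, illustrating how SCRN can down-weight links that are unlikely to carry a given class.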

SCRN Overview
Input: $\{G, V, E, C, L^K\}$, Max_Iter
Output: $L^U$ for nodes in $V^U$
1. Construct the nodes' social feature space
2. Initialize the class reference vectors for each class
3. Calculate the class-propagation probability for each test node
4. Repeat until the number of iterations exceeds Max_Iter or the predictions converge:
  - Estimate each test node's class probability
  - Update the test node's class probability in collective inference
  - Update the class reference vectors
  - Re-calculate each node's class-propagation probability
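The loop above can be condensed into one self-contained function. This is a sketch under several assumptions, not the authors' released code: social feature vectors are precomputed (step 1), sim(·,·) is plain histogram intersection, test nodes start from a uniform distribution, the relaxation-labeling weight `beta` starts at 1.0 and decays by 0.99 per iteration (a schedule in the style of Macskassy and Provost's relaxation labeling, with values assumed here), and the reference vectors are updated as a probability-weighted average over all nodes, which reduces to $RV(c)$ when only labeled nodes with weight 1 contribute.

```python
from typing import Dict, List, Set

Graph = Dict[str, Dict[str, float]]   # node -> {neighbor: link weight w(v_i, v_j)}
Vector = List[float]


def hist_intersection(x: Vector, y: Vector) -> float:
    return sum(min(a, b) for a, b in zip(x, y))


def reference_vectors(weights: Dict[str, Dict[str, float]], sf: Dict[str, Vector],
                      classes: List[str]) -> Dict[str, Vector]:
    """Probability-weighted average of SF vectors per class (equals RV(c) when all weights are 1)."""
    dim = len(next(iter(sf.values())))
    rv = {}
    for c in classes:
        total, acc = 0.0, [0.0] * dim
        for node, p in weights.items():
            pc = p.get(c, 0.0)
            if pc > 0:
                total += pc
                for k, x in enumerate(sf[node]):
                    acc[k] += pc * x
        rv[c] = [x / total for x in acc] if total else acc
    return rv


def scrn(graph: Graph, sf: Dict[str, Vector], known: Dict[str, Set[str]],
         classes: List[str], max_iter: int = 100, beta: float = 1.0,
         decay: float = 0.99, tol: float = 1e-6) -> Dict[str, Dict[str, float]]:
    test = [v for v in graph if v not in known]
    # Known nodes: probability 1 for each observed label; test nodes: uniform start (assumption).
    probs = {v: {c: float(c in labs) for c in classes} for v, labs in known.items()}
    probs.update({v: {c: 1.0 / len(classes) for c in classes} for v in test})
    # Steps 2-3: reference vectors from labeled nodes, then P_CP for each test node.
    rv = reference_vectors({v: probs[v] for v in known}, sf, classes)
    pcp = {v: {c: hist_intersection(sf[v], rv[c]) for c in classes} for v in test}
    for _ in range(max_iter):                      # step 4
        new = {}
        for v in test:
            scores = {c: pcp[v][c] * sum(w * probs[u][c] for u, w in graph[v].items())
                      for c in classes}
            z = sum(scores.values())
            est = {c: s / z for c, s in scores.items()} if z else dict(probs[v])
            # Relaxation labeling: blend the new estimate with the previous one.
            new[v] = {c: beta * est[c] + (1 - beta) * probs[v][c] for c in classes}
        change = max((abs(new[v][c] - probs[v][c]) for v in test for c in classes), default=0.0)
        probs.update(new)
        beta *= decay
        # Update reference vectors (test estimates act as soft weights), then recompute P_CP.
        rv = reference_vectors(probs, sf, classes)
        pcp = {v: {c: hist_intersection(sf[v], rv[c]) for c in classes} for v in test}
        if change < tol:
            break
    return {v: probs[v] for v in test}
```

Known-node probabilities are never overwritten, so labeled nodes act as fixed evidence while test-node estimates propagate and feed back into the reference vectors.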

SCRN Visualization
Figure: SCRN on synthetic multi-label network with 1000 nodes and 32 classes
(15 iterations).

Datasets
DBLP
• We construct a weighted collaboration network for authors who published at least 2 papers during the 2000 to 2010 time frame.
• We selected 15 representative conferences in 6 research areas:
  - Database: ICDE, VLDB, PODS, EDBT
  - Data Mining: KDD, ICDM, SDM, PAKDD
  - Artificial Intelligence: IJCAI, AAAI
  - Information Retrieval: SIGIR, ECIR
  - Computer Vision: CVPR
  - Machine Learning: ICML, ECML
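The DBLP network construction can be sketched as follows. This is a minimal sketch with assumptions not stated on the slide: papers arrive as hypothetical (year, author list) records already restricted to the 15 conferences, and an edge's weight is the number of papers the two authors co-wrote within the window (the slides only say the network is weighted).

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Paper = Tuple[int, List[str]]   # (year, authors): hypothetical record format


def collaboration_network(papers: List[Paper], start: int = 2000, end: int = 2010,
                          min_papers: int = 2) -> Dict[Tuple[str, str], int]:
    """Weighted co-authorship edges among authors with >= min_papers papers in [start, end]."""
    in_window = [sorted(set(authors)) for year, authors in papers if start <= year <= end]
    counts = Counter(a for authors in in_window for a in authors)
    keep = {a for a, n in counts.items() if n >= min_papers}
    weights: Counter = Counter()
    for authors in in_window:
        kept = [a for a in authors if a in keep]
        for u, v in combinations(kept, 2):   # kept is sorted, so (u, v) is a canonical key
            weights[(u, v)] += 1             # assumed weight: number of co-authored papers
    return dict(weights)


papers = [(2001, ["A", "B"]), (2003, ["A", "B", "C"]), (2005, ["C", "D"]), (1999, ["C", "D"])]
print(collaboration_network(papers))
```

Author D has only one paper inside the window, so the 2005 paper contributes no edge, and the 1999 paper is ignored entirely.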

Datasets
IMDb
• We extract movies and TV shows released between 2000 and 2010; titles directed by the same director are linked together.
• We retain only movies and TV programs with more than 5 links.
• Each movie can be assigned to a subset of 27 candidate genres in the database, such as “Drama”, “Comedy”, “Documentary”, and “Action”.
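The IMDb linking rule can be sketched as below. This is a minimal sketch, assuming hypothetical (title, year, director) records with one record per title-director pair; whether the "more than 5 links" filter is applied once or repeated until stable is not stated on the slide, and this sketch applies it once, dropping links to removed titles.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Title = Tuple[str, int, str]   # (title, year, director): hypothetical record format


def director_network(titles: List[Title], start: int = 2000, end: int = 2010,
                     min_links: int = 5) -> Dict[str, Set[str]]:
    by_director: Dict[str, Set[str]] = defaultdict(set)
    for name, year, director in titles:
        if start <= year <= end:
            by_director[director].add(name)
    adj: Dict[str, Set[str]] = defaultdict(set)
    for names in by_director.values():
        for u, v in combinations(sorted(names), 2):   # same director => linked
            adj[u].add(v)
            adj[v].add(u)
    # Keep titles with more than min_links links (single pass, an assumption).
    keep = {n for n, nbrs in adj.items() if len(nbrs) > min_links}
    return {n: adj[n] & keep for n in keep}


titles = ([(f"x{i}", 2005, "X") for i in range(7)]
          + [(f"y{i}", 2005, "Y") for i in range(3)]
          + [("old", 1995, "X")])
net = director_network(titles)
print(sorted(net))
```

Director X's seven in-window titles each get 6 links and survive; Y's three titles get only 2 links each and are dropped, as is the 1995 title.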

Datasets
YouTube
• A subset of 15,000 nodes sampled from the original YouTube dataset [1] using snowball sampling.
• Each YouTube user can subscribe to different interest groups and add other users as contacts.
• Class labels are 47 interest groups.
[1] http://www.public.asu.edu/~ltang9/social_dimension.html
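Snowball sampling grows a sample outward from a seed through the network's links. A minimal breadth-first sketch, assuming an adjacency dict and a single seed node; the seed choice and the exact expansion order used for the YouTube subset are not given on the slide.

```python
from collections import deque
from typing import Dict, List, Set


def snowball_sample(adj: Dict[str, List[str]], seed: str, target: int) -> Set[str]:
    """Collect nodes breadth-first from `seed` until `target` nodes are sampled."""
    sampled = {seed}
    frontier = deque([seed])
    while frontier and len(sampled) < target:
        node = frontier.popleft()
        for nbr in adj.get(node, []):
            if len(sampled) >= target:
                break
            if nbr not in sampled:
                sampled.add(nbr)
                frontier.append(nbr)
    return sampled


adj = {"0": ["1"], "1": ["0", "2"], "2": ["1", "3"], "3": ["2", "4"], "4": ["3"]}
print(sorted(snowball_sample(adj, "0", 3)))
```

For the dataset described above, `target` would be 15,000; because sampling follows links, the subset keeps dense local neighborhoods, which relational classifiers depend on.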

Comparative Methods
• Edge (EdgeCluster)
• wvRN
• Prior
• Random

Experiment Setting
• Size of the social feature space:
  - 1000 for DBLP and YouTube; 10000 for IMDb
• The class-propagation probability is calculated with the Generalized Histogram Intersection Kernel.
• Relaxation labeling is used as the collective inference method for both SCRN and wvRN.
• We assume the number of labels for each test node is known.
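Two of these settings can be sketched directly. The generalized histogram intersection (GHI) kernel is written here in the form $K(x, y) = \sum_k \min(|x_k|^\beta, |y_k|^\beta)$, where $\beta = 1$ recovers plain histogram intersection; the $\beta$ used in the experiments is not given on the slides, so the default below is an assumption. Knowing each test node's label count k, a natural decision rule (assumed here) is to assign its k most probable classes.

```python
from typing import Dict, List, Sequence


def ghi_kernel(x: Sequence[float], y: Sequence[float], beta: float = 1.0) -> float:
    """Generalized histogram intersection: sum_k min(|x_k|**beta, |y_k|**beta)."""
    return sum(min(abs(a) ** beta, abs(b) ** beta) for a, b in zip(x, y))


def top_k_labels(probs: Dict[str, float], k: int) -> List[str]:
    """Assign the k most probable classes, where k is the (assumed known) label count."""
    return sorted(probs, key=probs.get, reverse=True)[:k]


print(ghi_kernel([3, 4], [2, 9]))                            # min(3,2) + min(4,9)
print(top_k_labels({"DB": 0.1, "DM": 0.5, "ML": 0.4}, 2))
```

Smaller $\beta$ values compress large feature counts, so nodes with many edges in one cluster do not dominate the similarity.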

Experiment Setting
• We employ the network cross-validation (NCV) method (KAIS'11) to reduce the overlap between test samples.
• Classification performance is evaluated using Micro-F1, Macro-F1, and Hamming loss.
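The three evaluation measures can be computed from predicted and true label sets as sketched below. This is a minimal sketch using the standard definitions: Micro-F1 pools true positives, false positives, and false negatives over all classes; Macro-F1 averages per-class F1; Hamming loss is the fraction of node-class pairs predicted incorrectly. Scoring a class that never appears in truth or prediction as F1 = 0 is a convention choice assumed here.

```python
from typing import Dict, List, Set, Tuple


def multilabel_scores(true: Dict[str, Set[str]], pred: Dict[str, Set[str]],
                      classes: List[str]) -> Tuple[float, float, float]:
    """Return (Micro-F1, Macro-F1, Hamming loss) over the nodes in `true`."""
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    fn = {c: 0 for c in classes}
    mismatches = 0
    for node, y in true.items():
        yhat = pred.get(node, set())
        mismatches += len(y ^ yhat)              # symmetric difference = wrong node-class pairs
        for c in classes:
            tp[c] += int(c in y and c in yhat)
            fp[c] += int(c not in y and c in yhat)
            fn[c] += int(c in y and c not in yhat)

    def f1(t: int, p: int, n: int) -> float:
        # F1 = 2TP / (2TP + FP + FN); 0 when the class never appears (assumed convention).
        return 2 * t / (2 * t + p + n) if t + p + n else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    hamming = mismatches / (len(true) * len(classes))
    return micro, macro, hamming


true = {"u": {"A", "B"}, "v": {"B"}}
pred = {"u": {"A"}, "v": {"B", "C"}}
print(multilabel_scores(true, pred, ["A", "B", "C"]))
```

Micro-F1 is dominated by frequent classes, while Macro-F1 weights every class equally, which is why the two can diverge on datasets with skewed label distributions such as IMDb genres.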

Results (Micro-F1)
DBLP
Figure: Micro-F1 accuracy (%) versus training data percentage (5% to 30%), comparing SCRN, Edge, wvRN, Prior, and Random.

Results (Macro-F1)
DBLP
Figure: Macro-F1 accuracy (%) versus training data percentage (5% to 30%), comparing SCRN, Edge, wvRN, Prior, and Random.

Results (Hamming Loss)
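DBLP
Figure: Hamming loss on DBLP.

IMDb
Figure: Classification results on IMDb.

YouTube
Figure: Classification results on YouTube.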

Conclusion
• Links in multi-relational networks are heterogeneous.
• SCRN exploits label homophily while simultaneously leveraging social feature similarity through the introduction of class-propagation probabilities.
• SCRN significantly boosts classification performance on multi-label collaboration networks.
• Our open-source implementation of SCRN is available at:
http://code.google.com/p/multilabel-classification-on-social-network/

References
• MACSKASSY, S. A., AND PROVOST, F. A simple relational classifier. In Proceedings of the Second Workshop on Multi-Relational Data Mining (MRDM) at KDD, 2003, pp. 64–76.
• TANG, L., AND LIU, H. Relational learning via latent social dimensions. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2009, pp. 817–826.
• TANG, L., AND LIU, H. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2009, pp. 1107–1116.
• NEVILLE, J., GALLAGHER, B., ELIASSI-RAD, T., AND WANG, T. Correcting evaluation bias of relational classifiers with network cross validation. Knowledge and Information Systems (KAIS), 2011, pp. 1–25.
