This document discusses generative models for tripartite graphs in social media that model users, resources, and tags. It presents three models:
1) The User-Concept model that models users based on their tag usage but ignores resources and social aspects.
2) The User-Resource model that models resources as vocabulary terms but ignores tags and social aspects.
3) The User-Resource-Concept model that models both resources and users' interests using a topic-based representation and models the social generation of annotations.
The models are evaluated on their ability to predict tags/resources for new users and to recommend social ties, and are compared against baseline similarity metrics; an ensemble approach achieves the best performance.
Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media
1. Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media
Charalampos Chelmis, Viktor K. Prasanna
chelmis@usc.edu
MSM 2013, Paris, France
2. Overview
• Introduction
• Structure of Tripartite Graphs
• Generative Models of Tripartite Graphs
• Social Link Classification Schemes
• Evaluation
• Conclusion
3. Introduction
• Social networking is used for
Content organization
Content sharing
• Multiple media types
• Users' activities
Reveal interests and tastes
Hidden structure
• Description of resources
Text
Tags / hashtags
• Social annotation
Collective characterization of resources
Use of synonyms for similar resources
Same keywords for different resources
4. Research Questions
• How to address issues of synonymy and polysemy?
Deal with space-size explosion
• How to discover emergent structure in online tagging systems?
Hidden topics
• How to capture users' latent interests?
Which subjects is a user mostly interested in?
Which users have similar interests?
• How to model the process of social generation of annotations?
How to capture the semantics of collaboration?
• Why is this useful?
Recommend people
Recommend tags / resources
Clustering
…
5. Structure of Tripartite Graphs
• Set of actors (e.g., users) A = {a1, ..., ak}
• Set of concepts (e.g., tags) C = {c1, ..., cl}
• Set of resources (e.g., photos) R = {r1, ..., rm}
6. Reducing the Tripartite Graph to Bipartite Structures
• The User-Concept Model
Users are modeled based on their tag usage
φ denotes the matrix of topic distributions
− a multinomial distribution over N concepts
− T topics drawn independently
θ: the matrix of user-specific mixture weights for these T topics
• Captures users' latent interests
• Ignores resources
• Ignores the social aspect of tagging
• The User-Resource Model
Resources become vocabulary terms
• Tags are ignored
• Ignores the social aspect of tagging
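The User-Concept model's generative process can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy sizes, variable names, and symmetric Dirichlet priors (alpha, beta) are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, K = 3, 20, 5          # T topics, N concepts (tags), K users (toy sizes)
alpha, beta = 0.1, 0.01     # assumed symmetric Dirichlet hyperparameters

phi = rng.dirichlet([beta] * N, size=T)     # T x N: per-topic distribution over tags
theta = rng.dirichlet([alpha] * T, size=K)  # K x T: per-user mixture weights over topics

def generate_annotations(user, n_tags):
    """Sample n_tags tags for a user: draw a topic from theta[user],
    then draw a tag from that topic's distribution phi[topic]."""
    topics = rng.choice(T, size=n_tags, p=theta[user])
    return [int(rng.choice(N, p=phi[z])) for z in topics]

tags = generate_annotations(user=0, n_tags=10)
```

Each row of `phi` and `theta` is a proper probability distribution, matching the slide's description of φ as multinomials over N concepts and θ as user-specific mixture weights.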
7. The User-Resource-Concept Model
• Topic-based representation
• Models both resources & users' interests
• Multiple users may annotate resource r
• For each tag, a user is chosen uniformly at random
• Each user is associated with a distribution over latent topics θ
• A topic is chosen from a distribution over topics specific to that user
• The tag is generated from the chosen topic
φt: probability distribution of tags for topic t
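The URC generative step above (pick an annotator uniformly at random, then a topic from that user's θ, then a tag from φ) can be sketched as follows. The resource/annotator mapping and all sizes here are hypothetical toy values, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 4, 30                       # topics, tag-vocabulary size (toy values)
annotators = {"r1": [0, 1, 2]}     # hypothetical resource -> annotating users
theta = rng.dirichlet([0.1] * T, size=3)   # per-user topic distributions
phi = rng.dirichlet([0.01] * N, size=T)    # per-topic tag distributions

def annotate(resource, n_tags):
    """URC generative step: for each tag, pick an annotating user
    uniformly at random, draw a topic from that user's theta,
    then draw the tag from that topic's phi."""
    out = []
    for _ in range(n_tags):
        u = rng.choice(annotators[resource])   # user chosen uniformly
        z = rng.choice(T, p=theta[u])          # topic from user's mixture
        out.append(int(rng.choice(N, p=phi[z])))  # tag from chosen topic
    return out

tags = annotate("r1", 5)
```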
8. Recommendation
• Tag recommendation
Automatic annotation enhancement
Search improvement
• Clustering
Community detection
Organization of resources/tags into categories
• Navigation and visualization
Social browsing
• Next we focus on recommending people
9. Social Link Recommendation Using Latent Semantics & Network Structure
• Classification Based on Latent Interests
Measure "taste" distance with respect to the latent topic distributions
Pointwise squared distance between the feature vectors of users u and v:
F(u,v) = [(Θ(1,u) − Θ(1,v))², ..., (Θ(k,u) − Θ(k,v))²]
F(u,v) = 0 ⇒ u and v have identical distributions; F(u,v) > 0 ⇒ the distributions diverge
Other measures to consider
− Kullback–Leibler (KL) divergence
− Cosine similarity
• Objective:
Minimize the distance between linked users
• Focus on topical homophily
Ignore network effects
• Prior work uses network proximity as an indicator of link formation
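The three distance/similarity measures on this slide are straightforward to compute over per-user topic vectors. A minimal sketch (function names are my own; the epsilon smoothing in the KL divergence is an assumption to avoid log-of-zero):

```python
import numpy as np

def feature_vector(theta_u, theta_v):
    """Pointwise squared distance F(u,v): one feature per latent topic."""
    d = np.asarray(theta_u) - np.asarray(theta_v)
    return d ** 2

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) with small-epsilon smoothing."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def cosine_similarity(p, q):
    """Cosine similarity between two topic-distribution vectors."""
    p, q = np.asarray(p), np.asarray(q)
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))
```

As the slide notes, identical distributions give an all-zero F(u,v) (and KL ≈ 0, cosine ≈ 1), while any divergence shows up as strictly positive entries.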
10. Social Link Recommendation Using Latent Semantics & Network Structure
• Latent Topics & Local Structure
CN(u,v) = number of common neighbors between users u and v
− Simplicity and computational efficiency
Latent topic similarity (cosine of topic distributions):
σ(u,v) = Σ_t Θ(t,u)·Θ(t,v) / √(Σ_t Θ(t,u)² · Σ_t Θ(t,v)²)
Feature vector: F(u,v) = [σ(u,v), CN(u,v)]
• Latent Topics & Global Structure
SD(u,v) = shortest distance between users u and v
Feature vector: F(u,v) = [σ(u,v), SD(u,v)]
• Non-separable training set ⇒ inefficient classifiers
• Aggregation Strategy
Reduce the number of training samples
Produce more efficient classifiers
Average latent similarity of user pairs p with k common neighbors:
σ_avg(k) = (1 / |{p : k_p = k}|) · Σ_{p : k_p = k} σ(p)
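The aggregation strategy groups user pairs by their common-neighbor count k and averages the latent similarity within each group. A minimal sketch (function name and input layout are my own assumptions):

```python
from collections import defaultdict

def aggregate_by_common_neighbors(pairs):
    """sigma_avg(k): average latent similarity sigma over all user pairs p
    that share exactly k common neighbors.
    pairs: iterable of (sigma(u,v), CN(u,v)) tuples."""
    buckets = defaultdict(list)
    for sigma, k in pairs:
        buckets[k].append(sigma)
    return {k: sum(v) / len(v) for k, v in buckets.items()}

# Toy usage: two pairs with k=1 common neighbor, one pair with k=3.
avg = aggregate_by_common_neighbors([(0.2, 1), (0.4, 1), (0.9, 3)])
```

This collapses the training set from one sample per user pair to one sample per value of k, which is how the slide's strategy reduces the number of training samples.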
11. Experimental Analysis
• Objectives
Ability to uncover subliminal collective knowledge
Evaluate performance of "people" recommendation
• Setting
2.4 GHz Intel Core 2 Duo, 2 GB memory, Windows 7
• Real-world dataset
Last.fm online music system
− social relationships
− tagging information
− music-artist listening information
Statistics
− 1,892 users
− 25,434 directed user friend relations
− 17,632 artists (UR model vocabulary size)
− 92,834 user-listened-to-artist relations
− 11,946 unique tags (UC and URC vocabulary size)
− 186,479 annotations (tuples <user, tag, artist>)
13. Predictive Power
• Evaluate ability to predict tags/resources for new users
Perplexity
• Split dataset into two disjoint sets
90% for training
• Lower perplexity indicates better generalization
• URC is better overall
Exploits more information
• UC
Organizes tags into "clusters"
• UR
Inferior quality due to noise
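Perplexity, the slide's evaluation metric, is the exponentiated negative average log-likelihood of the held-out annotations. A minimal sketch (function name is my own):

```python
import numpy as np

def perplexity(log_likelihoods, n_tokens):
    """Perplexity over held-out tokens: exp(-sum(log p(w)) / N).
    Lower values indicate better generalization to unseen users."""
    return float(np.exp(-np.sum(log_likelihoods) / n_tokens))

# Sanity check: a model that assigns each of 10 held-out tags probability
# 1/100 has perplexity 100 (equivalent to guessing among 100 tags).
pp = perplexity(np.log(np.full(10, 1 / 100)), 10)
```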
14. Recommendation of Social Ties
• Split dataset into two disjoint sets
10%, 25%, 50%, 75% for training; the rest for testing
• Evaluation process
Randomly sample 12,716 pairs of users
50% true links, 50% negative samples
Compute the similarity of user pairs
Sort users in decreasing order of similarity
Add links between users with the highest similarity
15. Recommendation of Social Ties
• Latent Topics & Shortest Distance
Aggregates all true-link training similarity values into a single point
Least effective
• Ensemble achieves the best precision
• Overfitting for training size > 50%
• Recall drops as dataset size increases
[Figures: Latent Topics & Local Structure | Latent Topics | Ensemble]
16. How about High Class Imbalance?
• In social media, the number of true links << absent links
• High performance for both classes
True negatives are easier to classify correctly
Degradation in performance for true positives
• Reasonable results for practical purposes
[Figures: Latent Topics & Local Structure | Latent Topics | Ensemble]
17. Comparison to Tag-based Similarity Metrics
• Baselines
Cosine Similarity (CS)
Maximal Information Path (MIP)
• Evaluation criterion
Area under the receiver operating characteristic curve (AUC)
• Baseline AUC
Computed over the complete dataset
Biases the evaluation in favor of the baselines
CS AUC = 0.6087
MIP AUC = 0.6256
• Same evaluation process as before
• Compute performance lift
% change over the best-performing baseline
A positive % denotes improvement
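The AUC criterion and the performance-lift computation on this slide can be sketched compactly. AUC is computed here via its rank-statistic interpretation (the probability that a random true link outscores a random non-link); function names are my own.

```python
def auc(pos_scores, neg_scores):
    """AUC as a rank statistic: probability that a randomly chosen
    true link receives a higher score than a randomly chosen non-link
    (ties count as half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def lift(scheme_auc, baseline_auc):
    """Performance lift: % change over the best-performing baseline.
    Positive values denote improvement."""
    return 100.0 * (scheme_auc - baseline_auc) / baseline_auc
```

For example, against the MIP baseline (AUC = 0.6256), a scheme scoring 10% higher AUC would have a lift of +10%.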
18. Comparison to Tag-based Similarity Metrics
• Not all schemes can beat the baseline
For 10% training data
≤10% AUC loss
But significant speedup due to the minimal training dataset
• The Latent Topics & Local Structure scheme is consistently better
[Figure: AUC vs. training dataset size for Latent Topics & Local Structure and Latent Topics]
19. Concluding Remarks
• Three generative models of tripartite graphs in social tagging systems
• Modeling of users' interests in a latent space over resources and metadata
• Limitations
Ignore several aspects of the real-world annotation process, such as topic correlation and user interaction
• Strong performance on the recommendation task
Accurate predictors of social ties in conjunction with structural evidence
Proposed an aggregation strategy to reduce the number of training samples
• Future work
Incorporate other types of resources
Automatically identify the most discriminative latent topics and discard uninformative resources and metadata