This document discusses generative models for tripartite graphs in social media that model users, resources, and tags. It presents three models:
1) The User-Concept model that models users based on their tag usage but ignores resources and social aspects.
2) The User-Resource model that models resources as vocabulary terms but ignores tags and social aspects.
3) The User-Resource-Concept model that models both resources and users' interests using a topic-based representation and models the social generation of annotations.
The models are evaluated on their ability to predict tags/resources for new users and to recommend social ties, and are compared against baseline similarity metrics; an ensemble approach achieves the best performance.
Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media
1. Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media
Charalampos Chelmis, Viktor K. Prasanna
chelmis@usc.edu
MSM 2013, Paris, France
2. Overview
• Introduction
• Structure of Tripartite Graphs
• Generative Models of Tripartite Graphs
• Social Link Classification Schemes
• Evaluation
• Conclusion
3. Introduction
• Social networking is used for
Content organization
Content sharing
• Multiple media types
• Users' activities
Reveal interests and tastes
Hidden structure
• Description of resources
Text
Tags / hashtags
• Social annotation
Collective characterization of resources
Use of synonyms for similar resources
Same keywords for different resources
4. Research Questions
• How to address issues of synonymy and polysemy?
Deal with space-size explosion
• How to discover emergent structure in online tagging systems?
Hidden topics
• How to capture users' latent interests?
Which subjects is a user mostly interested in?
Which users have similar interests?
• How to model the process of social generation of annotations?
How to capture the semantics of collaboration?
• Why is this useful?
Recommend people
Recommend tags / resources
Clustering
…
5. Structure of Tripartite Graphs
• Set of actors (e.g., users) A = {a1, ..., ak}
• Set of concepts (e.g., tags) C = {c1, ..., cl}
• Set of resources (e.g., photos) R = {r1, ..., rm}
6. Reducing the Tripartite Graph to Bipartite Structures
• The User-Concept Model
Users are modeled based on their tag usage
φ denotes the matrix of topic distributions
− a multinomial distribution over N concepts
− T topics drawn independently
θ: the matrix of user-specific mixture weights for these T topics
• Captures users' latent interests
• Ignores resources
• Ignores the social aspect of tagging
• The User-Resource Model
Resources become vocabulary terms
• Tags are ignored
• Ignores the social aspect of tagging
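The User-Concept model's generative process can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy sizes, variable names, and symmetric Dirichlet priors (alpha, beta) are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, K = 3, 20, 5          # T topics, N concepts (tags), K users (toy sizes)
alpha, beta = 0.1, 0.01     # assumed symmetric Dirichlet hyperparameters

phi = rng.dirichlet([beta] * N, size=T)     # T x N: per-topic distribution over tags
theta = rng.dirichlet([alpha] * T, size=K)  # K x T: per-user mixture weights over topics

def generate_annotations(user, n_tags):
    """Sample n_tags tags for a user: draw a topic from theta[user],
    then draw a tag from that topic's distribution phi[topic]."""
    topics = rng.choice(T, size=n_tags, p=theta[user])
    return [int(rng.choice(N, p=phi[z])) for z in topics]

tags = generate_annotations(user=0, n_tags=10)
```

Each row of `phi` and `theta` is a proper probability distribution, matching the slide's description of φ as multinomials over N concepts and θ as user-specific mixture weights.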
7. The User-Resource-Concept Model
• Topic-based representation
• Models both resources & users' interests
• Multiple users may annotate resource r
• For each tag, a user is chosen uniformly at random
• Each user is associated with a distribution over latent topics θ
• A topic is chosen from a distribution over topics specific to that user
• The tag is generated from the chosen topic
φt: probability distribution of tags for topic t
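The URC generative step above (pick an annotator uniformly at random, then a topic from that user's θ, then a tag from φ) can be sketched as follows. The resource/annotator mapping and all sizes here are hypothetical toy values, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 4, 30                       # topics, tag-vocabulary size (toy values)
annotators = {"r1": [0, 1, 2]}     # hypothetical resource -> annotating users
theta = rng.dirichlet([0.1] * T, size=3)   # per-user topic distributions
phi = rng.dirichlet([0.01] * N, size=T)    # per-topic tag distributions

def annotate(resource, n_tags):
    """URC generative step: for each tag, pick an annotating user
    uniformly at random, draw a topic from that user's theta,
    then draw the tag from that topic's phi."""
    out = []
    for _ in range(n_tags):
        u = rng.choice(annotators[resource])   # user chosen uniformly
        z = rng.choice(T, p=theta[u])          # topic from user's mixture
        out.append(int(rng.choice(N, p=phi[z])))  # tag from chosen topic
    return out

tags = annotate("r1", 5)
```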
8. Recommendation
• Tag recommendation
Automatic annotation enhancement
Search improvement
• Clustering
Community detection
Organization of resources/tags into categories
• Navigation and visualization
Social browsing
• Next we focus on recommending people
9. Social Link Recommendation Using Latent Semantics & Network Structure
• Classification Based on Latent Interests
Measure "taste" distance with respect to the latent topic distributions
Pointwise squared distance between the feature vectors of users u and v:
F(u,v) = [(Θ(1,u) − Θ(1,v))², ..., (Θ(k,u) − Θ(k,v))²]
F(u,v) = 0 ⇒ u and v have identical distributions; F(u,v) > 0 ⇒ the distributions diverge
Other measures to consider
− Kullback–Leibler (KL) divergence
− Cosine similarity
• Objective:
Minimize the distance between linked users
• Focus on topical homophily
Ignore network effects
• Prior work uses network proximity as an indicator of link formation
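The three distance/similarity measures on this slide are straightforward to compute over per-user topic vectors. A minimal sketch (function names are my own; the epsilon smoothing in the KL divergence is an assumption to avoid log-of-zero):

```python
import numpy as np

def feature_vector(theta_u, theta_v):
    """Pointwise squared distance F(u,v): one feature per latent topic."""
    d = np.asarray(theta_u) - np.asarray(theta_v)
    return d ** 2

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) with small-epsilon smoothing."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def cosine_similarity(p, q):
    """Cosine similarity between two topic-distribution vectors."""
    p, q = np.asarray(p), np.asarray(q)
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))
```

As the slide notes, identical distributions give an all-zero F(u,v) (and KL ≈ 0, cosine ≈ 1), while any divergence shows up as strictly positive entries.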
10. Social Link Recommendation Using Latent Semantics & Network Structure
• Latent Topics & Local Structure
CN(u,v) = number of common neighbors between users u and v
− Simplicity and computational efficiency
Latent topic similarity (cosine of topic distributions):
σ(u,v) = Σ_t Θ(t,u)·Θ(t,v) / √(Σ_t Θ(t,u)² · Σ_t Θ(t,v)²)
Feature vector: F(u,v) = [σ(u,v), CN(u,v)]
• Latent Topics & Global Structure
SD(u,v) = shortest distance between users u and v
Feature vector: F(u,v) = [σ(u,v), SD(u,v)]
• Non-separable training set ⇒ inefficient classifiers
• Aggregation Strategy
Reduce the number of training samples
Produce more efficient classifiers
Average latent similarity of user pairs p with k common neighbors:
σ_avg(k) = (1 / |{p : k_p = k}|) · Σ_{p : k_p = k} σ(p)
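The aggregation strategy groups user pairs by their common-neighbor count k and averages the latent similarity within each group. A minimal sketch (function name and input layout are my own assumptions):

```python
from collections import defaultdict

def aggregate_by_common_neighbors(pairs):
    """sigma_avg(k): average latent similarity sigma over all user pairs p
    that share exactly k common neighbors.
    pairs: iterable of (sigma(u,v), CN(u,v)) tuples."""
    buckets = defaultdict(list)
    for sigma, k in pairs:
        buckets[k].append(sigma)
    return {k: sum(v) / len(v) for k, v in buckets.items()}

# Toy usage: two pairs with k=1 common neighbor, one pair with k=3.
avg = aggregate_by_common_neighbors([(0.2, 1), (0.4, 1), (0.9, 3)])
```

This collapses the training set from one sample per user pair to one sample per value of k, which is how the slide's strategy reduces the number of training samples.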
11. Experimental Analysis
• Objectives
Ability to uncover subliminal collective knowledge
Evaluate performance of "people" recommendation
• Setting
2.4 GHz Intel Core 2 Duo, 2 GB memory, Windows 7
• Real-world dataset
Last.fm online music system
− social relationships
− tagging information
− music-artist listening information
Statistics
− 1,892 users
− 25,434 directed user friend relations
− 17,632 artists (UR model vocabulary size)
− 92,834 user-listened-to-artist relations
− 11,946 unique tags (UC and URC vocabulary size)
− 186,479 annotations (tuples <user, tag, artist>)
13. Predictive Power
• Evaluate ability to predict tags/resources for new users
Perplexity
• Split dataset into two disjoint sets
90% for training
• Lower perplexity indicates better generalization
• URC is better overall
Exploits more information
• UC
Organizes tags into "clusters"
• UR
Inferior quality due to noise
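Perplexity, the slide's evaluation metric, is the exponentiated negative average log-likelihood of the held-out annotations. A minimal sketch (function name is my own):

```python
import numpy as np

def perplexity(log_likelihoods, n_tokens):
    """Perplexity over held-out tokens: exp(-sum(log p(w)) / N).
    Lower values indicate better generalization to unseen users."""
    return float(np.exp(-np.sum(log_likelihoods) / n_tokens))

# Sanity check: a model that assigns each of 10 held-out tags probability
# 1/100 has perplexity 100 (equivalent to guessing among 100 tags).
pp = perplexity(np.log(np.full(10, 1 / 100)), 10)
```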
14. Recommendation of Social Ties
• Split dataset into two disjoint sets
10%, 25%, 50%, 75% for training; the rest for testing
• Evaluation process
Randomly sample 12,716 pairs of users
50% true links, 50% negative samples
Compute the similarity of user pairs
Sort users in decreasing order of similarity
Add links between users with the highest similarity
15. Recommendation of Social Ties
• Latent Topics & Shortest Distance
Aggregates all true-link training similarity values into a single point
Least effective
• Ensemble achieves the best precision
• Overfitting for training size > 50%
• Recall drops as dataset size increases
[Figures: Latent Topics & Local Structure | Latent Topics | Ensemble]
16. How about High Class Imbalance?
• In social media, the number of true links << absent links
• High performance for both classes
True negatives are easier to classify correctly
Degradation in performance for true positives
• Reasonable results for practical purposes
[Figures: Latent Topics & Local Structure | Latent Topics | Ensemble]
17. Comparison to Tag-based Similarity Metrics
• Baselines
Cosine Similarity (CS)
Maximal Information Path (MIP)
• Evaluation criterion
Area under the receiver operating characteristic curve (AUC)
• Baseline AUC
Computed over the complete dataset
Biases the evaluation in favor of the baselines
CS AUC = 0.6087
MIP AUC = 0.6256
• Same evaluation process as before
• Compute performance lift
% change over the best-performing baseline
A positive % denotes improvement
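The AUC criterion and the performance-lift computation on this slide can be sketched compactly. AUC is computed here via its rank-statistic interpretation (the probability that a random true link outscores a random non-link); function names are my own.

```python
def auc(pos_scores, neg_scores):
    """AUC as a rank statistic: probability that a randomly chosen
    true link receives a higher score than a randomly chosen non-link
    (ties count as half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def lift(scheme_auc, baseline_auc):
    """Performance lift: % change over the best-performing baseline.
    Positive values denote improvement."""
    return 100.0 * (scheme_auc - baseline_auc) / baseline_auc
```

For example, against the MIP baseline (AUC = 0.6256), a scheme scoring 10% higher AUC would have a lift of +10%.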
18. Comparison to Tag-based Similarity Metrics
• Not all schemes can beat the baseline
For 10% training data
≤10% AUC loss
But significant speedup due to the minimal training dataset
• The Latent Topics & Local Structure scheme is consistently better
[Figure: AUC vs. training dataset size for Latent Topics & Local Structure and Latent Topics]
19. Concluding Remarks
• Three generative models of tripartite graphs in social tagging systems
• Modeling of users' interests in a latent space over resources and metadata
• Limitations
Ignore several aspects of the real-world annotation process, such as topic correlation and user interaction
• Strong performance on the recommendation task
Accurate predictors of social ties in conjunction with structural evidence
Proposed an aggregation strategy to reduce the number of training samples
• Future work
Incorporate other types of resources
Automatically identify the most discriminative latent topics and discard uninformative resources and metadata