Analyzing Cross-System User Modeling on the Social Web
1. Analyzing Cross-System User Modeling on the Social Web. ICWE, Cyprus, June 22, 2011. Fabian Abel, Samur Araujo, Qi Gao, Geert-Jan Houben. Web Information Systems, TU Delft
2. What we do: Science and Engineering for the Personal Web. Domains: news, social media, cultural heritage, public data, e-learning. Personalized Recommendations, Personalized Search, Adaptive Systems. Analysis and User Modeling. Semantic Enrichment, Linkage and Alignment. User/usage data from the Social Web.
3. Pitfalls of User-adaptive Systems. New user: “Hi, I’m your new user. Give me personalization!” System (profile: ?): “Hi, I have a new-user problem!” Returning user: “Hi, I’m back and I have new interests.” System: “Hi, I don’t know that your interests changed!” Profiles are scattered across System A, System B, System C and System D over time. How can we tackle these problems?
5. Interweaving public user data with Mypes. Input: Google Profile URI (http://google.com/profile/XY). 1. Account Mapping (SocialGraph API): get other accounts of user. 2. Social Web Aggregator: aggregate public profile data (blog posts, bookmarks, other media, social networking profiles). 3. Profile Alignment (FOAF, vCard): map profiles to target user model. 4. Semantic Enhancement (WordNet®): enrich data with semantics. 5. Analysis and user modeling: generate user profiles. Output: aggregated, enriched profile (e.g., in RDF or vCard).
6. In this paper: User Modeling across Twitter, Flickr and Delicious. Datasets: Twitter and Delicious: 1500 users, 80k + 620k TAS (tag assignments). Flickr and Delicious: 1467 users, 890k + 680k TAS. Example user Bob, active on Twitter, Delicious and Flickr: tags “travel, google IO” and “web, socialmedia, identity”, the tweet “This is #interesting: http://bit.ly/3gt42f #web”, and the link http://claimid.com.
7. Tag-based user profiles. The tag-based profile P(u) of a user u is a set of weighted tags: each entry pairs a tag of interest t with a weight that indicates to what degree the user is interested in t. Lightweight weighting scheme: count how often the user applied the tag.
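A minimal sketch of the lightweight weighting scheme described on this slide, assuming a user's activity is available as a plain list of applied tags. The function name `tag_profile` and the sample data are illustrative and not taken from the talk.

```python
from collections import Counter

def tag_profile(tag_assignments):
    """Build a tag-based profile {tag: weight}.

    The weight of a tag is the number of times the user applied it
    (the count-based scheme named on the slide).
    """
    return dict(Counter(tag_assignments))

# Hypothetical tag assignments of one user in one system
bob_tags = ["web", "socialmedia", "identity", "web"]
profile = tag_profile(bob_tags)
```

Here "web" receives weight 2 because it was applied twice, while the other tags receive weight 1.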
9. Characteristics of tag-based profiles What are the characteristics of the individual tag-based profiles in Twitter, Flickr and Delicious? How do the tag-based profiles of individual users overlap between the different systems?
11. Overlap of tag-based profiles. The overlap of tag-based profiles is less than 10% for more than 90% of the users.
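The slide does not spell out how overlap is computed. The sketch below assumes one plausible definition, the share of distinct tags that occur in both profiles (Jaccard coefficient); the paper may normalize differently. `profile_overlap` and the sample profiles are hypothetical.

```python
def profile_overlap(profile_a, profile_b):
    """Fraction of distinct tags shared by two tag-based profiles.

    ASSUMPTION: overlap = |A intersect B| / |A union B| (Jaccard).
    Weights are ignored; only tag membership counts.
    """
    tags_a, tags_b = set(profile_a), set(profile_b)
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / len(union)

# Hypothetical profiles of the same user in two systems
twitter_profile = {"web": 3, "interesting": 1}
delicious_profile = {"web": 2, "socialmedia": 1, "identity": 1}
overlap = profile_overlap(twitter_profile, delicious_profile)
```

In this toy example one of four distinct tags is shared, so the overlap is 0.25.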
12. Entropy of tag-based profiles, where p(t) = probability that t occurs in Tu and Tu = tags in user profile P(u). Profile types compared: Delicious, Flickr & Delicious, Flickr, Twitter & Delicious, Twitter. Aggregated profiles reveal significantly more information (with respect to entropy) than the service-specific profiles.
13. Observations. Profile size varies from system to system (e.g. tag-based Twitter profiles are rather sparse). Tag-based profiles of an individual user overlap only a little (e.g. overlap is less than 10% for more than 90% of the users). Entropy of tag-based profiles: Twitter < Flickr < Delicious < aggregated profiles.
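The entropy formula itself did not survive in the transcript. The sketch below assumes the standard Shannon entropy over the tags of a profile, with p(t) estimated as the tag's share of the user's tag assignments and a base-2 logarithm; both choices are assumptions, not confirmed by the slide.

```python
import math

def profile_entropy(profile):
    """Shannon entropy (in bits) of a tag-based profile {tag: count}.

    ASSUMPTIONS: p(t) = count(t) / total count; logarithm base 2.
    """
    total = sum(profile.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in profile.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical profiles: four equally used tags vs. a single tag
uniform_entropy = profile_entropy({"web": 1, "java": 1, "blog": 1, "france": 1})
single_entropy = profile_entropy({"web": 7})
```

A profile spread evenly over more tags yields higher entropy than one concentrated on a single tag, which is the sense in which a profile "reveals more information".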
15. Evaluation: Recommending tags / bookmarks. Scenario: a new user arrives at Delicious (“Hi, I’m your new user. Give me personalization!”) and has no profile there yet. Cross-system user modeling provides the profile; a cosine-based recommender suggests tags to explore and Web sites to bookmark. Ground truth (leave-n-out evaluation): actual tags and bookmarks of the user. How does cross-system user modeling impact the recommendation quality (in cold-start situations)?
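The slide names a cosine-based recommender without giving details. A minimal sketch, assuming both the user profile and each candidate item (a tag or a Web site) are represented as sparse tag-weight dictionaries and that candidates are ranked by cosine similarity to the profile; `cosine`, `recommend` and the sample data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors {tag: weight}."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(user_profile, candidates, k=10):
    """Return the k candidate ids most similar to the user profile."""
    ranked = sorted(
        candidates,
        key=lambda item: cosine(user_profile, candidates[item]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical cross-system profile and candidate Web sites
user = {"web": 2, "java": 1}
sites = {
    "siteA": {"web": 1},
    "siteB": {"france": 1},
    "siteC": {"web": 1, "java": 1},
}
top_sites = recommend(user, sites, k=3)
```

In a leave-n-out evaluation, the ranked list would then be compared against the tags or bookmarks that were held out for the user.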
16. User Modeling Building Blocks. 1. Source Profile: Which tags should be contained in the profile? 2. Semantic Enrichment: Further enrich/align tags? 3. Weighting Scheme: How to weight the tags? Figure: tags t1, t2, t3, t4, t5 from System A are analyzed, enriched and weighted (example weights 0.1, 0.1, 0.5, 0.2, 0.1) to form the profile for System B.
17. User Modeling Building Blocks (in this talk). Source: personal tags from the foreign system (personal profile) or popular tags from the target system (popular profile). Semantic Enrichment: (a) enrich tags with similar tags (based on Jaro-Winkler similarity), e.g. simJaro(blog, blogs) is high; (b) cross-system rules: if tag A was used in the foreign system then add tag B, e.g. blog (foreign) → nikon (target). Weighting scheme: personal usage frequency in the foreign system or global usage frequency in the target system. Example tags in the figure: web, blog, java, blogs, france. The recommender requires a profile to compute recommendations.
18. Cross-System User Modeling for Cold-start recommendations. Which user modeling strategies perform best in which context? How do the different building blocks of the user modeling strategies (e.g. source of user data) influence the quality of the tag-based profiles?
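The building blocks above can be sketched as three small functions. This is an illustration under stated assumptions, not the authors' implementation: the Jaro-Winkler code follows the textbook definition (prefix scale 0.1, prefix length up to 4), the 0.9 similarity threshold is invented for the example, and all function names, rules and tag data are hypothetical.

```python
def jaro(s1, s2):
    """Textbook Jaro similarity between two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order
    k = half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a common prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)

def enrich_by_similarity(tags, target_vocabulary, threshold=0.9):
    """Add target-system tags that are similar to a profile tag.

    ASSUMPTION: the 0.9 threshold is illustrative only.
    """
    enriched = set(tags)
    for tag in tags:
        for candidate in target_vocabulary:
            if jaro_winkler(tag, candidate) >= threshold:
                enriched.add(candidate)
    return enriched

def enrich_by_cross_rules(tags, rules):
    """Apply rules 'if tag A was used in the foreign system, add tag B'."""
    enriched = set(tags)
    for tag in tags:
        enriched.update(rules.get(tag, ()))
    return enriched

def weight_tags(tags, personal_counts, global_counts, scheme="personal"):
    """Weight tags by personal (foreign) or global (target) usage counts."""
    counts = personal_counts if scheme == "personal" else global_counts
    return {tag: counts.get(tag, 0) for tag in tags}

# Hypothetical data modeled on the slide's example tags
foreign_profile = {"web": 3, "blog": 2, "java": 1}
similar = enrich_by_similarity(foreign_profile, ["blogs", "france", "nikon"])
ruled = enrich_by_cross_rules(foreign_profile, {"blog": ["nikon"]})
weighted = weight_tags({"web", "blogs"}, foreign_profile,
                       {"web": 100, "blogs": 40}, scheme="global")
```

With these inputs, similarity-based enrichment adds "blogs" (close to "blog"), the cross-system rule adds "nikon", and the global scheme replaces the personal counts with target-system frequencies.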
20. Tag recommendations: Twitter → Delicious. Cross-system strategies lead to significant improvement (impact of semantic enrichment is rather low). Significant improvements regarding all metrics! Improvement regarding P@10, but “global Delicious trend” performs better regarding MRR & S@1. Figure: baseline (popular profile with global tag frequencies as weights) vs. cross-system user modeling strategies (personal or popular source, personal or global weights, similarity-based enrichment), built from the user’s tags.
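The metrics P@10, MRR and S@1 are not defined on the slide. The sketch below assumes their usual information-retrieval meanings: precision among the top k recommendations, reciprocal rank of the first relevant recommendation (averaged over users to obtain MRR), and success if at least one relevant item appears in the top k. Function names and sample data are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Share of the top-k recommendations that are relevant (P@k)."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def success_at_k(ranked, relevant, k):
    """1.0 if any of the top-k recommendations is relevant (S@k)."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant recommendation, 0.0 if none."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked, relevant) pairs, one per user."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

# Hypothetical recommendation list and held-out tags of one user
ranked_tags = ["web", "java", "blog", "france"]
held_out = {"java", "france"}
p_at_4 = precision_at_k(ranked_tags, held_out, 4)
s_at_1 = success_at_k(ranked_tags, held_out, 1)
rr = reciprocal_rank(ranked_tags, held_out)
```

Read this way, a strategy can improve P@10 while still losing on MRR and S@1: it places more relevant tags in the top ten but not necessarily at the very first rank.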
21. Tag recommendations: Delicious → Twitter. Semantic enrichment (cross-system rules) allows for significant improvement regarding P@10. Significant improvements regarding all metrics! Tag-based profile information from Delicious seems to be more valuable than hashtag-based Twitter profiles. Figure: baseline (popular profile with global weights) vs. cross-system user modeling strategies (personal or popular source, personal or global weights, cross-system rules), built from the user’s tags and tag frequencies (weights).
22. Tag Recommendations: different settings. Target Delicious: cross-system user modeling allows for cold-start tag recommendations in Delicious; Twitter profiles are more appropriate than Flickr profiles. Target Flickr: cross-system user modeling is also beneficial for cold-start tag recommendations in Flickr. Cross-system user modeling has significant impact on the recommendation performance. To optimize the performance one has to adapt to the given application setting.
23. Bookmark Recommendations. Cross-system user modeling also achieves significant improvements for cold-start bookmark recommendations. Twitter is again a more appropriate source than Flickr. Figure: baseline vs. cross-system UM.
24. Conclusions. Characteristics of distributed tag-based profiles: the overlap of the tag-based profiles that an individual user creates at different services is low; aggregated profiles reveal significantly more information (regarding entropy) than service-specific profiles. Performance of cross-system user modeling for cold-start recommendations: cross-system UM leads to tremendous (and significant) improvements of the tag and bookmark recommendation quality; to optimize the performance one has to adapt the cross-system strategies to the concrete application setting. http://persweb.org
25. Thank you! Fabian Abel, Qi Gao, Geert-Jan Houben, Ke Tao. Datasets: http://wis.ewi.tudelft.nl/icwe2011/um/ Twitter: @persweb http://persweb.org
Editor's Notes
Observations: Even though the size of Flickr profiles is high, the entropy is rather low. Entropy of aggregated profiles is the highest.
Source = which tags do we put into the profile? Semantic Enrichment: do we do something further with the tags that are already selected to be in the profile? (here: do we add further tags?) Weighting scheme: how do we weigh the tags? In the paper we compare two dimensions: (i) type of weighting (TF vs. TFxIDF) and (ii) where we count, i.e. whether we take the TF statistics from (a) the personal profile of the foreign system or (b) the “global statistics” of the target system. In this talk, we only look at (ii): we use TF and do not report on TFxIDF.
Here, we do “semantic enrichment” based on “Tag-similarity” (see slide 15: User Modeling Building Blocks)
Here, we do “semantic enrichment” based on “cross-system rules” (see slide 15: User Modeling Building Blocks)
Characteristics: Overlap is small; still, one gets significantly more information. Performance: cross-system UM leads to very high improvements for cold-start recommendations (some personal information is better than nothing). To optimize: we need to know the characteristics of the system. We can be stupid and simply aggregate what we can get; this is fine, as we will get improvements anyhow. But we can massively optimize if we carefully select the different building blocks of the cross-system UM strategy with respect to the given application (e.g. recommending bookmarks in Delicious → select tags from the personal Twitter profile, but weigh them according to the global Delicious tag frequencies).