Tuning Personalized PageRank for Semantics-aware Recommendations based on Linked Open Data

Talk @ESWC 2017 - Cataldo Musto - University of Bari - RecSys

  1. 1. Tuning Personalized PageRank for Semantics-aware Recommendations based on Linked Open Data Cataldo Musto, Giovanni Semeraro, Marco de Gemmis, Pasquale Lops (Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group) ESWC 2017 14th Extended Semantic Web Conference Portoroz (Slovenia) June 1, 2017
  2. 2. Recommender Systems: technology able to push relevant items (movies, news, books, etc.) to the user according to her preferences.
  3. 3. Recommender Systems: largely adopted in industry.
  4. 4. Recommendation Paradigms: Collaborative Filtering and Content-based RecSys.
  5. 5. Recommendation Paradigms, Collaborative Filtering: exploits the preferences of the community to generate recommendations. Insight: suggest items liked by users similar to the target user.
  6. 6. Recommendation Paradigms, Content-based RecSys: exploits descriptive features of the items (e.g. genre of a book, director of a movie) to generate recommendations. Insight: suggest items similar to those the user already liked.
  7. 7. Recommendation Paradigms, Hybrid Recommender Systems: combine different recommendation paradigms to provide recommendations. Advantage: merges the strengths of each paradigm in a single representation.
  8. 8. Graph-based RecSys: the focus of this work.
  9. 9. Graph-based RecSys: they can combine collaborative features (user preferences) and content-based features in a single, powerful formalism.
  10. 10. Graph-based RecSys: how can collaborative and content-based features be modeled as a graph?
  11. 11. Graph-based RecSys, collaborative data model: users = nodes, items = nodes, preferences = edges. (Figure: bipartite graph linking users u1-u4 to items i1-i4.)
  12. 12. Graph-based RecSys: what about content-based features?
  13. 13. Graph-based RecSys: we need a data source to feed our items with descriptive features.
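
The collaborative data model on slides 11-13 (users and items as nodes, preferences as edges) can be sketched as an adjacency structure. This is a minimal illustration only: the function name `build_bipartite_graph` and the edge list are hypothetical, since the transcript does not preserve which users are linked to which items in the figure.

```python
from collections import defaultdict


def build_bipartite_graph(preferences):
    """Build an undirected user-item graph from (user, item) preference pairs."""
    graph = defaultdict(set)
    for user, item in preferences:
        # One edge per preference, stored in both directions.
        graph[user].add(item)
        graph[item].add(user)
    return graph


# Hypothetical preferences: the edges of the slide figure are not recoverable.
likes = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u3", "i3"), ("u4", "i4")]
graph = build_bipartite_graph(likes)
```

Storing each edge in both directions keeps the graph undirected, so a random walk can move from a user to an item and back to other users who liked it.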
  14. 14. Linked Open Data (cloud), http://dbpedia.org. Our first contribution: we introduce DBpedia in a hybrid graph-based representation.
  15. 15. Wikipedia: unstructured content, example. (Figure: Wikipedia page.)
  17. 17. DBpedia structured representation of The Matrix. dbpedia-owl:director: The Wachowskis (http://dbpedia.org/resource/The_Wachowskis); dbpedia-owl:musicComposer: Don Davis (http://dbpedia.org/resource/Don_Davis_(composer)); dbo:runtime: 136; dcterms:subject: Films shot in Australia (http://dbpedia.org/resource/Category:Films_shot_in_Australia), Dystopian Films (http://dbpedia.org/resource/Dystopian_FIlms), Cyberpunk Films (http://dbpedia.org/resource/Cyberpunk_Films), American Action Thriller Films (http://dbpedia.org/resource/American_Action_Thriller_FIlms), Films about Rebellions (http://dbpedia.org/resource/Films_About_Rebellions). Several interesting (non-trivial) features come into play!
  19. 19. Linked Open Data (cloud): how can DBpedia data points be introduced in our graph-based representation?
  20. 20. Linked Open Data (cloud). Key concept: mapping.
  21. 21. Introducing Linked Open Data, graph-based data model: bipartite representation. (Figure: users u1-u4 linked to items.)
  23. 23. Introducing Linked Open Data, graph-based data model: DBpedia mapping. The items are mapped to dbr:Django_Unchained, dbr:Kill_Bill, dbr:Eyes_Wide_Shut and dbr:The_Matrix.
  24. 24. Introducing Linked Open Data, graph-based data model: LOD-boosted representation. Items gain edges to DBpedia resources: dcterms:subject to Films about Rebellions (http://dbpedia.org/resource/Films_About_Rebellions) and to 1999 films (http://dbpedia.org/resource/1999_films); dbprop:director to Quentin Tarantino (http://dbpedia.org/resource/Quentin_Tarantino).
  25. 25. Graph-based RecSys. Contribution 1: tripartite graph-based representation encoding user preferences and descriptive features gathered from the LOD cloud.
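
A rough sketch of how the tripartite representation could be assembled: item nodes are identified by their DBpedia resource, and each RDF triple adds an edge between an item and a property node. The helper `add_lod_triples`, the two user preferences, and the choice to ignore predicate labels when creating edges are assumptions made for illustration; the resource and property names are the ones shown on slides 17, 23 and 24.

```python
from collections import defaultdict


def add_lod_triples(graph, triples):
    """Inject (item, predicate, resource) triples as undirected item-property edges."""
    for item, _predicate, resource in triples:
        # Assumption: the predicate only justifies the edge and is not stored.
        graph[item].add(resource)
        graph[resource].add(item)
    return graph


graph = defaultdict(set)

# Hypothetical user preferences (collaborative part of the graph).
for user, item in [("u1", "dbr:Kill_Bill"), ("u2", "dbr:The_Matrix")]:
    graph[user].add(item)
    graph[item].add(user)

# Triples in the spirit of slides 17 and 24 (content-based part of the graph).
triples = [
    ("dbr:Django_Unchained", "dbprop:director", "dbr:Quentin_Tarantino"),
    ("dbr:Kill_Bill", "dbprop:director", "dbr:Quentin_Tarantino"),
    ("dbr:The_Matrix", "dcterms:subject", "dbr:1999_films"),
    ("dbr:Eyes_Wide_Shut", "dcterms:subject", "dbr:1999_films"),
    ("dbr:The_Matrix", "dcterms:subject", "dbr:Films_About_Rebellions"),
]
add_lod_triples(graph, triples)
```

After the injection, two items that no user has co-rated (for example dbr:Kill_Bill and dbr:Django_Unchained) become reachable from each other through the shared property node.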
  26. 26. Graph-based RecSys. Research Question 1: how do the features gathered from the LOD cloud impact the quality of the representation?
  27. 27. Graph-based RecSys. Research Question 2: are all of the features equally important? Is it possible to automatically select the most promising ones?
  29. 29. How to get the recommendations?
  30. 30. Graph-based RecSys: recommendations are obtained by identifying the most relevant (item) nodes for a target user, according to the graph topology.
  31. 31. Graph-based RecSys: how can we obtain such information?
  32. 32. Graph-based RecSys. A possible solution: Personalized PageRank, a variant of the original PageRank.
  33. 33. Graph-based RecSys. A possible solution: Personalized PageRank. Rationale: relevant nodes (items) can be identified through random walks, but the walks have to be influenced by the user's previous behavior (preferences!).
  34. 34. Graph-based RecSys: weights are distributed according to a simple heuristic. A large probability (e.g. 80%) is assigned a priori to specific items (the items the user liked).
  36. 36. Graph-based RecSys: weights are distributed according to a simple heuristic. The rest is evenly distributed among the remaining nodes.
  37. 37. Graph-based RecSys: how to get the recommendations? Recommendation pipeline.
  38. 38. Graph-based RecSys, recommendation pipeline: (1) calculate the Personalized PageRank score for each item node; (2) sort the PageRank scores in descending order; (3) select the top-k recommendations.
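
The three pipeline steps can be sketched with a plain power iteration. This is an illustrative sketch, not the authors' implementation: the damping factor 0.85, the iteration count, the function names and the toy graph are all assumptions, and the code expects every node to have at least one edge and the prior to sum to 1.

```python
def personalized_pagerank(graph, prior, damping=0.85, iterations=50):
    """Power iteration for PageRank with a non-uniform jump distribution.

    graph: dict mapping each node to the set of its neighbours (undirected).
    prior: dict mapping every node to its a priori probability.
    damping: assumed probability of following an edge instead of jumping.
    """
    scores = dict(prior)
    for _ in range(iterations):
        # Jump step: restart from a node drawn from the prior.
        updated = {node: (1.0 - damping) * prior[node] for node in graph}
        # Walk step: each node splits its score evenly among its neighbours.
        for node, neighbours in graph.items():
            share = damping * scores[node] / len(neighbours)
            for neighbour in neighbours:
                updated[neighbour] += share
        scores = updated
    return scores


def recommend(graph, prior, candidates, k=5):
    """Score the nodes, sort the candidate items by score, keep the top k."""
    scores = personalized_pagerank(graph, prior)
    ranked = sorted(candidates, key=lambda node: scores[node], reverse=True)
    return ranked[:k]


# Toy graph: u1 liked i1 and i2; i3 and i4 are the unseen candidates.
graph = {
    "u1": {"i1", "i2"},
    "u2": {"i1", "i2", "i3"},
    "u3": {"i4"},
    "i1": {"u1", "u2"},
    "i2": {"u1", "u2"},
    "i3": {"u2"},
    "i4": {"u3"},
}
# 80% on the liked items, 20% spread evenly over the other five nodes.
prior = {"i1": 0.4, "i2": 0.4, "u1": 0.04, "u2": 0.04,
         "u3": 0.04, "i3": 0.04, "i4": 0.04}
top = recommend(graph, prior, candidates=["i3", "i4"], k=1)
```

In this toy graph i3 ranks above i4, because i3 is reachable from the liked items through u2, while i4 sits in a component that only receives the small residual prior.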
  41. 41. Graph-based RecSys: it is likely that The Matrix is suggested to u1, since it is more (and better) connected in the graph.
  42. 42. Graph-based RecSys. One step back: is this the best weighting scheme?
  43. 43. Graph-based RecSys: is it correct to give a tiny probability to the properties connected to the items we liked?
  44. 44. Graph-based RecSys. Contribution 2: PageRank weighting schemes. We tune the recommendation algorithm by giving more importance to the properties gathered from the LOD cloud.
  45. 45. Graph-based RecSys: we can distribute the a priori probabilities by following different heuristics.
  46. 46. Graph-based RecSys: 80% to the items we liked, 20% to the other nodes (baseline).
  47. 47. Graph-based RecSys: 60% to the items we liked, 20% to the properties gathered from the LOD cloud, 20% to the other nodes.
  48. 48. Graph-based RecSys: 40% to the items we liked, 40% to the properties gathered from the LOD cloud, 20% to the other nodes.
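
The weighting schemes can be read as three shares of prior probability, each split evenly inside its group. The sketch below is one possible reading under stated assumptions: the name `build_prior`, the even split within a group, and the choice of which property nodes receive the LOD share (here, the ones passed in by the caller, e.g. those connected to the liked items) are not specified in the slides.

```python
# Shares for (liked items, LOD properties, all other nodes).
SCHEMES = {
    "80/20": (0.8, 0.0, 0.2),      # baseline: properties count as "other nodes"
    "60/20/20": (0.6, 0.2, 0.2),
    "40/40/20": (0.4, 0.4, 0.2),
}


def build_prior(nodes, liked_items, lod_properties, scheme="80/20"):
    """Return a dict node -> a priori probability for the chosen scheme."""
    liked_share, lod_share, other_share = SCHEMES[scheme]
    liked = set(liked_items)
    # With a zero LOD share, property nodes fall back into the "other" group.
    props = set(lod_properties) - liked if lod_share > 0 else set()
    others = set(nodes) - liked - props
    prior = {}
    for group, share in ((liked, liked_share), (props, lod_share), (others, other_share)):
        for node in group:
            prior[node] = share / len(group)
    return prior


# Hypothetical node set: two users, three items, two property nodes.
nodes = ["u1", "u2", "i1", "i2", "i3", "p1", "p2"]
baseline = build_prior(nodes, liked_items=["i1", "i2"], lod_properties=["p1"])
boosted = build_prior(nodes, liked_items=["i1", "i2"], lod_properties=["p1"],
                      scheme="60/20/20")
```

The resulting dictionary has the shape expected by a Personalized PageRank routine that accepts a per-node jump distribution, so the schemes can be swapped without touching the ranking step.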
  49. 49. Graph-based RecSys. Research Question 3: how does the recommendation algorithm perform as the weighting scheme varies?
  50. 50. Experiments
  51. 51. Research Questions (1/2): 1. Do graph-based recommender systems benefit from the introduction of LOD-based features? 2. How does our methodology perform when feature selection is used to automatically select the most promising features?
  52. 52. Research Questions (2/2): 3. How does the recommendation algorithm perform as the weighting scheme varies? 4. How does our methodology perform with respect to state-of-the-art algorithms?
  53. 53. Experimental Evaluation, description of the dataset. MovieLens 1M: 6,040 users; 3,883 movies; 1,000,209 ratings; 57.51% positive ratings; 165.59 ratings/user (avg.); 269.88 ratings/item (avg.); 96.4% sparsity.
  54. 54. Experimental Evaluation, description of the dataset. DBbook: 6,181 users; 6,733 books; 72,732 ratings; 45.86% positive ratings; 11.71 ratings/user (avg.); 10.74 ratings/item (avg.); 99.85% sparsity.
  55. 55. Experimental Evaluation, DBpedia mapping: 3,300 movies (85%) and 6,600 books (98%) were mapped to DBpedia by querying a SPARQL endpoint with the title of the item.
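
The slides do not show the query used for the mapping. As a purely hypothetical sketch, a title lookup could be expressed as an exact match on `rdfs:label`, restricted to a class such as `dbo:Film` or `dbo:Book`; the function name `build_mapping_query`, the exact-match strategy and the `LIMIT 1` are all assumptions.

```python
def build_mapping_query(title, rdf_type="dbo:Film", lang="en"):
    """Build a SPARQL query that looks up a DBpedia resource by its label."""
    # Escape backslashes and double quotes so the title is a valid literal.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT DISTINCT ?item WHERE {\n"
        f"  ?item a {rdf_type} ;\n"
        f'        rdfs:label "{escaped}"@{lang} .\n'
        "} LIMIT 1"
    )


query = build_mapping_query("The Matrix")
book_query = build_mapping_query("Dune", rdf_type="dbo:Book")
```

An exact label match is strict, which may be one reason why a share of the items (15% of the movies in the figures above) ends up without a mapping; a real pipeline would likely normalize titles first.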
  56. 56. Experimental Evaluation, DBpedia mapping: 60 LOD properties were extracted for MovieLens and 70 LOD properties were extracted for DBbook.
  57. 57. Experimental Evaluation, graph representations recap. G: basic graph with collaborative data points.
  58. 58. Experimental Evaluation, graph representations recap. G_LOD: graph extended with all the properties gathered from the LOD cloud.
  59. 59. Experimental Evaluation, graph representations recap. G_LOD+FS: graph encoding only the most relevant properties, selected by a feature selection (FS) technique.
  60. 60. Experimental Evaluation, graph representations recap. G_LOD+FS: selection of the top-10 properties through feature selection (Information Gain and PCA). Reference: C. Musto, P. Basile, P. Lops, M. de Gemmis, G. Semeraro: Introducing linked open data in graph-based recommender systems. Inf. Process. Manage. 53(2): 405-435 (2017).
  61. 61. Experimental Evaluation, graph topologies comparison (MovieLens):

      |       | G       | G_LOD   | G_LOD+IG | G_LOD+PCA |
      |-------|---------|---------|----------|-----------|
      | Nodes | 9,625   | 30,204  | 18,146   | 13,288    |
      | Edges | 460,124 | 509,481 | 480,526  | 465,272   |

      Most of the edges are due to the collaborative part of the data model. A small number of properties is added through G_LOD. Slide footer: Cataldo Musto, Pierpaolo Basile, Marco de Gemmis, Pasquale Lops, Giovanni Semeraro, Simone Rutigliano. Automatic Selection of Linked Open features in Graph-based Recommender Systems. CBRecSys 2015 Workshop, Vienna, 20.09.2015.
  62. 62. Experimental Evaluation, graph topologies comparison (DBbook):

      |       | G      | G_LOD   | G_LOD+IG | G_LOD+PCA |
      |-------|--------|---------|----------|-----------|
      | Nodes | 12,649 | 211,611 | 88,669   | 28,164    |
      | Edges | 33,189 | 534,841 | 142,334  | 67,411    |

      A huge number of nodes and edges is injected in G_LOD. Feature selection strongly filters them.
  63. 63. Experimental Evaluation, weighting schemes. 80/20: original weighting scheme; 60/20/20: 20% for LOD properties; 40/40/20: 40% for LOD properties; 20/60/20: 60% for LOD properties.
  64. 64. Experimental Evaluation, experimental protocol. Algorithm: Personalized PageRank. Data split: 5-fold cross validation for MovieLens, train/test for DBbook. Graph topologies: G, G_LOD, G_LOD+PCA, G_LOD+IG. Weighting schemes: 80/20, 60/20/20, 40/40/20, 20/60/20. Evaluation metric: F1@5.
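
F1@5 combines precision and recall computed on the first five recommendations. The sketch below uses the usual definition (harmonic mean of precision@k and recall@k); how the authors define a relevant item or average the metric over users is not stated in the slides, so the function name `f1_at_k` and those details are assumptions.

```python
def f1_at_k(recommended, relevant, k=5):
    """Harmonic mean of precision@k and recall@k for one user."""
    top_k = list(recommended)[:k]
    relevant = set(relevant)
    hits = sum(1 for item in top_k if item in relevant)
    if hits == 0:
        return 0.0
    precision = hits / k               # fraction of the top-k that is relevant
    recall = hits / len(relevant)      # fraction of relevant items retrieved
    return 2 * precision * recall / (precision + recall)


# Hypothetical user: 2 of the 5 recommendations are among 4 relevant items.
score = f1_at_k(["a", "b", "c", "d", "e"], relevant={"a", "c", "x", "y"})
```

Here precision is 2/5 and recall is 2/4, so the score is 4/9 (about 0.44).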
  65. 65. Experiment 1: impact of LOD-based features and FS techniques.
  66. 66. Experiment 1, impact of LOD-based features, F1-measure. (Figure: two bar charts of F1@5 for G, G_LOD, G_LOD+PCA and G_LOD+IG. MovieLens data labels: 54.04, 53.98, 54.06, 53.96. DBbook data labels: 55.28, 55.08, 54.94, 55.07.) Improvement on both datasets.
  67. 67. Experiment 1 (same charts). MovieLens: improvement due to the LOD.
  68. 68. Experiment 1 (same charts). Expected behavior: representation unbalanced towards collaborative data points.
  69. 69. Experiment 1 (same charts). DBbook: LOD + FS lead to the best results.
  70. 70. Experiment 1 (same charts). Reason: noisy properties gathered from the LOD cloud. FS helps.
  71. 71. Take-home message: Linked Open Data and feature selection techniques have a good impact on the effectiveness of the recommendations.
  72. 72. Experiment 2: impact of different weighting schemes.
  73. 73. Experiment 2, impact of feature selection, MovieLens, F1@5. (Figure: bar chart over G-LOD, G-LOD-PCA and G-LOD-IG, each with the Baseline, 60-20-20, 40-40-20 and 20-60-20 schemes. Data labels: 54, 53.93, 53.8, 54.09, 54.01, 53.8, 54.07, 54.03, 53.99, 54.04, 53.98, 54.06.)
  74. 74. Experiment 2, MovieLens (same chart). Improvement when PCA and IG are exploited.
  75. 75. Experiment 2, MovieLens (same chart). Interesting outcome: even with many collaborative data points, different weighting schemes positively influence the recommendations.
  76. 76. Experiment 2, impact of feature selection, DBbook, F1@5. (Figure: bar chart over G-LOD, G-LOD-PCA and G-LOD-IG, each with the Baseline, 60-20-20, 40-40-20 and 20-60-20 schemes. Data labels: 55.24, 55.04, 54.96, 55.26, 54.99, 55.04, 55.33, 55.03, 54.98, 55.28, 55.07, 54.94.)
  77. 77. Experiment 2, DBbook (same chart). Improvement: IG is the best-performing configuration.
  78. 78. Take-home message: different weighting schemes further improve (even if by tiny margins) the effectiveness of the recommendations.
  79. 79. Experiment 3: comparison to the state of the art. Baselines: U2U-KNN (user-to-user CF), I2I-KNN (item-to-item CF), POPULAR (popularity-based baseline), BPRMF (Bayesian Personalized Ranking) [+], BPRMF+Side information. [+] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme: BPR: Bayesian Personalized Ranking from Implicit Feedback. UAI 2009.
  80. 80. Experiment 3, comparison to the state of the art, MovieLens. (Figure: bar chart of F1@5 for I2I-KNN, U2U-KNN, BPRMF, BPRMF+Side, POPULAR and PR (G_LOD+IG+40/40). Data labels: 54.09, 51.4, 52.18, 51.79, 52.2, 50.43.) PageRank with Priors boosted with LOD is the best-performing approach.
  81. 81. Experiment 3, MovieLens (same chart). Even approaches based on Matrix Factorization are outperformed by our methodology.
  82. 82. Experiment 3, comparison to the state of the art, DBbook. (Figure: bar chart of F1@5 for I2I-KNN, U2U-KNN, BPRMF, BPRMF+Side, POPULAR and PR (G_LOD+IG+60/20). Data labels: 55.33, 52.96, 53.04, 52.9, 51.93, 51.11.) Behavior confirmed on DBbook.
  83. 83. Conclusions
  84. 84. Recap, methodology: investigation of the effectiveness of Linked Open Data in graph-based recommender systems. 1. PageRank with Priors as base algorithm. 2. Mapping of the items to nodes in the Linked Open Data cloud. 3. Expansion of the data points and injection of new nodes and edges. 4. Use of feature selection to automatically select the most promising properties. 5. Introduction of different weighting schemes, to emphasize properties gathered from the LOD cloud.
  85. 85. Lessons learned, evaluation: 1. Personalized PageRank benefits from the injection of data points coming from the LOD cloud. 2. Feature selection techniques improve the results. 3. Properties gathered from the LOD cloud are worth emphasizing, since they improve the effectiveness of the algorithm. 4. PageRank with Priors boosted with LOD significantly outperforms state-of-the-art approaches.
  86. 86. Questions? Cataldo Musto, PhD, cataldo.musto@uniba.it, @cataldomusto, http://www.di.uniba.it/~swap
