IIR 2011 - Italian Information Retrieval Workshop
                           Milano, Italy




 Random Indexing for
    Content-based
Recommender Systems
          Cataldo Musto - cataldomusto@di.uniba.it
          Pasquale Lops, Marco de Gemmis, Giovanni Semeraro

 University of Bari “Aldo Moro” (Italy), SWAP Research Group
                          28.01.11
outline                                                                                                                  2/18

              •     Introduction
                   •   Analysis of Vector Space Models
                   •   Content-based Recommender Systems


              •     Random Indexing for Content-based Recommender Systems
                   •  Introducing Random Indexing
                   •  Recommendation models


              •     Experimental Evaluation
                   •   Open Issues
                   •   Future Works


C.Musto, P.Lops, M.de Gemmis, G.Semeraro: Random Indexing for Content-based Recommender Systems - IIR 2011 Workshop - Milano, Italy - 28.01.11
vector space model                                                                                                        3/18


              •     Weak Points
                   •     High dimensionality
                   •     Not incremental
                   •     Does not manage the latent semantics of documents
                   •     Does not manage negative preferences


recommender systems                                                                                                       4/18


              •     A specific type of Information Filtering system that
                    attempts to recommend information items (films,
                    television, video on demand, music, books, etc.)
                    that are likely to be of interest to the user


              •     Content-based Recommender Systems
                   •      The degree of interest is inferred by comparing the
                          textual features extracted from the item with the
                          features stored in the user profile


goals                                                                                                                     5/18


                   •      To investigate the impact of VSM in the
                          area of content-based recommender
                          systems
                   •      To introduce techniques able to overcome
                          typical VSM issues
                        •      Random Indexing
                             •      Dimensionality reduction technique (Sahlgren, 2005)
                        •      Negation Operator
                             •      Based on Quantum Logic (Widdows, 2007)

random indexing                                                                                                           6/18

                   •      Random Indexing (RI) is an incremental and
                          effective technique for dimensionality reduction
                        •      Introduced by Sahlgren in 2005


                   •      Based on the so-called “Distributional
                          Hypothesis”
                        •      “Words that occur in the same context tend to
                               have similar meanings”
                        •      “Meaning is its use” (Wittgenstein)

how does it work?                                                                                                      7/18

         •       Random Indexing reduces
                 the m-dimensional term/doc
                 matrix to a new
                 k-dimensional matrix

        •     How?
             •      By multiplying the original matrix
                    with a random one, built in
                    an incremental way
                  •      formally: A_{n,m} R_{m,k} = B_{n,k}
                  •      k << m
             •      After projection, the distances
                    between points in the vector space
                    are approximately preserved
                    (Johnson-Lindenstrauss lemma)
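
The projection above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' implementation: the function names, the probabilities used for the ternary entries (+1 and -1 with probability 1/6 each, 0 otherwise) and the sqrt(3/k) scaling are assumptions chosen so that distances before and after projection are directly comparable.

```python
import math
import random

def random_projection_matrix(m, k, seed=0):
    """Build an m x k random matrix R with sparse ternary entries.

    Assumption: entries are +1 or -1 with probability 1/6 each and 0
    otherwise, scaled by sqrt(3/k) so squared lengths are preserved
    in expectation.
    """
    rng = random.Random(seed)
    scale = math.sqrt(3.0 / k)
    return [[scale * rng.choice((1, -1, 0, 0, 0, 0)) for _ in range(k)]
            for _ in range(m)]

def project(A, R):
    """Compute B = A R, where A is n x m and R is m x k."""
    k = len(R[0])
    B = []
    for row in A:
        out = [0.0] * k
        for j, a in enumerate(row):
            if a:  # zero entries contribute nothing
                for c in range(k):
                    out[c] += a * R[j][c]
        B.append(out)
    return B

def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Two toy "documents" in an m = 500 dimensional space, reduced to k = 200.
rng = random.Random(1)
A = [[rng.random() for _ in range(500)] for _ in range(2)]
R = random_projection_matrix(500, 200)
B = project(A, R)
ratio = euclidean(B[0], B[1]) / euclidean(A[0], A[1])
print(f"distance ratio after projection: {ratio:.3f}")
```

The printed ratio should stay close to 1, which is the sense in which distances are preserved; the approximation gets tighter as k grows.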
building the matrix                                                                                                     8/18


              •     A context vector is assigned for each term. This
                    vector has a fixed dimension (k) and it can contain only
                    values in {-1, 0, 1}. Values are distributed in a random way
                    but the number of non-zero elements is much smaller than k.

              •     The Vector Space representation of a term is obtained
                    by summing the context vectors of the terms it co-occurs
                    with.

              •     The Vector Space representation of a document
                    (item) is obtained by summing the context vectors of the
                    terms that occur in it
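
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: k = 100, 10 non-zero entries per context vector, a per-term seed to make vectors reproducible, and "co-occurrence" taken to mean "appears in the same document" are all choices made here, not values given in the slides.

```python
import random

def index_vector(term, k=100, nonzeros=10, seed=42):
    """Sparse ternary context vector for a term.

    Assumption: exactly `nonzeros` positions are set to +1 or -1,
    chosen pseudo-randomly but deterministically per term.
    """
    rng = random.Random(f"{seed}:{term}")
    vec = [0] * k
    for pos in rng.sample(range(k), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec

def add_into(target, vec):
    """Accumulate vec into target in place."""
    for i, x in enumerate(vec):
        target[i] += x

def document_vector(tokens, k=100):
    """Document (item) vector: sum of the context vectors of its terms."""
    doc = [0] * k
    for t in tokens:
        add_into(doc, index_vector(t, k))
    return doc

def term_vectors(documents, k=100):
    """Term vectors: sum of the context vectors of co-occurring terms.

    Assumption: two terms co-occur when they share a document.
    """
    vectors = {}
    for tokens in documents:
        vocab = set(tokens)
        for t in vocab:
            acc = vectors.setdefault(t, [0] * k)
            for other in vocab:
                if other != t:
                    add_into(acc, index_vector(other, k))
    return vectors

doc = document_vector(["space", "alien", "ship"])
print(sum(1 for x in doc if x != 0), "non-zero components out of", len(doc))
```

Because every vector is built by summation, a new document or term only adds to the existing vectors, which is what makes the technique incremental.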


profile representation                                                                                                   9/18


              •     What about the user profiles?
                   •      Assumption
                        •      The information coming from documents (items)
                               that the user liked in the past could be a reliable
                               source of information for building user profiles


                   •      The Vector Space representation of a user
                          profile is obtained by combining the context vectors
                          of all the documents that the user liked in the past.


RI-based approach                                                                                                      10/18




                 [Formula: VSM representation of RI-based profile for user u, with its Documents, Rating and Threshold terms labelled]
wRI-based approach                                                                                                     11/18




                 [Formula: wRI-based profile for user u, with its Documents, Rating and Threshold terms labelled]

     Higher weight given to the documents with higher rating
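
The RI and wRI profiles can be sketched as below. The slide formulas are not available in this transcript, so this is a hedged reconstruction: "liked" is taken to mean a rating strictly above the threshold, and the wRI weight rating / max_rating is an assumption that simply gives higher-rated documents more influence. All function and parameter names are hypothetical.

```python
def ri_profile(doc_vectors, ratings, threshold):
    """RI profile: sum of the vectors of documents rated above the threshold."""
    k = len(next(iter(doc_vectors.values())))
    profile = [0.0] * k
    for doc_id, rating in ratings.items():
        if rating > threshold:
            for i, x in enumerate(doc_vectors[doc_id]):
                profile[i] += x
    return profile

def wri_profile(doc_vectors, ratings, threshold, max_rating=5):
    """wRI profile: same sum, each document weighted by its rating.

    Assumption: weight = rating / max_rating.
    """
    k = len(next(iter(doc_vectors.values())))
    profile = [0.0] * k
    for doc_id, rating in ratings.items():
        if rating > threshold:
            weight = rating / max_rating
            for i, x in enumerate(doc_vectors[doc_id]):
                profile[i] += weight * x
    return profile

# Toy data: three documents in a 2-dimensional space, ratings on a 1-5 scale.
doc_vectors = {"d1": [1, 0], "d2": [0, 1], "d3": [1, 1]}
ratings = {"d1": 5, "d2": 4, "d3": 2}
print(ri_profile(doc_vectors, ratings, threshold=3))
print(wri_profile(doc_vectors, ratings, threshold=3))
```

In the toy data, d3 is below the threshold and is ignored by both models; d2 contributes less than d1 only in the weighted variant.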

negation operator                                                                                                      12/18

                   •     Both models inherit a classical problem of VSM
                        •       User profiles modeled only according to positive
                                preferences

                        •       In classical text classifiers (Naive Bayes, SVM, etc.) both positive and
                                negative preferences are modeled


                        •       Introduction of a Negation Operator based on
                                Quantum Logic to tackle this problem
                            •      Queries such as “A not B” are allowed!
                            •      Projection of vector A onto the subspace orthogonal to the one generated by vector B

              •     Implemented in the Semantic Vectors (*) open-source package

                    (*) http://code.google.com/p/semanticvectors/
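
For a single negated vector, the projection described above reduces to subtracting from A its component along B. The sketch below illustrates only that geometric operation in plain Python; it is not the Semantic Vectors API, and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

def negate(a, b):
    """Return "a NOT b": the component of a orthogonal to b.

    Computed as a - (a.b / b.b) * b. If b is the zero vector there is
    nothing to remove, so a is returned unchanged.
    """
    bb = dot(b, b)
    if bb == 0:
        return list(a)
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]

a = [1.0, 1.0]
b = [0.0, 2.0]
result = negate(a, b)
print(result)          # [1.0, 0.0]
print(dot(result, b))  # 0.0
```

The result has zero similarity with B by construction, which is why items close to "A not B" share features with A while avoiding those of B.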
SV-based approach                                                                                                      13/18

 [Formulas: Positive User Profile Vector and Negative User Profile Vector]

        VSM representation of SV-based profile for user u

wSV-based approach                                                                                                     14/18

 [Formulas: Positive User Profile Vector and Negative User Profile Vector]

      VSM representation of wSV-based profile for user u
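
A possible way to build the two profile vectors is sketched below. The slide formulas are missing from this transcript, so everything here is an assumption: documents rated above the threshold feed the positive vector, all others feed the negative one, and in the weighted variant the weights rating / max_rating and (max_rating - rating) / max_rating are one plausible choice, not necessarily the authors'. Names are hypothetical.

```python
def sv_profiles(doc_vectors, ratings, threshold, weighted=False, max_rating=5):
    """Return (positive, negative) profile vectors for one user.

    Assumptions: rating > threshold means "liked"; in the weighted
    variant, liked documents weigh rating / max_rating and disliked
    ones weigh (max_rating - rating) / max_rating.
    """
    k = len(next(iter(doc_vectors.values())))
    positive = [0.0] * k
    negative = [0.0] * k
    for doc_id, rating in ratings.items():
        if rating > threshold:
            target = positive
            weight = rating / max_rating if weighted else 1.0
        else:
            target = negative
            weight = (max_rating - rating) / max_rating if weighted else 1.0
        for i, x in enumerate(doc_vectors[doc_id]):
            target[i] += weight * x
    return positive, negative

# Toy data: three documents in a 2-dimensional space, ratings on a 1-5 scale.
doc_vectors = {"d1": [2, 0], "d2": [0, 1], "d3": [1, 0]}
ratings = {"d1": 5, "d2": 4, "d3": 1}
print(sv_profiles(doc_vectors, ratings, threshold=3))
print(sv_profiles(doc_vectors, ratings, threshold=3, weighted=True))
```

The two vectors are kept separate so that they can later be combined with the negation operator rather than simply subtracted.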

recommendation step                                                                                                    15/18

              •     Given the profile of a user u and a set of items, we can suppose that the most relevant
                    items for u are the nearest ones in the vector space
                   •      RI and wRI: Submission of a query based on the user profile vector
                   •      SV and wSV: Submission of a query based on p+ NOT p-

                        •      Returns the items with as many features as possible from p+ and as
                               few features as possible from p-

              •     Cosine Similarity to rank the items
                   •      Items whose similarity is under a certain threshold are labeled as non-relevant
                          and filtered out

              •     Recommendation of the items with the highest similarity w.r.t. the user profile
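
The ranking step can be sketched as follows: score each item by cosine similarity against the query vector, drop items under the threshold, and return the best ones. The threshold value, the top_n cut-off and all names are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)

def recommend(query, item_vectors, threshold=0.0, top_n=10):
    """Rank items by similarity to the query and filter out weak matches."""
    scored = [(item, cosine(query, vec)) for item, vec in item_vectors.items()]
    relevant = [(item, score) for item, score in scored if score >= threshold]
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return relevant[:top_n]

# Toy data: the query stands in for a user profile (or for "p+ NOT p-").
query = [0.0, 1.0]
item_vectors = {"a": [0.0, 2.0], "b": [1.0, 1.0], "c": [1.0, 0.0]}
for item, score in recommend(query, item_vectors, threshold=0.5):
    print(item, round(score, 3))
```

Here item "c" is orthogonal to the query and is filtered out, while "a" ranks above "b".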



experimental design                                                                                                       16/18

                        •      Dataset
                             •      Based on MovieLens, enriched with contents
                                    crawled from Wikipedia
                             •      613 users, 520 items, 25k terms, 40k ratings
                        •      Experiment 1

                             •      Does the weighting scheme improve the
                                    predictive accuracy of the recommendation models?
                        •      Experiment 2

                             •      Does the introduction of a negation operator
                                    improve the predictive accuracy of the recommendation
                                    models?


results                                                                                                                17/18
                                         RI           W-RI               SV              W-SV                   Bayes
          Av-Precision@1                85.93         86.33             85.97            86.78                   86.39
          Av-Precision@3                85.78         85.97             86.19            86.33                   85.97
          Av-Precision@5                85.75         86.10             85.99            86.16                   85.83
          Av-Precision@7                85.61         85.92             85.88            85.95                   85.77
         Av-Precision@10                85.45         85.76             85.76            85.83                   85.75

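                •      W-SV improves the Average Precision with
                       respect to the Naive Bayes approach (currently
                       implemented in our recommender system) at every
                       cut-off; SV and W-RI match or exceed it from
                       Av-Precision@3 onwards, while plain RI stays
                       slightly below it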


conclusions                                                                                                               18/18


                   •      Investigation of the impact of Random Indexing in the area of content-based
                          recommender systems

                        •      Use of Random Indexing for dimensionality reduction
                        •      Introduction of a Negation Operator based on Quantum
                               Logic

                        •      Encouraging experimental results
                             •      First results improve the predictive accuracy
                                    obtained by classical content-based filtering techniques (e.g. Bayes)
                   •      Work-in-progress

                        •      To compare results with classical TF/IDF-based VSM, LSA, Rocchio
                               and so on




http://www.di.uniba.it/~swap/

                               discussion



Thanks for your attention

     Cataldo Musto - cataldomusto@di.uniba.it

  University of Bari (Italy), SWAP Research Group
     IIR 2011 - Italian Information Retrieval Workshop

Mais conteúdo relacionado

Semelhante a Random Indexing for Content-based Recommender Systems

On viable service systems
On viable service systemsOn viable service systems
On viable service systemsIESS
 
Advanced topics research
Advanced topics researchAdvanced topics research
Advanced topics researchkieran122
 
Insemtives cluj iccp
Insemtives cluj iccpInsemtives cluj iccp
Insemtives cluj iccpElena Simperl
 
User-Generated Content on Social Media
User-Generated Content on Social MediaUser-Generated Content on Social Media
User-Generated Content on Social MediaMeena Nagarajan
 
Applying Learner Centered Methodology - Case Studies
Applying Learner Centered Methodology - Case StudiesApplying Learner Centered Methodology - Case Studies
Applying Learner Centered Methodology - Case StudiesKern Learning Solution
 
The Search for a Better Risk Model - MPT Forum Tokyo March 1st 2012
The Search for a Better Risk Model - MPT Forum Tokyo March 1st 2012The Search for a Better Risk Model - MPT Forum Tokyo March 1st 2012
The Search for a Better Risk Model - MPT Forum Tokyo March 1st 2012yamanote
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6Vanessa Camilleri
 
Machine learning
Machine learningMachine learning
Machine learninghplap
 
Business Analysis meet Test Analysis
Business Analysis meet Test AnalysisBusiness Analysis meet Test Analysis
Business Analysis meet Test AnalysisJoe Newbert
 

Semelhante a Random Indexing for Content-based Recommender Systems (16)

On viable service systems
On viable service systemsOn viable service systems
On viable service systems
 
Sentiment analysis
Sentiment analysisSentiment analysis
Sentiment analysis
 
Insemtives stanford
Insemtives stanfordInsemtives stanford
Insemtives stanford
 
Advanced topics research
Advanced topics researchAdvanced topics research
Advanced topics research
 
Insemtives cluj iccp
Insemtives cluj iccpInsemtives cluj iccp
Insemtives cluj iccp
 
Information Quality in the Web Era
Information Quality in the Web EraInformation Quality in the Web Era
Information Quality in the Web Era
 
User-Generated Content on Social Media
User-Generated Content on Social MediaUser-Generated Content on Social Media
User-Generated Content on Social Media
 
Applying Learner Centered Methodology - Case Studies
Applying Learner Centered Methodology - Case StudiesApplying Learner Centered Methodology - Case Studies
Applying Learner Centered Methodology - Case Studies
 
The Search for a Better Risk Model - MPT Forum Tokyo March 1st 2012
The Search for a Better Risk Model - MPT Forum Tokyo March 1st 2012The Search for a Better Risk Model - MPT Forum Tokyo March 1st 2012
The Search for a Better Risk Model - MPT Forum Tokyo March 1st 2012
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6
 
seminar.pptx
seminar.pptxseminar.pptx
seminar.pptx
 
data analysis.ppt
data analysis.pptdata analysis.ppt
data analysis.ppt
 
data analysis.pptx
data analysis.pptxdata analysis.pptx
data analysis.pptx
 
qualitative analysis
qualitative analysisqualitative analysis
qualitative analysis
 
Machine learning
Machine learningMachine learning
Machine learning
 
Business Analysis meet Test Analysis
Business Analysis meet Test AnalysisBusiness Analysis meet Test Analysis
Business Analysis meet Test Analysis
 

Mais de Cataldo Musto

MyrrorBot: a Digital Assistant Based on Holistic User Models for Personalize...
MyrrorBot: a Digital Assistant Based on Holistic User Models forPersonalize...MyrrorBot: a Digital Assistant Based on Holistic User Models forPersonalize...
MyrrorBot: a Digital Assistant Based on Holistic User Models for Personalize...Cataldo Musto
 
Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
Fairness and Popularity Bias in Recommender Systems: an Empirical EvaluationFairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
Fairness and Popularity Bias in Recommender Systems: an Empirical EvaluationCataldo Musto
 
Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...
Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...
Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...Cataldo Musto
 
Exploring the Effects of Natural Language Justifications in Food Recommender ...
Exploring the Effects of Natural Language Justifications in Food Recommender ...Exploring the Effects of Natural Language Justifications in Food Recommender ...
Exploring the Effects of Natural Language Justifications in Food Recommender ...Cataldo Musto
 
Exploiting Distributional Semantics Models for Natural Language Context-aware...
Exploiting Distributional Semantics Models for Natural Language Context-aware...Exploiting Distributional Semantics Models for Natural Language Context-aware...
Exploiting Distributional Semantics Models for Natural Language Context-aware...Cataldo Musto
 
Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...
Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...
Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...Cataldo Musto
 
Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...
Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...
Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...Cataldo Musto
 
Hybrid Semantics aware Recommendations Exploiting Knowledge Graph Embeddings
Hybrid Semantics aware Recommendations Exploiting Knowledge Graph EmbeddingsHybrid Semantics aware Recommendations Exploiting Knowledge Graph Embeddings
Hybrid Semantics aware Recommendations Exploiting Knowledge Graph EmbeddingsCataldo Musto
 
Natural Language Justifications for Recommender Systems Exploiting Text Summa...
Natural Language Justifications for Recommender Systems Exploiting Text Summa...Natural Language Justifications for Recommender Systems Exploiting Text Summa...
Natural Language Justifications for Recommender Systems Exploiting Text Summa...Cataldo Musto
 
L'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA Risponde
L'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA RispondeL'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA Risponde
L'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA RispondeCataldo Musto
 
Explanation Strategies - Advances in Content-based Recommender System
Explanation Strategies - Advances in Content-based Recommender SystemExplanation Strategies - Advances in Content-based Recommender System
Explanation Strategies - Advances in Content-based Recommender SystemCataldo Musto
 
Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...
Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...
Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...Cataldo Musto
 
ExpLOD: un framework per la generazione di spiegazioni per recommender system...
ExpLOD: un framework per la generazione di spiegazioni per recommender system...ExpLOD: un framework per la generazione di spiegazioni per recommender system...
ExpLOD: un framework per la generazione di spiegazioni per recommender system...Cataldo Musto
 
Myrror: una piattaforma per Holistic User Modeling e Quantified Self
Myrror: una piattaforma per Holistic User Modeling e Quantified SelfMyrror: una piattaforma per Holistic User Modeling e Quantified Self
Myrror: una piattaforma per Holistic User Modeling e Quantified SelfCataldo Musto
 
Semantic Holistic User Modeling for Personalized Access to Digital Content an...
Semantic Holistic User Modeling for Personalized Access to Digital Content an...Semantic Holistic User Modeling for Personalized Access to Digital Content an...
Semantic Holistic User Modeling for Personalized Access to Digital Content an...Cataldo Musto
 
Holistic User Modeling for Personalized Services in Smart Cities
Holistic User Modeling for Personalized Services in Smart CitiesHolistic User Modeling for Personalized Services in Smart Cities
Holistic User Modeling for Personalized Services in Smart CitiesCataldo Musto
 
A Framework for Holistic User Modeling Merging Heterogeneous Digital Footprints
A Framework for Holistic User Modeling Merging Heterogeneous Digital FootprintsA Framework for Holistic User Modeling Merging Heterogeneous Digital Footprints
A Framework for Holistic User Modeling Merging Heterogeneous Digital FootprintsCataldo Musto
 
eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?
eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?
eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?Cataldo Musto
 
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...Cataldo Musto
 
Il Linguaggio dell'Odio sui Social Network
Il Linguaggio dell'Odio sui Social NetworkIl Linguaggio dell'Odio sui Social Network
Il Linguaggio dell'Odio sui Social NetworkCataldo Musto
 

Mais de Cataldo Musto (20)

MyrrorBot: a Digital Assistant Based on Holistic User Models for Personalize...
MyrrorBot: a Digital Assistant Based on Holistic User Models forPersonalize...MyrrorBot: a Digital Assistant Based on Holistic User Models forPersonalize...
MyrrorBot: a Digital Assistant Based on Holistic User Models for Personalize...
 
Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
Fairness and Popularity Bias in Recommender Systems: an Empirical EvaluationFairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
 
Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...
Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...
Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...
 
Exploring the Effects of Natural Language Justifications in Food Recommender ...
Exploring the Effects of Natural Language Justifications in Food Recommender ...Exploring the Effects of Natural Language Justifications in Food Recommender ...
Exploring the Effects of Natural Language Justifications in Food Recommender ...
 
Exploiting Distributional Semantics Models for Natural Language Context-aware...
Exploiting Distributional Semantics Models for Natural Language Context-aware...Exploiting Distributional Semantics Models for Natural Language Context-aware...
Exploiting Distributional Semantics Models for Natural Language Context-aware...
 
Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...
Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...
Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...
 
Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...
Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...
Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...
 
Hybrid Semantics aware Recommendations Exploiting Knowledge Graph Embeddings
Hybrid Semantics aware Recommendations Exploiting Knowledge Graph EmbeddingsHybrid Semantics aware Recommendations Exploiting Knowledge Graph Embeddings
Hybrid Semantics aware Recommendations Exploiting Knowledge Graph Embeddings
 
Natural Language Justifications for Recommender Systems Exploiting Text Summa...
Natural Language Justifications for Recommender Systems Exploiting Text Summa...Natural Language Justifications for Recommender Systems Exploiting Text Summa...
Natural Language Justifications for Recommender Systems Exploiting Text Summa...
 
L'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA Risponde
L'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA RispondeL'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA Risponde
L'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA Risponde
 
Explanation Strategies - Advances in Content-based Recommender System
Explanation Strategies - Advances in Content-based Recommender SystemExplanation Strategies - Advances in Content-based Recommender System
Explanation Strategies - Advances in Content-based Recommender System
 
Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...
Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...
Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...
 
ExpLOD: un framework per la generazione di spiegazioni per recommender system...
ExpLOD: un framework per la generazione di spiegazioni per recommender system...ExpLOD: un framework per la generazione di spiegazioni per recommender system...
ExpLOD: un framework per la generazione di spiegazioni per recommender system...
 
Myrror: una piattaforma per Holistic User Modeling e Quantified Self
Myrror: una piattaforma per Holistic User Modeling e Quantified SelfMyrror: una piattaforma per Holistic User Modeling e Quantified Self
Myrror: una piattaforma per Holistic User Modeling e Quantified Self
 
Semantic Holistic User Modeling for Personalized Access to Digital Content an...
Semantic Holistic User Modeling for Personalized Access to Digital Content an...Semantic Holistic User Modeling for Personalized Access to Digital Content an...
Semantic Holistic User Modeling for Personalized Access to Digital Content an...
 
Holistic User Modeling for Personalized Services in Smart Cities
Holistic User Modeling for Personalized Services in Smart CitiesHolistic User Modeling for Personalized Services in Smart Cities
Holistic User Modeling for Personalized Services in Smart Cities
 
A Framework for Holistic User Modeling Merging Heterogeneous Digital Footprints
A Framework for Holistic User Modeling Merging Heterogeneous Digital FootprintsA Framework for Holistic User Modeling Merging Heterogeneous Digital Footprints
A Framework for Holistic User Modeling Merging Heterogeneous Digital Footprints
 
eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?
eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?
eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?
 
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
 
Il Linguaggio dell'Odio sui Social Network
Il Linguaggio dell'Odio sui Social NetworkIl Linguaggio dell'Odio sui Social Network
Il Linguaggio dell'Odio sui Social Network
 



Random Indexing for Content-based Recommender Systems

  • 1. IIR 2011 - Italian Information Retrieval Workshop Milano, Italy Random Indexing for Content-based Recommender Systems Cataldo Musto - cataldomusto@di.uniba.it Pasquale Lops, Marco de Gemmis, Giovanni Semeraro University of Bari “Aldo Moro” (Italy), SWAP Research Group 28.01.11
  • 2. outline
      • Introduction
          • Analysis of Vector Space Models
          • Content-based Recommender Systems
      • Random Indexing for Content-based Recommender Systems
          • Introducing Random Indexing
          • Recommendation models
      • Experimental Evaluation
          • Open Issues
          • Future Works
      C.Musto, P.Lops, M.de Gemmis, G.Semeraro: Random Indexing for Content-based Recommender Systems - IIR 2011 Workshop - Milano, Italy - 28.01.11
  • 3. vector space model
      • Weak Points
          • High Dimensionality
          • Not incremental
          • Does not manage the latent semantics of documents
          • Does not manage negative preferences
  • 4. recommender systems
      • A specific type of Information Filtering system that attempts to recommend information items (films, television, video on demand, music, books, etc.) that are likely to be of interest to the user
      • Content-based Recommender Systems
          • The degree of interest is inferred by comparing the textual features extracted from the item with the features stored in the user profile
  • 5. goals
      • To investigate the impact of VSM in the area of content-based recommender systems
      • To introduce techniques able to overcome typical VSM issues
          • Random Indexing
              • Dimensionality reduction technique (Sahlgren, 2005)
          • Negation Operator
              • Based on Quantum Logic (Widdows, 2007)
  • 6. random indexing
      • Random Indexing (RI) is an incremental and effective technique for dimensionality reduction
          • Introduced by Sahlgren in 2005
      • Based on the so-called “Distributional Hypothesis”
          • “Words that occur in the same context tend to have similar meanings”
          • “Meaning is its use” (Wittgenstein)
  • 7. how does it work?
      • Random Indexing reduces the m-dimensional term/doc matrix to a new k-dimensional matrix
      • How?
          • By multiplying the original matrix by a random one, built in an incremental way
          • Formally: $A_{n,m} R_{m,k} = B_{n,k}$, with $k \ll m$
      • After projection, the distance between points in the vector space is approximately preserved
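
The projection on this slide can be sketched numerically. The block below is only an illustration under stated assumptions: the matrix sizes, the density of the random matrix and the scaling factor are hypothetical choices, not values taken from the slides, and the random matrix is built in one shot rather than incrementally.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, k = 50, 5000, 500            # hypothetical sizes, with k << m
A = rng.random((n, m))             # stand-in for the n x m term/doc matrix

# Sparse ternary random matrix R (m x k) with entries in {-1, 0, +1}.
# Assumption: 10% non-zero entries; dividing by sqrt(k * density) keeps
# expected squared distances unchanged after projection.
density = 0.1
R = rng.choice([-1.0, 0.0, 1.0], size=(m, k),
               p=[density / 2, 1 - density, density / 2])
R /= np.sqrt(k * density)

B = A @ R                          # reduced n x k matrix


def pairwise_dist(X):
    """Euclidean distances between all pairs of rows of X."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))


# Ratio of projected to original distance for every pair of rows.
iu = np.triu_indices(n, k=1)
ratio = pairwise_dist(B)[iu] / pairwise_dist(A)[iu]
mean_ratio = ratio.mean()
```

If the projection behaves as the slide describes, `mean_ratio` should stay close to 1, which means distances are preserved up to a small distortion.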
  • 8. building the matrix
      • A context vector is assigned for each term. This vector has a fixed dimension (k) and it can contain only values in {-1, 0, 1}. Values are distributed in a random way, but the number of non-zero elements is much smaller.
      • The Vector Space representation of a term is obtained by summing the context vectors of the terms it co-occurs with.
      • The Vector Space representation of a document (item) is obtained by summing the context vectors of the terms that occur in it.
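
The three steps on this slide can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names, the dimension `k`, the number of non-zero entries and the choice of the whole document as co-occurrence window are all assumptions made for illustration.

```python
import numpy as np


def random_context_vector(k, nonzeros, rng):
    """Length-k vector with `nonzeros` entries set to +1 or -1, the rest 0."""
    v = np.zeros(k)
    positions = rng.choice(k, size=nonzeros, replace=False)
    v[positions] = rng.choice([-1.0, 1.0], size=nonzeros)
    return v


def build_space(docs, k=100, nonzeros=4, seed=0):
    """docs maps a document id to its list of tokens (hypothetical format)."""
    rng = np.random.default_rng(seed)
    vocab = sorted({t for tokens in docs.values() for t in tokens})
    context = {t: random_context_vector(k, nonzeros, rng) for t in vocab}

    term_vecs = {t: np.zeros(k) for t in vocab}
    doc_vecs = {}
    for doc_id, tokens in docs.items():
        # Document vector: sum of the context vectors of its terms.
        doc_vec = np.zeros(k)
        for t in tokens:
            doc_vec += context[t]
        doc_vecs[doc_id] = doc_vec
        # Term vector: sum of the context vectors of co-occurring terms.
        # Assumption: the co-occurrence window is the whole document.
        for i, t in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    term_vecs[t] += context[other]
    return context, term_vecs, doc_vecs


docs = {
    "m1": ["space", "alien", "ship"],
    "m2": ["space", "robot"],
}
context, term_vecs, doc_vecs = build_space(docs)
```

Because every vector is a running sum, a new document only adds to the existing vectors, which is what makes the approach incremental.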
  • 9. profile representation
      • What about the user profiles?
      • Assumption
          • The information coming from documents (items) that the user liked in the past could be a reliable source of information for building user profiles
      • The Vector Space representation of a user profile is obtained by combining the context vectors of all the documents that the user liked in the past.
  • 10. RI-based approach
      • [Formula: VSM representation of the RI-based profile for user u; annotated labels: Documents, Rating, Threshold]
  • 11. wRI-based approach
      • [Formula; annotated labels: Documents, Rating, Threshold]
      • Higher weight given to the documents with higher rating
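
Since the profile formulas did not survive in the transcript, the sketch below only illustrates the idea stated in the text: sum the vectors of the documents rated above a threshold (RI), optionally weighting each one by its rating (wRI). The `>=` comparison and the `rating / max_rating` weight are assumptions, and `build_profile` is a hypothetical helper name.

```python
import numpy as np


def build_profile(doc_vecs, ratings, threshold, max_rating=None):
    """Combine the vectors of the documents the user liked.

    max_rating=None gives the unweighted (RI-style) profile; otherwise each
    document is weighted by rating / max_rating (assumed wRI-style weighting).
    """
    k = len(next(iter(doc_vecs.values())))
    profile = np.zeros(k)
    for doc_id, rating in ratings.items():
        if rating >= threshold:          # assumption: threshold is inclusive
            weight = 1.0 if max_rating is None else rating / max_rating
            profile += weight * doc_vecs[doc_id]
    return profile


doc_vecs = {
    "a": np.array([1.0, 0.0]),
    "b": np.array([0.0, 1.0]),
    "c": np.array([1.0, 1.0]),
}
ratings = {"a": 5, "b": 2, "c": 4}

ri = build_profile(doc_vecs, ratings, threshold=4)                  # a + c
wri = build_profile(doc_vecs, ratings, threshold=4, max_rating=5)   # 1.0*a + 0.8*c
```

In this toy example document "b" falls below the threshold and does not contribute to either profile.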
  • 12. negation operator
      • Both models inherit a classical problem of VSM
          • User profiles are modeled only according to positive preferences
          • In classical text classifiers (Naive Bayes, SVM, etc.) both positive and negative preferences are modeled
      • Introduction of a Negation Operator based on Quantum Logic to tackle this problem
          • Queries such as “A NOT B” are allowed
          • Projection of vector A onto the subspace orthogonal to the one generated by vector B
          • Implemented in the Semantic Vectors open-source package (http://code.google.com/p/semanticvectors/)
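
The projection described on this slide can be written as A NOT B = A - ((A·B) / (B·B)) B. The block below is a small NumPy sketch of that formula; the function name `negate` is hypothetical and this is not the Semantic Vectors API.

```python
import numpy as np


def negate(a, b):
    """Return the component of `a` orthogonal to `b` ("a NOT b")."""
    b_norm_sq = np.dot(b, b)
    if b_norm_sq == 0:
        # Nothing to remove when b is the zero vector.
        return a.copy()
    return a - (np.dot(a, b) / b_norm_sq) * b


a = np.array([3.0, 4.0])   # e.g. a positive profile vector
b = np.array([1.0, 0.0])   # e.g. a negative profile vector
a_not_b = negate(a, b)     # array([0., 4.])
```

The result has zero dot product with `b`, so items that are similar only to the negative vector receive no credit from it.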
  • 13. SV-based approach
      • [Formula: VSM representation of the SV-based profile for user u; annotated labels: Positive User Profile Vector, Negative User Profile Vector]
  • 14. wSV-based approach
      • [Formula: VSM representation of the wSV-based profile for user u; annotated labels: Positive User Profile Vector, Negative User Profile Vector]
  • 15. recommendation step
      • Given a user profile u and a set of items, we can suppose that the most relevant items for u are the nearest ones in the vector space
      • RI and wRI: Submission of a query based on [formula not preserved]
      • SV and wSV: Submission of a query based on [formula not preserved]
          • Returns the items with as many features as possible from p+ and as few features as possible from p-
      • Cosine Similarity to rank the items
          • Items whose similarity is under a certain threshold are labeled as non-relevant and filtered
      • Recommendation of the items with the highest similarity
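
The ranking and filtering step can be sketched as below. It is a minimal illustration: `recommend`, the threshold value and the `top_n` cut-off are hypothetical, and the profile vector is assumed to have been built beforehand (for the SV variants, after applying the negation operator).

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0


def recommend(profile, item_vecs, threshold=0.0, top_n=10):
    """Rank items by cosine similarity to the profile, dropping weak matches."""
    scored = [(item, cosine(profile, vec)) for item, vec in item_vecs.items()]
    # Items under the threshold are treated as non-relevant and filtered out.
    relevant = [(item, s) for item, s in scored if s >= threshold]
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return relevant[:top_n]


profile = np.array([1.0, 0.0])
item_vecs = {
    "a": np.array([1.0, 0.0]),
    "b": np.array([1.0, 1.0]),
    "c": np.array([-1.0, 0.0]),
}
ranked = recommend(profile, item_vecs, threshold=0.5)
```

Here item "c" points away from the profile and is filtered, while "a" ranks above "b".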
  • 16. experimental design
      • Dataset
          • Based on MovieLens, enriched with contents crawled from Wikipedia
          • 613 users, 520 items, 25k terms, 40k ratings
      • Experiment 1
          • Does the weighting schema improve the predictive accuracy of the recommendation models?
      • Experiment 2
          • Does the introduction of a negation operator improve the predictive accuracy of the recommendation models?
  • 17. results

                          RI      W-RI    SV      W-SV    Bayes
      Av-Precision@1      85.93   86.33   85.97   86.78   86.39
      Av-Precision@3      85.78   85.97   86.19   86.33   85.97
      Av-Precision@5      85.75   86.10   85.99   86.16   85.83
      Av-Precision@7      85.61   85.92   85.88   85.95   85.77
      Av-Precision@10     85.45   85.76   85.76   85.83   85.75

      • SV and RI improve the Average Precision with respect to the Naive Bayes approach (currently implemented in our recommender system)
  • 18. conclusions
      • Investigation of the impact of Random Indexing in the area of content-based recommender systems
          • Use of Random Indexing for dimensionality reduction
          • Introduction of Negation Operator based on Quantum Logic
      • Encouraging experimental results
          • First results improve the predictive accuracy obtained by classical content-based filtering techniques (e.g. Bayes)
      • Work-in-progress
          • To compare results with classical TF/IDF-based VSM, LSA, Rocchio and so on
  • 19. discussion
      • Thanks for your attention
      • Cataldo Musto - cataldomusto@di.uniba.it
      • University of Bari (Italy), SWAP Research Group
      • http://www.di.uniba.it/~swap/
      • IIR 2011 - Italian Information Retrieval Workshop
