Random Indexing for Content-based Recommender Systems
1. IIR 2011 - Italian Information Retrieval Workshop
Milano, Italy
Random Indexing for
Content-based
Recommender Systems
Cataldo Musto - cataldomusto@di.uniba.it
Pasquale Lops, Marco de Gemmis, Giovanni Semeraro
University of Bari “Aldo Moro” (Italy), SWAP Research Group
28.01.11
2. outline 2/18
• Introduction
• Analysis of Vector Space Models
• Content-based Recommender Systems
• Random Indexing for Content-based Recommender Systems
• Introducing Random Indexing
• Recommendation models
• Experimental Evaluation
• Open Issues
• Future Works
C.Musto, P.Lops, M.de Gemmis, G.Semeraro: Random Indexing for Content-based Recommender Systems - IIR 2011 Workshop - Milano, Italy - 28.01.11
3. vector space model 3/18
• Weak Points
• High
Dimensionality
• Not incremental
• Does not manage
the latent
semantics of
documents
• Does not manage
negative
preferences
4. recommender systems 4/18
• A specific type of Information Filtering system that
attempts to recommend information
items (films, television, video on demand, music, books,
etc) that are likely to be of interest to the user
• Content-based Recommender Systems
• The degree of interest is inferred by comparing the
textual features extracted from the item w.r.t. the
features stored in the user profile
5. goals 5/18
• To investigate the impact of VSM in the
area of content-based recommender
systems
• To introduce techniques able to overcome
typical VSM issues
• Random Indexing
• Dimensionality reduction technique (Sahlgren, 2005)
• Negation Operator
• Based on Quantum Logic (Widdows, 2007)
6. random indexing 6/18
• Random Indexing (RI) is an incremental and
effective technique for dimensionality reduction
• Introduced by Sahlgren in 2005
• Based on the so-called “Distributional
Hypothesis”
• “Words that occur in the same context tend to
have similar meanings”
• “Meaning is its use” (Wittgenstein)
7. how does it work? 7/18
• Random Indexing reduces
the m-dimensional term/doc
matrix to a new
k-dimensional matrix
• How?
• By multiplying the original matrix
with a random one, built in
an incremental way
• formally: $A_{n,m} R_{m,k} = B_{n,k}$
• $k \ll m$
• After projection, the distance
between points in the vector space
is approximately preserved
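The projection above can be illustrated with a minimal sketch. It assumes a tiny dense term/doc matrix and a random matrix with entries in {-1, 0, +1}; the function names, the toy sizes, and the number of non-zero entries per row are illustrative choices, not values from the talk.

```python
import random

def random_ternary_matrix(m, k, nonzeros, seed=0):
    """Build an m x k matrix whose rows are sparse vectors over {-1, 0, +1}."""
    rng = random.Random(seed)
    rows = []
    for _ in range(m):
        row = [0] * k
        # Pick `nonzeros` distinct positions and give each a random sign.
        for pos in rng.sample(range(k), nonzeros):
            row[pos] = rng.choice((-1, 1))
        rows.append(row)
    return rows

def project(A, R):
    """Compute B = A R for an n x m matrix A and an m x k matrix R."""
    k = len(R[0])
    B = []
    for a_row in A:
        b_row = [0] * k
        for j, weight in enumerate(a_row):
            if weight:  # skip zero entries, term/doc matrices are sparse
                for c, r in enumerate(R[j]):
                    b_row[c] += weight * r
        B.append(b_row)
    return B

# Toy term/doc matrix: n = 3 documents, m = 6 terms (hypothetical counts).
A = [
    [1, 0, 2, 0, 0, 1],
    [0, 1, 0, 0, 3, 0],
    [1, 0, 2, 0, 0, 0],
]
R = random_ternary_matrix(m=6, k=4, nonzeros=2)
B = project(A, R)
print(len(B), len(B[0]))  # 3 4
```

In a realistic setting m is in the tens of thousands and k in the hundreds or low thousands, so the reduction is far larger than in this toy case.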
8. building the matrix 8/18
• A context vector is assigned for each term. This
vector has a fixed dimension (k) and it can contain only
values in {-1, 0, 1}. Values are distributed in a random way,
but the number of non-zero elements is much smaller than k.
• The Vector Space representation of a term is obtained
by summing the context vectors of the terms it co-occurs
with.
• The Vector Space representation of a document
(item) is obtained by summing the context vectors of the
terms that occur in it
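The three steps above can be sketched as follows. This is an illustration under stated assumptions: co-occurrence is taken to mean "appears in the same document", and the helper names, k, and the number of non-zero elements are hypothetical.

```python
import random

def context_vector(k, nonzeros, rng):
    """Sparse random vector of size k with values in {-1, 0, +1}."""
    vec = [0] * k
    for pos in rng.sample(range(k), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec

def build_context(documents, k, nonzeros, seed=0):
    """Assign a context vector to each term the first time it is seen."""
    rng = random.Random(seed)
    context = {}
    for tokens in documents:
        for tok in tokens:
            if tok not in context:
                context[tok] = context_vector(k, nonzeros, rng)
    return context

def add_into(target, source):
    """In-place element-wise addition: target += source."""
    for i, v in enumerate(source):
        target[i] += v

def document_vector(tokens, context, k):
    """Document (item) vector: sum of the context vectors of its terms."""
    doc = [0] * k
    for tok in tokens:
        add_into(doc, context[tok])
    return doc

def term_vector(term, documents, context, k):
    """Term vector: sum of the context vectors of co-occurring terms.
    Assumption: two terms co-occur when they share a document."""
    vec = [0] * k
    for tokens in documents:
        if term in tokens:
            for other in tokens:
                if other != term:
                    add_into(vec, context[other])
    return vec

# Hypothetical toy corpus of two tokenized item descriptions.
documents = [["space", "alien", "ship"], ["alien", "love"]]
context = build_context(documents, k=8, nonzeros=2, seed=1)
doc0 = document_vector(documents[0], context, 8)
alien = term_vector("alien", documents, context, 8)
```

Because new terms only add new context vectors and new documents only add new sums, nothing already computed has to be rebuilt, which is what makes the method incremental.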
9. profile representation 9/18
• What about the user profiles?
• Assumption
• The information coming from documents (items)
that the user liked in the past could be a reliable
source of information for building user profiles
• The Vector Space representation of a user
profile is obtained by combining the context vectors
of all the documents that the user liked in the past.
10. RI-based approach 10/18
Documents Rating Threshold
VSM representation of RI-based profile for user u
11. wRI-based approach 11/18
Documents Rating Threshold
Higher weight given to the documents with higher rating
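The profile formulas did not survive on these slides, so the following is only a sketch of one plausible reading. It assumes that a document counts as liked when its rating is at least the threshold, and that the weighted variant scales each document vector by rating / max_rating; both choices, and all names, are assumptions.

```python
def ri_profile(rated_docs, threshold):
    """Unweighted profile: sum the vectors of the liked documents.
    Assumption: 'liked' means rating >= threshold."""
    k = len(rated_docs[0][0])
    profile = [0.0] * k
    for vector, rating in rated_docs:
        if rating >= threshold:
            for i, v in enumerate(vector):
                profile[i] += v
    return profile

def wri_profile(rated_docs, threshold, max_rating):
    """Weighted profile: liked documents with a higher rating count more.
    Assumption: the weight of a document is rating / max_rating."""
    k = len(rated_docs[0][0])
    profile = [0.0] * k
    for vector, rating in rated_docs:
        if rating >= threshold:
            weight = rating / max_rating
            for i, v in enumerate(vector):
                profile[i] += weight * v
    return profile

# Hypothetical (document vector, rating) pairs on a 1-5 scale.
rated = [([1, 0, -1], 5), ([0, 2, 1], 4), ([3, 3, 3], 2)]
p_ri = ri_profile(rated, threshold=3)
p_wri = wri_profile(rated, threshold=3, max_rating=5)
```

With these inputs the third document falls under the threshold and contributes to neither profile, while the 5-star document weighs more than the 4-star one in the weighted profile.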
12. negation operator 12/18
• Both models inherit a classical problem of VSM
• User profiles modeled only according to positive
preferences
• In classical text classifiers (Naive Bayes, SVM, etc.) both positive and
negative preferences are modeled
• Introduction of a Negation Operator based on
Quantum Logic to tackle this problem
• Queries such as “A not B” are allowed!
• Projection of vector A onto the subspace orthogonal to the one generated by vector B
• Implemented in the Semantic Vectors* open-source package
(*) http://code.google.com/p/semanticvectors/
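For a single negated vector B, the projection described above reduces to subtracting from A its component along B. The sketch below shows that standard orthogonal projection in plain Python; it is not the Semantic Vectors API, and the function names are illustrative.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def negate(a, b):
    """Return 'a NOT b': the component of a orthogonal to b."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)  # nothing to remove when b is the zero vector
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]

a = [1, 1, 0]
b = [0, 1, 0]
result = negate(a, b)
print(result)          # [1.0, 0.0, 0.0]
print(dot(result, b))  # 0.0
```

The result has zero inner product with b, so its cosine similarity to anything pointing purely along b is zero.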
13. SV-based approach 13/18
Positive User Profile Vector
Negative User Profile Vector
VSM representation of SV-based profile for user u
14. wSV-based approach 14/18
Positive User Profile Vector
Negative User Profile Vector
VSM representation of wSV-based profile for user u
15. recommendation step 15/18
• Given a user profile u and a set of items, we can suppose that the most relevant
items for u are the nearest ones in the vector space
• RI and wRI: Submission of a query based on
• SV and wSV: Submission of a query based on
• Returns the items with as many features as possible from p+ and as
few features as possible from p-
• Cosine Similarity to rank the items
• Items whose similarity is under a certain threshold are labeled as non-relevant
and filtered
• Recommendation of the items with the highest similarity w.r.t.
the user profile
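The ranking and filtering steps of this slide can be sketched as below. The query vector is whatever profile the chosen model produces; the threshold value, the item identifiers, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def recommend(profile, items, threshold, top_n):
    """Rank items by cosine similarity to the profile, drop those under
    the threshold, and return the ids of the top_n remaining items."""
    scored = [(cosine(profile, vec), item_id) for item_id, vec in items.items()]
    relevant = [pair for pair in scored if pair[0] >= threshold]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in relevant[:top_n]]

# Hypothetical two-dimensional item vectors and user profile.
items = {"a": [2, 0], "b": [1, 1], "c": [0, 1], "d": [-1, 0]}
profile = [1, 0]
print(recommend(profile, items, threshold=0.5, top_n=3))  # ['a', 'b']
```

Items "c" and "d" score 0 and -1 respectively, so they fall under the threshold and are filtered out as non-relevant.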
16. experimental design 16/18
• Dataset
• Based on MovieLens, enriched with contents
crawled from Wikipedia
• 613 users, 520 items, 25k terms, 40k ratings
• Experiment 1
• Does the weighting scheme improve the
predictive accuracy of the recommendation models?
• Experiment 2
• Does the introduction of a negation operator
improve the predictive accuracy of the recommendation
models?
17. results 17/18
Metric            RI     W-RI   SV     W-SV   Bayes
Av-Precision@1    85.93  86.33  85.97  86.78  86.39
Av-Precision@3    85.78  85.97  86.19  86.33  85.97
Av-Precision@5    85.75  86.10  85.99  86.16  85.83
Av-Precision@7    85.61  85.92  85.88  85.95  85.77
Av-Precision@10   85.45  85.76  85.76  85.83  85.75
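• The SV-based and weighted models improve the Average Precision with
respect to the Naive Bayes approach (currently
implemented in our recommender system) at most cutoffs;
W-SV does so at every cutoff, while plain RI stays slightly below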
18. conclusions 18/18
• Investigation of the impact of Random Indexing in the area of content-based
recommender systems
• Use of Random Indexing for dimensionality reduction
• Introduction of Negation Operator based on Quantum
Logic
• Encouraging experimental results
• First results improve the predictive accuracy
obtained by classical content-based filtering techniques (e.g. Bayes)
• Work-in-progress
• To compare results with classical TF/IDF-based VSM, LSA, Rocchio
and so on
19. http://www.di.uniba.it/~swap/
discussion
Thanks for your attention
Cataldo Musto - cataldomusto@di.uniba.it
University of Bari (Italy), SWAP Research Group
IIR 2011 - Italian Information Retrieval Workshop