1. Mathematical methods of Tensor Factorization applied to Recommender Systems
Giuseppe Ricci, PhD Student in Computer Science
University of Bari “A. Moro”
Advances in DataBases and Information Systems
PhD Consortium, Genoa, 01 September 2013
Semantic Web Access and Personalization research group
http://www.di.uniba.it/~swap
Dipartimento di Informatica
2. Information Overload & Recommender Systems
On the Internet today, an overabundance of information can be
accessed, making it difficult for users to process and
evaluate options and make appropriate choices.
Recommender Systems (RS) are information filtering techniques
which play an important role in e-commerce, advertising,
e-mail filtering, etc.
3. What do RS do exactly?
① Predict how much you may like a certain product/service
② Compose a list of N best items for you
③ Compose a list of N best users for a certain product/service
④ Explain why these items are recommended to you
⑤ Adjust the prediction and recommendation based on your
feedback (ratings) and other people
I1 I2 I3 I4 I5 I6 I7 I8 I9
U1 1 5 4
U2 4 2 5
U3 4 5
U4 5 2 4
A 1 3 1 3 1 4 5 8
user-item matrix
4. Matrix Factorization
Matrix Factorization (MF) techniques fall in the class of
collaborative filtering (CF) methods known as latent factor
models: similarity between users and items is induced
by some factors hidden in the data
Latent factor models build a matrix of users and items, and
each element is associated with a vector of characteristics
MF techniques represent users and items by vectors of
features derived from the ratings given by users to the items
they have seen or tried
Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for
recommender systems. IEEE Computer, 42(8):30-37, 2009.
5. Matrix Factorization
U set of users, D set of items, R rating matrix.
MF aims to factorize R into two matrices P and Q such that their
product approximates R:
P row: strength of the association between user and k latent
features.
Q column: strength of the association between an item and the
latent features.
Once these vectors are discovered, recommendations are
calculated using the expression of $\hat{r}_{ij}$.
An MF technique widely used in the literature is Singular Value Decomposition (SVD):
• popularized for RS by Simon Funk during the Netflix Prize
• has the objective of reducing the dimensionality, i.e. the rank,
of the user-item matrix
• captures latent relationships between users and items
$R \approx P \times Q^T = \hat{R}$
$\hat{r}_{ij} = p_i^T q_j$
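The factorization described on this slide can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the method used in the presentation: it assumes stochastic gradient descent with L2 regularization on the known ratings, and the function name `factorize`, the hyperparameters (`k`, `steps`, `lr`, `reg`) and the toy matrix are all hypothetical. Zeros in `R` stand for missing ratings and are excluded through `mask`.

```python
import numpy as np

def factorize(R, mask, k=2, steps=5000, lr=0.01, reg=0.02, seed=0):
    """Fit R ~ P @ Q.T using only the observed entries (mask == True)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.random((n_users, k))   # one row of k latent features per user
    Q = rng.random((n_items, k))   # one row of k latent features per item
    rows, cols = np.nonzero(mask)
    for _ in range(steps):
        for i, j in zip(rows, cols):
            err = R[i, j] - P[i] @ Q[j]          # r_ij - p_i^T q_j
            p_old = P[i].copy()
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * p_old - reg * Q[j])
    return P, Q

# Hypothetical toy ratings; 0 marks a missing rating (an assumption of this sketch).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0

P, Q = factorize(R, mask)
R_hat = P @ Q.T                                   # predicted ratings, including missing ones
train_rmse = np.sqrt(np.mean((R - R_hat)[mask] ** 2))
```

The entries of `R_hat` at positions where `mask` is false are the predictions that an RS would use to rank unseen items.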
6. SVD
Different SVD algorithms have been used in the RS literature:
• in [15], the authors use a small SVD obtained by retaining only
k << r singular values and discarding the other entries;
• in [11], the authors propose an algorithm to perform SVD on large
matrices, focusing the study on the parameters that affect the
convergence speed;
• in [9], Koren presents an approach based on factor models
that project users and items into the same latent space, where
some measures for comparison are defined. He proposes several
versions of SVD with the objective of obtaining better
recommendations as well as good scalability
[15] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Incremental singular value
decomposition algorithms for highly scalable recommender systems.
[11] Miklos Kurucz, Andras A. Benczur, and Balazs Torma. Methods for large scale SVD with missing
values.
[9] Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model.
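The idea of retaining only k << r singular values can be illustrated with a rank-k truncated SVD. The sketch below is an assumption-laden example, not the algorithm of [15], [11] or [9]: the helper name `truncated_svd` and the toy matrix are hypothetical, and missing ratings are simply stored as 0 before the decomposition.

```python
import numpy as np

def truncated_svd(A, k):
    """Keep only the k largest singular values and their singular vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# Hypothetical user-item matrix; missing ratings are filled with 0 (assumption).
A = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [1., 0., 0., 4.],
              [0., 1., 5., 4.]])

k = 2
U_k, s_k, Vt_k = truncated_svd(A, k)
A_k = U_k @ np.diag(s_k) @ Vt_k          # best rank-k approximation of A

# The Frobenius error equals the norm of the discarded singular values.
s_all = np.linalg.svd(A, compute_uv=False)
error = np.linalg.norm(A - A_k)
expected_error = np.sqrt(np.sum(s_all[k:] ** 2))
```

Rows of `U_k` scaled by `s_k` can be read as users in the k-dimensional latent space, and columns of `Vt_k` as items in the same space.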
7. Limitation of MF Techniques
They take into account only the standard profile of users and items. This does
not allow further information, such as context, to be integrated.
Contextual information (the place where the user sees the movie, the device, the
company...) cannot be managed with simple user-item matrices
Family with children
At the cinema with friends or colleagues
8. Tensors & Tensor Factorization
[6] R.A. Harshman. Foundations of the PARAFAC Procedure: Models and Conditions
for an "explanatory" Multi-modal Factor Analysis, volume 1 (16) of Working papers in
phonetics. University of California at Los Angeles, 1970.
[12] Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. A multilinear singular
value decomposition. SIAM J. Matrix Anal. Appl, 21:1253-1278, 2000.
Tensors are higher-dimensional arrays of numbers that can be exploited in
order to include additional contextual information in the recommendation
process.
The techniques that generalize MF can also be applied to tensors.
Two particular Tensor Factorizations (TF) can be considered to be
higher-order extensions of matrix singular value decomposition:
• PARallel FACtor analysis [6] or CANonical DECOMPosition
(PARAFAC/CANDECOMP), which decomposes a tensor as a sum of
rank-one tensors;
• Higher-Order Singular Value Decomposition [12] (HOSVD), which is
a higher-order form of Principal Component Analysis (PCA)
9. HOSVD & RS 1/2
HOSVD is the most widely adopted TF technique.
HOSVD is a generalization of the SVD for matrices: it decomposes the
initial tensor into N matrices (N is the order of the tensor) and a “small
tensor”.
Examples of HOSVD in RS:
• Multiverse recommendation [7]: TF is applied to manage data for users,
movies, user ratings and contextual information such as age, day of the
week, companion;
• Tensor factorization for tag recommendation [13]: for a social tagging
system, data on users, items and tags are stored in a 3rd-order tensor,
which is factorized with the aim of discovering latent factors that bind
the user-item, user-tag and tag-item associations.
[7] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver.
Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative
filtering.
[13] Steffen Rendle, Leandro Balby Marinho, Alexandros Nanopoulos, and Lars Schmidt-Thieme.
Learning optimal ranking with tensor factorization for tag recommendation. In KDD, pages 727-736,
2009.
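The decomposition of a tensor into one factor matrix per dimension plus a small tensor can be sketched as follows. This is a minimal truncated HOSVD under stated assumptions, not the implementation used in [7] or [13]: the helper names (`unfold`, `mode_product`, `hosvd`, `reconstruct`), the target ranks and the random tensor are hypothetical, and the tensor is assumed to be fully observed.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: rows index the chosen mode, columns all the others."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    out = np.tensordot(M, T, axes=(1, mode))   # contracted mode comes out first
    return np.moveaxis(out, 0, mode)

def hosvd(T, ranks):
    """Return the small core tensor and one factor matrix per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])               # leading left singular vectors
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)   # project onto each factor
    return core, factors

def reconstruct(core, factors):
    """Rebuild an approximation of the original tensor."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_product(T, U, mode)
    return T

# Hypothetical users x items x contexts tensor.
rng = np.random.default_rng(0)
T = rng.random((4, 5, 3))

core_full, factors_full = hosvd(T, T.shape)       # no truncation: exact
T_full = reconstruct(core_full, factors_full)

core_small, factors_small = hosvd(T, (2, 2, 2))   # truncated: approximate
T_small = reconstruct(core_small, factors_small)
```

Choosing a different rank for each mode is what allows dimensionality reduction to be performed separately per dimension.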
10. HOSVD & RS 2/2
• CubeSVD [17]: a personalized web search system that aims to
discover the hidden relationships between users, queries and web pages.
Data are collected in a 3rd-order tensor, which is then decomposed.
[17] Jian-Tao Sun, Hua-Jun Zeng, Huan Liu, Yuchang Lu, and Zheng Chen. Cubesvd: a novel approach to
personalized web search. In Proceedings of the 14th international conference on World Wide Web, WWW '05, pages
382-390, New York, NY, USA, 2005. ACM.
11. HOSVD: advantages & disadvantages
Advantages:
• the ability of taking into account more dimensions
simultaneously
• better data modeling than standard SVD, dimensionality reduction
can be performed not only in one dimension but also separately for
each dimension
Disadvantages:
• it is not an optimal tensor decomposition in the sense of
least squares data fitting: for a matrix, truncating the SVD to the
first n singular values gives the best rank-n approximation, but this
property does not carry over to the truncated HOSVD
• high computational cost
• cannot deal with missing values: they are treated as 0
12. PARAFAC
The PARAFAC model of a 3-dimensional array is given by 3 loading
matrices A, B and C with typical elements $a_{if}$, $b_{jf}$ and $c_{kf}$.
The PARAFAC model is defined by:
$\hat{x}_{ijk} = \sum_{f=1}^{F} a_{if} b_{jf} c_{kf}$
F: number of rank-one components.
PARAFAC Advantages:
• alternative to HOSVD
• simpler model
• linear computation time, lower than that of HOSVD
• does not collapse the data, but retains its natural 3-dimensional
structure
• components are unique, up to permutation and scaling, under mild
conditions
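The PARAFAC model above translates directly into a single `numpy.einsum` call. The sketch below only evaluates the model for given loading matrices; it does not fit them. The function name `parafac_reconstruct`, the sizes and the random matrices are hypothetical.

```python
import numpy as np

def parafac_reconstruct(A, B, C):
    """x_hat[i, j, k] = sum over f of A[i, f] * B[j, f] * C[k, f]."""
    return np.einsum('if,jf,kf->ijk', A, B, C)

rng = np.random.default_rng(0)
F = 2                       # number of rank-one components
A = rng.random((4, F))      # e.g. users
B = rng.random((5, F))      # e.g. items
C = rng.random((3, F))      # e.g. contexts

X_hat = parafac_reconstruct(A, B, C)

# The same tensor written explicitly as a sum of F rank-one tensors.
X_sum = sum(np.multiply.outer(np.multiply.outer(A[:, f], B[:, f]), C[:, f])
            for f in range(F))
```

Each term of `X_sum` is the outer product of one column from each loading matrix, which is what "sum of rank-one tensors" means in practice.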
13. PARAFAC, RS and beyond 1/2
In TFMAP: optimizing MAP for top-n context-aware recommendation [16],
a 3-dimensional tensor (users, items and context types) is factorized with
PARAFAC.
The dimensions are associated with the 3 factor matrices, which are used to
calculate a user's preference for item i under context type k.
Problem: PARAFAC & Missing Data
Solution: CP-WOPT algorithm
[16] Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic, and Nuria Oliver. TFMAP:
optimizing MAP for top-n context-aware recommendation. In Proceedings of the 35th international ACM SIGIR
conference on Research and development in information retrieval, SIGIR '12, pages 155-164, New York, NY, USA,
2012. ACM.
14. PARAFAC, RS and beyond 2/2
Scalable tensor factorizations with missing data [1] addresses PARAFAC
with missing data: the CP-WOPT (CP Weighted OPTimization) algorithm uses
1st-order optimization to solve the weighted least squares objective function.
Extensive numerical experiments on simulated data sets show that CP-WOPT can
successfully factor tensors with noise and up to 70% missing data.
CP-WOPT is significantly faster and more accurate than the best published method
in the literature.
[1] Evrim Acar, Daniel M. Dunlavy, Tamara G. Kolda, and Morten Mørup. Scalable tensor factorizations with
missing data. In SDM10: Proceedings of the 2010 SIAM International Conference on Data Mining, pages 701-712,
Philadelphia, April 2010. SIAM.
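The weighted least squares idea can be sketched as follows: with a binary weight tensor W (1 for known entries, 0 for missing ones), the objective is $f_W(A, B, C) = \frac{1}{2} \lVert W * (X - \hat{X}) \rVert^2$, where $*$ is the element-wise product and $\hat{X}$ is the PARAFAC reconstruction. The code below is an illustration only: it uses plain fixed-step gradient descent as a stand-in for the first-order optimizer of CP-WOPT, and the function names, step size, number of steps and synthetic data are all hypothetical.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    return np.einsum('if,jf,kf->ijk', A, B, C)

def weighted_loss_and_grads(X, W, A, B, C):
    """Weighted least squares loss and its gradients (W must be binary)."""
    E = W * (X - cp_reconstruct(A, B, C))     # residual on known entries only
    loss = 0.5 * np.sum(E ** 2)
    gA = -np.einsum('ijk,jf,kf->if', E, B, C)
    gB = -np.einsum('ijk,if,kf->jf', E, A, C)
    gC = -np.einsum('ijk,if,jf->kf', E, A, B)
    return loss, (gA, gB, gC)

def fit(X, W, rank, steps=2000, lr=0.01, seed=0):
    """Plain gradient descent on the weighted objective (illustrative only)."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((n, rank)) for n in X.shape)
    history = []
    for _ in range(steps):
        loss, (gA, gB, gC) = weighted_loss_and_grads(X, W, A, B, C)
        history.append(loss)
        A, B, C = A - lr * gA, B - lr * gB, C - lr * gC
    return (A, B, C), history

# Synthetic rank-2 tensor with roughly 30% of the entries hidden.
rng = np.random.default_rng(1)
true_factors = [rng.random((n, 2)) for n in (6, 5, 4)]
X = cp_reconstruct(*true_factors)
W = (rng.random(X.shape) < 0.7).astype(float)   # 1 = known, 0 = missing

(A, B, C), history = fit(X, W, rank=2)
X_hat = cp_reconstruct(A, B, C)                 # also fills the missing entries
```

Because the residual is multiplied by W, missing entries contribute nothing to the loss or to the gradients, so they are never treated as zero ratings.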
15. CP-WOPT adaptation: Preliminary Experiments 1/3
CP-WOPT algorithm adapted to RS:
• takes into account missing values, so the algorithm is suitable for very
sparse user-item matrices
• computes a weighted factorization that models only known
values, rather than simply using 0 for missing data
• main goals:
• good reconstruction of missing values
• use of contextual information to achieve more precise
recommendations.
Preliminary user study: users rated some movies (not all) under
contextual factors
7 real users
11 movies from the MovieLens 100k dataset
contextual factors: whether they prefer to see the movie
at home or at the cinema;
with friends or with a partner;
with or without family.
16. CP-WOPT adaptation: Preliminary Experiments 2/3
Main Goal: good reconstruction of missing values with the adapted
CP-WOPT
Ratings range: 1 to 5
Rating coding:
• 1-2: strong to modest preference for the 1st option;
• 3: neutrality;
• 4-5: modest to strong preference for the 2nd option
Metrics:
accuracy (acc): % of known values correctly reconstructed
coverage (cov): % of non-zero values returned
Results: 105 maximum iterations
acc = 94.4%
cov = 91.7%
$acc = 100 - \frac{\text{errors}}{\text{known values}} \times 100$
$cov = 100 - \frac{\text{errors}}{\text{unknown values}} \times 100$
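The two metrics can be sketched in code from their verbal definitions on this slide. Several details are assumptions of this sketch and are not stated in the slides: a known value counts as correctly reconstructed when the prediction is within `tol` of it, a returned value counts as zero when it rounds to 0, and the function names and toy data are hypothetical.

```python
import numpy as np

def accuracy(X, X_hat, known, tol=0.5):
    """% of known values reconstructed within tol (tol is an assumption)."""
    errors = np.sum(np.abs(X_hat[known] - X[known]) > tol)
    return 100 - errors / known.sum() * 100

def coverage(X_hat, known):
    """% of unknown values for which a non-zero value is returned."""
    unknown = ~known
    errors = np.sum(np.round(X_hat[unknown]) == 0)   # zero after rounding (assumption)
    return 100 - errors / unknown.sum() * 100

# Hypothetical ratings (0 = unknown) and hypothetical reconstructed values.
X = np.array([[5., 0., 3.],
              [0., 4., 1.]])
known = X > 0
X_hat = np.array([[4.8, 2.1, 1.9],
                  [0.2, 4.1, 1.2]])

acc = accuracy(X, X_hat, known)   # 3 of the 4 known values are within tol
cov = coverage(X_hat, known)      # 1 of the 2 unknown values is non-zero
```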
17. CP-WOPT adaptation: Preliminary Experiments 3/3
Other qualitative results:
the experiment showed that it is possible to express, through the
n-dimensional factorization, not only the recommendations for the single
user, but also more specific suggestions about the consumption of an
item.
18. In Vitro: Preliminary Experiment
Main Goal: test the adapted CP-WOPT on RS for more precise
recommendations
The adapted version of CP-WOPT was run on a subset (with a significant
number of ratings) of the MovieLens 100k dataset.
Ratings given by users who have an occupation are stored in a 3rd-order
tensor.
Input: a tensor of dimensions 100 users × 150 movies × 21 occupations (the
contextual factor)
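Storing ratings in a 3rd-order tensor can be sketched as below. This is a hypothetical illustration, not the preprocessing used in the experiment: the function name `build_tensor`, the record layout (user index, movie index, occupation index, rating) and the tiny sample are all assumptions. A binary weight tensor is built alongside the ratings so that missing entries can be told apart from known ones.

```python
import numpy as np

def build_tensor(records, shape):
    """Return the rating tensor X and a binary weight tensor W (1 = known)."""
    X = np.zeros(shape)
    W = np.zeros(shape)
    for user, movie, occupation, rating in records:
        X[user, movie, occupation] = rating
        W[user, movie, occupation] = 1.0
    return X, W

# Hypothetical records: (user index, movie index, occupation index, rating).
records = [(0, 2, 1, 4.0), (1, 0, 0, 5.0), (2, 2, 1, 3.0)]
X, W = build_tensor(records, shape=(3, 4, 2))

sparsity = 1.0 - W.sum() / W.size   # fraction of missing entries
```

With the sizes given on the slide, the same call would use `shape=(100, 150, 21)`.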
Results:
acc = 92.09%
cov = 99.96%
MAE = 0.60
RMSE = 0.93
in line with results reported in the literature
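For reference, MAE and RMSE can be computed as in the sketch below, using their standard definitions over the rated entries: $MAE = \frac{1}{n} \sum |r - \hat{r}|$ and $RMSE = \sqrt{\frac{1}{n} \sum (r - \hat{r})^2}$. The function names and the sample ratings are hypothetical and unrelated to the experiment's data.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error between true and predicted ratings."""
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    """Root mean squared error; penalizes large errors more than MAE."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Hypothetical true and predicted ratings.
y = np.array([4.0, 3.0, 5.0, 1.0])
y_hat = np.array([3.0, 3.0, 4.0, 3.0])

mae_value = mae(y, y_hat)     # absolute errors are 1, 0, 1, 2
rmse_value = rmse(y, y_hat)   # squared errors are 1, 0, 1, 4
```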
19. Ongoing and Future Work
• extend the evaluation of our version of CP-WOPT to tensors
of higher dimensionality (MovieLens dataset)
• investigate methods to assess whether and which contextual
factors (occupation, company) influence the users' preferences
• user segmentation
• test our approach in other domains such as news
recommendation or Electronic Program Guides
20. Thanks for your attention!!
Dott. Giuseppe Ricci
PhD Student in Computer Science
Department of Computer Science
4th floor, LACAM Lab., SWAP Room
Phone: +39-080-5442298
E-mail: giuseppe.ricci@uniba.it