Top-N Recommender Systems: Revisiting Item Neighborhood Methods
George Karypis
Department of Computer Science & Engineering
University of Minnesota
karypis@cs.umn.edu
http://www.cs.umn.edu/~karypis
Abstract
Top-N recommender systems are designed to generate a ranked list of items that a user will find
useful based on the user’s prior activity. These systems have become ubiquitous and are an
essential tool for information filtering and (e-)commerce. Over the years, collaborative filtering,
which derives these recommendations by leveraging the past activities of groups of users, has
emerged as the most prominent approach for solving this problem. Of the many
methods that have been developed, item-based nearest neighbor algorithms are among the
simplest and yet best-performing methods for Top-N recommender systems. These methods
rank the items to be recommended based on how similar they are to the items in a user’s prior
activity history, using various co-occurrence similarity measures.
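
As an illustration of the traditional approach described above, the following is a minimal sketch of item-based nearest-neighbor ranking. It assumes a small binary user-item matrix and uses cosine similarity as the co-occurrence measure; the function names, the toy data, and the choice of cosine are illustrative assumptions, not details taken from the talk.

```python
import numpy as np

def item_cosine_similarity(R):
    """Cosine similarity between the columns (items) of a user-item matrix."""
    R = np.asarray(R, dtype=float)
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0            # avoid division by zero for unused items
    S = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)           # an item is not its own neighbor
    return S

def top_n(R, S, user, n=2, k=None):
    """Rank the items a user has not seen by summed similarity to the user's items."""
    R = np.asarray(R, dtype=float)
    if k is not None:
        # Keep only the k largest similarities in each column (k nearest neighbors).
        S = S.copy()
        for j in range(S.shape[1]):
            drop = np.argsort(S[:, j])[:-k]
            S[drop, j] = 0.0
    scores = R[user] @ S
    scores[R[user] > 0] = -np.inf      # never recommend items already in the history
    order = np.argsort(-scores, kind="stable")
    return [int(i) for i in order[:n] if np.isfinite(scores[i])]

# Toy data (hypothetical): 4 users x 4 items, 1 = the user interacted with the item.
R = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 0]])
S = item_cosine_similarity(R)
print(top_n(R, S, user=3, n=2))        # [1, 2]
```

The similarity function here is fixed in advance, which is exactly the limitation that learned item-item similarities are meant to remove.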
In this talk we present our recent work on these item-based neighborhood methods that has
substantially improved the accuracy of the predictions. One shortcoming of traditional item-
based neighborhood methods is that they rely on a similarity measure that needs to be specified
a priori. To address this problem we developed a class of item-based neighborhood methods
that directly estimate from the training data a sparse item-item similarity matrix. This similarity
matrix is estimated using a structural equation modeling (SEM) framework, which requires each
column of the user-item matrix to be approximated as a sparse aggregation of some other
columns. These other columns correspond to the learned neighbors, and their aggregation
weights correspond to the learned similarities. A second shortcoming of item-based neighborhood methods
is that the item-item similarity measures rely on co-occurrences, which become problematic
when the datasets are very sparse and the number of item pairs with sufficiently many
co-occurrences is small. To address this problem we extended the SEM framework to estimate a
factored version of the item-item similarity matrix. This factored representation projects the
items in a lower dimensional space, which allows for meaningful similarity estimates between
items that never co-occurred in the original user-item matrix. Finally, we discuss and
present results from our work on enhancing these SEM models with item side information,
both to further improve the Top-N recommendation accuracy and to address the item
cold-start recommendation problem.
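
The abstract does not state the optimization problems themselves. One way to write the two ideas, consistent with the description above and with the commonly published sparse linear formulation, is sketched below. The symbols are assumptions introduced here: A is the n-by-m user-item matrix, W is the m-by-m item-item similarity matrix, beta and lambda are regularization weights, and P, Q are m-by-k item factor matrices with k much smaller than m.

```latex
% Sparse item-item similarities learned from the data:
% each column of A is approximated by a sparse, non-negative
% aggregation of the other columns (an item may not use itself).
\min_{W}\;\; \tfrac{1}{2}\,\lVert A - AW \rVert_F^2
        + \tfrac{\beta}{2}\,\lVert W \rVert_F^2
        + \lambda\,\lVert W \rVert_1
\quad \text{subject to} \quad W \ge 0,\;\; \operatorname{diag}(W) = 0

% Factored variant: the similarity matrix is replaced by a
% low-rank product, so two items can be similar even if they
% never co-occur in A.
W \approx P\,Q^{\top}, \qquad P,\, Q \in \mathbb{R}^{m \times k},\;\; k \ll m
```

In this notation the nonzero entries of column j of W identify the learned neighbors of item j, and their values are the learned similarities.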
Bio
George Karypis is a professor at the Department of Computer Science & Engineering at the
University of Minnesota, Twin Cities. His research interests span the areas of data mining,
bioinformatics, cheminformatics, high performance computing, information retrieval,
collaborative filtering, and scientific computing. His research has resulted in the development of
software libraries for serial and parallel graph partitioning (METIS and ParMETIS), hypergraph
partitioning (hMETIS), parallel Cholesky factorization (PSPASES), collaborative filtering-based
recommendation algorithms (SUGGEST), clustering high-dimensional datasets (CLUTO),
finding frequent patterns in diverse datasets (PAFI), and protein secondary structure
prediction (YASSPP). He has coauthored over 200 papers on these topics and a book titled
“Introduction to Parallel Computing” (Publ. Addison Wesley, 2003, 2nd edition). In addition, he is
serving on the program committees of many conferences and workshops on these topics, and
on the editorial boards of the IEEE Transactions on Knowledge and Data Engineering, Social
Network Analysis and Data Mining Journal, International Journal of Data Mining and
Bioinformatics, the journal on Current Proteomics, Advances in Bioinformatics, and Biomedicine
and Biotechnology.