This document presents a collaborative recommender system based on k-separability. The system addresses two central challenges in collaborative filtering, sparsity and noise in user rating data, using a dynamic neural network architecture that estimates the index of separability (k) of the data during training. The network is built iteratively by a constructive algorithm that adds hidden neurons and adapts only their weights. An experiment on a sparse, noisy dataset found that the system produced meaningful recommendations.
An Efficient Collaborative Recommender System based on k-separability
Georgios Alexandridis Georgios Siolas Andreas Stafylopatis
Department of Electrical and Computer Engineering
National Technical University of Athens
20th International Conference on Artificial Neural Networks
(ICANN 2010)
Alexandridis, Siolas, Stafylopatis (NTUA) k-separability Collaborative Recommender ICANN’10 1 / 16
Outline
1 Current Trends in Recommender Systems
Recommender Systems
Design Issues
2 Theoretical & Practical Aspects of our Contribution
k-Separability
System Architecture
3 Evaluating our System
Experiment
Results
Conclusions
What are Recommender Systems?
Recommender Systems attempt to present information items (e.g.
movies, music, books, news stories) that are likely to be of interest
to the user.
Some implementations
Amazon
"Customers Who Bought This Item Also Bought"
Google News
"Recommended Stories"
Online Audio Broadcasters
last.fm
Pandora
Taxonomy of Recommender Systems
Criterion: How are the predictions made?
Content-Based Recommenders
Locate "similar" items
Collaborative Recommenders
Find "like-minded" users
Hybrid Recommenders
Combination of the two
Which method is best?
An open research question
Highly dependent on the application domain
We followed the Collaborative Recommender approach
Computationally simpler than the Hybrid approach
A user rating is more than a mere number; it is an aggregate of
various characteristics
Collaborative Recommender Systems
Key Component: The User Ratings’ Matrix
Ratings
Indicate how much a user likes an item
Range from "dislike" (1 star) up to "like" (5 stars)
I1 I2 I3 I4
U1 5 3 2
U2 3 5 2
U3 1 2
U4 2 3
Users become each other’s predictors
by locating positive and negative correlations among their ratings.
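Such a correlation can be computed directly from two users' ratings on their co-rated items. A minimal sketch (the rating values below are made up for the example, not taken from the matrix above):

```python
import numpy as np

# Illustrative ratings of two users on three co-rated items
u1 = np.array([5.0, 3.0, 2.0])
u2 = np.array([5.0, 2.0, 1.0])

# Pearson correlation between the two users
r = np.corrcoef(u1, u2)[0, 1]
# r close to +1 means u2's ratings are a useful predictor of u1's (and vice versa);
# r close to -1 would be an equally useful negative correlation
```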
Challenges in Collaborative Recommender System Design
1 The cold-start problem
Recommendations cannot be made unless a user has provided
some ratings
Solutions:
Recommend the most popular items
Explicitly ask the user to rate some items prior to making
recommendations
2 The sparsity problem
The ratings matrix is sparse
Empty elements: More than 90%
Solution: Dimensionality Reduction techniques
Singular Value Decomposition (SVD) yields good results
Pros: The resultant matrix is substantially smaller & denser
Cons: The dataset becomes very "noisy"
Most elements assume values that are marginally larger than zero
Conclusion: We are in need of techniques that can "learn" noisy
datasets!
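The densification-plus-noise effect can be reproduced in a few lines. A sketch with a toy matrix (the matrix size and the retained rank of 3 are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ratings matrix: ~90% of the cells are empty (zero), as in typical datasets
R = np.zeros((20, 10))
mask = rng.random(R.shape) < 0.1
R[mask] = rng.integers(1, 6, size=int(mask.sum()))

# Low-rank SVD approximation (rank 3 is an assumed choice here)
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 3
dense = (U[:, :k] * s[:k]) @ Vt[:k]

# 'dense' has a much smaller effective rank and is far denser than R,
# but most formerly-empty cells now hold small nonzero "noise" values
```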
"Noisy" Datasets
The added noise in the dataset hinders the discovery of patterns
in data
Data clusters become difficult to separate
Machine Learning techniques for highly non-separable datasets
Support Vector Machines, Radial Basis Functions
Computing the support vector (or estimating the surface . . . ) can be a
computationally intensive task
Evolutionary Algorithms
Meaningful Recommendations are not always guaranteed
(evolutionary dead-ends)
Our approach: Use k-separability!
Originally proposed by W. Duch [1]
Special case of the more general method of Projection Pursuit
Application to Feed-Forward ANNs
Extends linear separability of data clusters into k > 2 segments on
the discriminating hyperplane
[1] W. Duch: K-separability. Lecture Notes in Computer Science 4131 (2006) 188-197
Extending linear separability to 3-separability
The 2-bit XOR problem
A highly non-separable dataset
It can be learned by a 2-layered perceptron, or ...
...by a single-layer perceptron that implements k-separability!
The activation function must partition the input space into 3
distinct areas
Soft-Windowed Activation Functions
(Figure: (a) input space partitioning; (b) soft-windowed activation function)
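To make the idea concrete: a single projection w·x plus a soft-windowed activation separates the XOR classes into three segments along the projection line. The projection (1, 1) and the window bounds below are chosen by hand for the sketch, not learned by a trained network:

```python
import numpy as np

def soft_window(z, a=0.5, b=1.5, slope=10.0):
    """Soft-windowed activation: close to 1 for z inside (a, b), close to 0 outside."""
    sig = lambda t: 1.0 / (1.0 + np.exp(-t))
    return sig(slope * (z - a)) - sig(slope * (z - b))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
w = np.array([1.0, 1.0])        # project onto the diagonal: z = x1 + x2
z = X @ w                        # projected values: 0, 1, 1, 2
y = soft_window(z)               # only the middle segment (z = 1) activates

print(np.round(y).astype(int))   # [0 1 1 0] -- the XOR labels
```

The three segments on the projection line (z below 0.5, between 0.5 and 1.5, above 1.5) are exactly the 3-separable partition of the slide.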
Generalizing to k-separability
Complex Datasets
Combine the output of two neurons (or more . . . )
e.g. A 5-separable dataset can be learned by the combined output
of 2 neurons
Generalization by Induction
m-neuron output ⇒ 2m + 1 regions on the discriminating line
⇒ a (k = 2m + 1)-separable dataset
Use in a Recommendation Engine
Create a 2-layered perceptron
n-sized input vector, m-sized hidden layer, a single output neuron
Overall, an n → m → 1 projection
Build a model (NN) for each user
Input: the ratings of the target user’s n "neighbors" on an item
they haven’t evaluated
Output: a "score" for the unseen item
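A forward pass of such a per-user model might look as follows. The layer sizes, projection matrix, window bounds and output weights are all placeholders (in the actual system they are learned), so this is a sketch of the n → m → 1 shape only:

```python
import numpy as np

def soft_window(z, a, b, slope=10.0):
    sig = lambda t: 1.0 / (1.0 + np.exp(-t))
    return sig(slope * (z - a)) - sig(slope * (z - b))

rng = np.random.default_rng(0)
n, m = 8, 2                      # n neighbors, m hidden window neurons (placeholder sizes)

W = rng.normal(size=(m, n))      # hidden-layer projections (learned in the real system)
a = np.array([-0.5, 0.5])        # per-neuron window bounds (placeholders)
b = a + 1.0
v = np.ones(m)                   # output weights: here, a plain sum of hidden activations

x = rng.normal(size=n)           # the n neighbors' ratings for one unseen item
score = float(v @ soft_window(W @ x, a, b))   # the "score" for the unseen item
```

With m window neurons the combined output can distinguish up to 2m + 1 regions along the discriminating line, matching the induction above.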
Implementation Details
The index of separability (k) is not known a priori
Setting k to a fixed value is of little help
It can lead to either overspecialization or to large training errors
Therefore, k is a problem parameter: it has to be estimated
Dynamic Network Architecture
Sparse user ratings’ matrix ⇒ small overall network size ⇒
Constructive Network Algorithm
Our constructive network algorithm was derived from the New
Constructive Algorithm [2]
[2] Islam, M.M., et al.: A new constructive algorithm for architectural and functional adaptation of artificial neural networks. IEEE Trans. Syst. Man Cybern. B Cybern. 39(6), 1590-1605 (2009)
Constructive Network Algorithm
1 Create a minimal architecture
2 Train the network in two phases on the whole Training Set
3 Iteratively add neurons in the hidden layer
Create new Training Sets based on the Classification Error
(Boosting Algorithm)
Only the newly added neuron’s weights are adapted; all others
remain "frozen"
4 Stop network construction when the Classification Error stabilizes
Boosting Algorithm
Inspired by AdaBoost and used in network training as a way of
avoiding local minima
Functionality
Unlearned samples ⇒ New neurons in the hidden layer ⇒ New
clusters discovered
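The loop above can be sketched as follows. Random search stands in for the two-phase gradient training, and the error-based reweighting is a simplified stand-in for the boosting scheme; the window activation, trial count and stopping tolerance are all assumptions:

```python
import numpy as np

def soft_window(z, a=0.5, b=1.5, slope=10.0):
    sig = lambda t: 1.0 / (1.0 + np.exp(-t))
    return sig(slope * (z - a)) - sig(slope * (z - b))

def fit_new_neuron(X, target, sample_w, rng, trials=500):
    """Train only the new neuron (random search stands in for gradient training)."""
    best_w, best_err = None, np.inf
    for _ in range(trials):
        w = rng.normal(size=X.shape[1])
        err = sample_w @ (soft_window(X @ w) - target) ** 2
        if err < best_err:
            best_w, best_err = w, err
    return best_w

def constructive_fit(X, y, max_neurons=4, tol=1e-3, seed=1):
    rng = np.random.default_rng(seed)
    sample_w = np.full(len(y), 1.0 / len(y))    # boosting-style sample weights
    frozen, out, prev_err = [], np.zeros(len(y)), np.inf
    for _ in range(max_neurons):
        # only the new neuron's weights are adapted; earlier ones stay "frozen"
        w_new = fit_new_neuron(X, y - out, sample_w, rng)
        frozen.append(w_new)
        out = sum(soft_window(X @ w) for w in frozen)
        resid = (out - y) ** 2
        if resid.sum() > 0:                     # re-emphasize unlearned samples
            sample_w = resid / resid.sum()
        err = resid.mean()
        if prev_err - err < tol:                # stop when the error stabilizes
            break
        prev_err = err
    return frozen

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])             # XOR as a tiny test problem
hidden = constructive_fit(X, y)
```

Each neuron that survives the loop corresponds to one newly discovered cluster, so the final hidden-layer size is the estimate of m (and hence of k = 2m + 1).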
Our Collaborative Recommender System
Input: The user ratings’ matrix and the target user
Output: A model (NN) for the target user
Steps
1 Pick from the user ratings’ matrix all the co-raters of the target user
2 Compute the SVD of the co-raters matrix, retaining only the
non-zero Singular Values
3 Partition the resultant matrix into three sets: the Training Set, the
Validation Set and the Test Set
4 Train a Constructive ANN Architecture (as discussed previously...)
5 Compute the Performance Metrics on the Test Set
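Steps 2 and 3 can be sketched with NumPy as follows; the toy matrix size, the zero-threshold on the singular values and the 60/20/20 split ratios are assumptions, since the slides do not state them:

```python
import numpy as np

rng = np.random.default_rng(42)
# Step 1 (stand-in): a toy co-raters matrix, rows = co-raters, columns = items
R = rng.integers(0, 6, size=(50, 12)).astype(float)

# Step 2: SVD, retaining only the non-zero singular values
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = int(np.sum(s > 1e-10))
reduced = U[:, :k] * s[:k]                     # co-raters in the reduced space

# Step 3: partition rows into Training / Validation / Test sets (assumed 60/20/20)
idx = rng.permutation(len(reduced))
n_tr = int(0.6 * len(idx))
n_val = int(0.2 * len(idx))
train = reduced[idx[:n_tr]]
val = reduced[idx[n_tr:n_tr + n_val]]
test = reduced[idx[n_tr + n_val:]]
```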
Experiment
The MovieLens Database
Contains the ratings of 943 users on 1682 movies
Sparse matrix (6.3% of non-zero elements)
Each user has rated at least 20 movies (106 on average), but...
Discrete Exponential Distribution
60% of all users have rated 100 movies or less
40% of all users have rated 50 movies or less
(Figure: (a) rated items per user)
We followed a purely Collaborative Strategy
Taking into account only the user ratings and no other
demographic information
Experiment
Test Sets & Metrics
Many users rate only a few movies. How would our system
perform?
Group A: The few raters user group.
Contains all users who have rated 20-50 movies
How would our system perform on the average case?
Group B: The moderate raters user group.
Contains all users who have rated 51-100 movies
May be used in comparisons to other implementations
We randomly picked 20 users from each group (40 users in total).
The results were averaged for each group
Metrics
1 Precision
2 Recall
3 F-measure
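With a "like" threshold on the 5-star scale (4 stars here, an assumed cut-off), the three metrics reduce to set operations over recommended vs. truly liked items. A sketch with hypothetical ratings:

```python
def precision_recall_f(actual, predicted, like_threshold=4):
    """Items rated >= like_threshold count as relevant (actual) / recommended (predicted)."""
    recommended = {i for i, r in predicted.items() if r >= like_threshold}
    relevant = {i for i, r in actual.items() if r >= like_threshold}
    tp = len(recommended & relevant)
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Hypothetical ratings for four items (illustrative values only)
actual    = {"I1": 5, "I2": 2, "I3": 4, "I4": 1}   # true ratings
predicted = {"I1": 4, "I2": 5, "I3": 3, "I4": 1}   # system's scores
p, r, f = precision_recall_f(actual, predicted)    # -> (0.5, 0.5, 0.5)
```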
Results
Table: Performance Results
Methodology Precision Recall F-measure
OurSystem: User Group B (moderate ratings) 75.38% 82.21% 79.37%
OurSystem: User Group A (few ratings) 74.07% 88.86% 78.97%
MovieMagician Clique-based 74% 73% 74%
Movielens 66% 74% 70%
SVD/ANN 67.9% 69.7% 68.8%
MovieMagician Feature-based 61% 75% 67%
MovieMagician Hybrid 73% 56% 63%
Correlation 64.4% 46.8% 54.2%
Observations
Our system achieves good results in both user groups and
outperforms the other approaches
Recall is higher in the few raters group because those users seem
to rate only the movies they like
Therefore, the recommender cannot generalize well
Conclusions
We have presented a complete Collaborative Recommender
System specifically suited to cases where information is limited
Our system achieves a good trade-off between Precision and
Recall, a basic requirement for Recommenders
This is due to the fact that k -separability is able to uncover
complex statistical dependencies (positive and negative)
We don’t need to filter the neighborhood of the target user as other
systems do (e.g. by using the Pearson Correlation Formula).
All "neighbors" are considered
Extremely useful in cases of sparse datasets