This document proposes a method for recommender systems that counts different configurations ("squares") in the user-item bipartite rating network to predict whether a user will rate an item highly. It involves counting the number of each configuration for every user-item pair to generate features, then training a machine learning classifier on these features. The method was applied to the KDD Cup 2011 Yahoo! Music Dataset competition and achieved competitive results, with enhancements like normalizing against random networks and separating counts based on item hierarchy. Interestingly, configurations involving "hate" edges were most predictive of a user's potential love for an item.
1. Just Count the Love-Hate Squares: a Rating Network Based Method for Recommender Systems
KDD Cup 2011
August 21, 2011
Joseph Kong, Kyle Teague, Justin Kessler
Approved for public release by Northrop Grumman Information Systems, ISHQ-2011-0042
2. Link Prediction in Bipartite Rating Network
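[Figure: bipartite rating network between users A, B and items 1-4. Left: observed edges labeled with scores (80, 20, 100, 90, 50) plus one unobserved edge (?). Right: the same edges as signs (+, -, +, +, -) plus the unobserved edge (?).]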
• Solid edges represent the observed rating pattern
• Score >= 80: I-love-it (“+”); score < 80: I-hate-it (“-”)
• Goal: predict whether an unobserved link will be highly rated
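A minimal Python sketch of the thresholding rule above, assuming ratings arrive as (user, item, score) triples; the function and variable names are hypothetical. It stores the signed network as two nested dicts, one keyed by user and one by item, so squares can later be walked from either side.

```python
from collections import defaultdict

LOVE_THRESHOLD = 80  # score >= 80 -> I-love-it (+1); score < 80 -> I-hate-it (-1)

def build_signed_network(ratings):
    """Turn (user, item, score) triples into a signed bipartite network.

    Returns two adjacency maps holding +1 (love) or -1 (hate) per edge:
    user -> {item: sign} and item -> {user: sign}.
    """
    user_items = defaultdict(dict)
    item_users = defaultdict(dict)
    for user, item, score in ratings:
        sign = 1 if score >= LOVE_THRESHOLD else -1
        user_items[user][item] = sign
        item_users[item][user] = sign
    return user_items, item_users
```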
3. Motivation: Happy Hour with Brock and Donald
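[Figure: left, Me and Brock share three hated songs (-) and Brock loves Song 1 (+); right, Me and Donald share three loved songs (+) and Donald loves Song 2 (+); Me's rating of each song is unknown (?).]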
• Happy hour chat: with Brock, there are 3 songs that we
both hate; with Donald, we find 3 songs we both love.
• Now, Brock loves Song 1 and Donald loves Song 2
• Am I more likely to love Song 1 or Song 2?
• Main idea: the presence of certain types of squares may be
highly indicative of love/hate; so, just count them!
4. The Square Counting Method: How to Count
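[Figure: the eight square configurations, numbered 0-7. The unobserved edge (?) closes each square; the signs of the other three edges (top, right, bottom) are:]
No. 0: -, -, -    No. 1: +, -, -    No. 2: -, +, -    No. 3: +, +, -
No. 4: -, -, +    No. 5: +, -, +    No. 6: -, +, +    No. 7: +, +, +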
• Given a target user-item pair (utg-itg): count the number of squares of each
configuration and form a feature vector
• For example, in the right figure, the path (utg-i1-u1-itg), which has the
sign sequence {-,+,-}, corresponds to configuration No. 2 (see left figure);
thus, the count for configuration No. 2 is 1.
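A minimal Python sketch of the counting step, assuming the signed network is stored as two nested dicts (user -> {item: +1/-1}, item -> {user: +1/-1}). The function name and the numbering rule are illustrative assumptions: each configuration is read as a 3-bit code (bit 0: utg-i1 sign, bit 1: i1-u1 sign, bit 2: u1-itg sign, with "+" = 1), which reproduces the worked example ({-,+,-} gives No. 2). The authors' own implementation was written in C++/OpenMP.

```python
def count_squares(user_items, item_users, u_tg, i_tg):
    """Count love-hate squares that would close the (u_tg, i_tg) link.

    Each square is a path u_tg - i1 - u1 - i_tg. The configuration number
    is a hypothetical 3-bit code: bit 0 = sign of u_tg-i1, bit 1 = sign of
    i1-u1, bit 2 = sign of u1-i_tg ('+' -> 1, '-' -> 0).
    Returns an 8-element feature vector of counts.
    """
    counts = [0] * 8
    for i1, s1 in user_items.get(u_tg, {}).items():
        if i1 == i_tg:
            continue  # skip the target link itself if it is present
        for u1, s2 in item_users[i1].items():
            if u1 == u_tg:
                continue  # the other user must differ from the target user
            s3 = user_items[u1].get(i_tg)
            if s3 is None:
                continue  # u1 has not rated the target item: no square
            config = int(s1 > 0) + 2 * int(s2 > 0) + 4 * int(s3 > 0)
            counts[config] += 1
    return counts
```

With the happy-hour example (Me and Brock both hate three songs, Brock loves Song 1), this encoding places all three squares for (Me, Song 1) in configuration No. 4.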
5. The Square Counting Method: Machine Learning
• Counts for different square configurations form the features.
• Construct a validation set of user-item pairs with known ratings.
• Machine learning framework:
1. Perform square counting on the rating network for each user-item pair in the
validation set and generate the validation instance-feature matrix.
2. Train a machine-learned classifier on the validation instance-feature matrix.
3. Repeat square counting on the rating network for the test set and generate the
test instance-feature matrix.
4. Apply the classifier to each instance in the test instance-feature matrix.
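The four steps can be sketched as follows, assuming logistic regression as the classifier (the coefficient analysis in this deck uses logistic regression, but the slides do not say which classifier produced the submitted results). The feature vectors are hard-coded toy values standing in for real square counts, and all names are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    # w[0] is the bias; w[1:] are the per-configuration weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Step 2: fit weights by batch gradient descent on the mean log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, label in zip(X, y):
            err = score(w, x) - label  # derivative of log loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Step 1 (toy stand-in): square-count vectors for validation pairs with
# known ratings; label 1 = highly rated (score >= 80).
X_val = [
    [0, 0, 0, 0, 3, 0, 0, 0],
    [0, 0, 0, 0, 2, 0, 0, 0],
    [3, 0, 0, 0, 0, 0, 0, 0],
    [2, 0, 0, 0, 0, 0, 0, 0],
]
y_val = [1, 1, 0, 0]
w = train_logistic_regression(X_val, y_val)

# Steps 3-4: square counts for test pairs (toy values) scored by the classifier.
X_test = [[0, 0, 0, 0, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0]]
test_probs = [score(w, x) for x in X_test]
```

Plain Python keeps the sketch dependency-free; at the scale of this dataset a library solver would be the practical choice.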
6. KDD Cup Track 2: Yahoo! Music Dataset
• Goal: develop algorithms that separate the items a user rated highly
(score >= 80) from those the user did not.
• For each user in the test set, 6 songs were given: 3 were highly rated by
the user and 3 were not (the task is to distinguish them).
• Winners are determined by the error rate on a hold-out test set.
Statistic Count
Users 249,012
Items 296,111
Ratings 62,551,438
Training Ratings 61,944,406
Test Ratings 607,032
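Because each test user has exactly three highly rated songs among six, one natural way to turn classifier scores into labels is to mark each user's top three as highly rated. The sketch below does this and computes the error rate used to rank entries; the function names are hypothetical, and the slides do not describe the authors' decision rule.

```python
def label_top_half(scores):
    """Label the top half of a user's songs by score as highly rated (1).

    For the 6-song test lists this marks 3 songs as 1 and 3 as 0.
    Ties are broken by list order (Python's sort is stable).
    """
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    top = set(order[: len(scores) // 2])
    return [1 if k in top else 0 for k in range(len(scores))]

def error_rate(predictions, truths):
    """Fraction of mislabeled songs over all users."""
    wrong = total = 0
    for pred, truth in zip(predictions, truths):
        wrong += sum(p != t for p, t in zip(pred, truth))
        total += len(truth)
    return wrong / total
```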
7. Summary of Results: KDD Cup Track 2
• Enhancements
– Normalizing square counts against a random network model
– Separating counts based on item hierarchy
– Further edge categorization
– Removing very popular items
– Using bias-removed scores
• Square counting
– Generates the feature-instance matrix
– Implemented in C++/OpenMP
– ~5 hr on an 8-core workstation (2 GB RAM)
• Machine learning: ~1 hr
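The slides do not specify which random network model is used for normalization. As one hypothetical illustration, suppose each of a square's three edges were independently "+" with the global love fraction p (0 < p < 1). A configuration with k "+" edges then has probability p^k (1-p)^(3-k), and each observed count can be divided by its expected value under that null model. The function names and the null model are assumptions, not the authors' method.

```python
def expected_config_fractions(p_love):
    """Probability of each configuration 0-7 under a hypothetical null model
    in which each of the three edge signs is '+' independently with
    probability p_love. Configuration c has bin(c).count('1') '+' edges."""
    fractions = []
    for config in range(8):
        n_plus = bin(config).count("1")
        fractions.append(p_love ** n_plus * (1 - p_love) ** (3 - n_plus))
    return fractions

def normalize_counts(counts, p_love):
    """Divide observed counts by their expected value under the null model.

    Assumes 0 < p_love < 1 so that every expected fraction is positive.
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    expected = expected_config_fractions(p_love)
    return [c / (total * e) for c, e in zip(counts, expected)]
```

A normalized value above 1 means the configuration appears more often around this pair than random signs would produce.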
8. Hate is a Powerful Signal in Predicting Love
• Logistic regression coefficients (in units of 10^-3) for each love-hate
square configuration in predicting a user's highly rated items
• Interesting observation: the most powerful configurations for predicting
a user's love for an item consist mostly of hate edges: config. Nos.
1 & 4 (2nd in the top row; 1st in the bottom row).
• Config. No. 1 (2nd top row) means: Item X is recommended
to you because you hate items Y and Z!
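Assuming the standard logistic regression form, with c_k(u_tg, i_tg) denoting the count of configuration k for the target pair and b a bias term (notation ours), the coefficients discussed on this slide enter the prediction as:

```latex
P(\text{love} \mid u_{tg}, i_{tg})
  = \sigma\!\Big(b + \sum_{k=0}^{7} w_k \, c_k(u_{tg}, i_{tg})\Big),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

Each additional square of configuration k shifts the log-odds of love by w_k, so the configurations with the largest positive coefficients (here Nos. 1 and 4) push the prediction toward love most strongly per square.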