1. A quick introduction to item-based
collaborative filtering
Pham Cong Dinh
@pcdinh
PHPDay Saigon 2012
2. Outline
● PHP popularity and challenges to produce
engaging content
● Recommendation engine at work
● How to build an item-based collaborative
filtering recommendation engine
10. Build a recommendation system
● Collaborative filtering: user and item
– Filtering: automatic predictions about the interests
of a user
– Collaborative: the predictions draw on preferences or
taste information collected from many users
11. Item-based collaborative filtering
● Model-based
– The similarities between different items in the data
set are calculated
– Predict ratings for user-item pairs not present in
the data set
12. Steps to do item-based
collaborative filtering
● Data collection and representation (preferences/taste
…)
● Finding the relationships and determining the similarity
● Recommendation computation:
recommendations/suggestions/discoveries (produce
engaging content)
13. Collaborative filtering: data
collection
● Data collection and representation
(preferences/taste …)
– Clicks
– Likes, favorites
– Watch, read
– Survey
– Ratings
– Others …
● Figure: (user, item) pairs plotted as ✗ marks:
(X, 1), (X, 2), (Y, 1), (Y, 2), (Z, 2), (Z, 3)
● E.g.: find the set of movies that user X likes
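A minimal Python sketch of this data-collection step, assuming the collected signals have already been reduced to (user, item, rating) events. The (user, item) pairs mirror the ones on the slide; the rating values, the `build_user_item` and `liked_items` names, and the 4.0 "likes" threshold are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (user, item, rating) events. The pairs follow the slide;
# the rating values are invented for this sketch.
events = [
    ("X", 1, 5.0),
    ("X", 2, 3.0),
    ("Y", 1, 4.0),
    ("Y", 2, 2.0),
    ("Z", 2, 4.0),
    ("Z", 3, 5.0),
]


def build_user_item(events):
    """Group events into a nested dict: {user: {item: rating}}."""
    prefs = defaultdict(dict)
    for user, item, rating in events:
        prefs[user][item] = rating
    return dict(prefs)


def liked_items(prefs, user, threshold=4.0):
    """Return the items a user rated at or above the threshold, sorted."""
    ratings = prefs.get(user, {})
    return sorted(item for item, r in ratings.items() if r >= threshold)


prefs = build_user_item(events)
print(liked_items(prefs, "X"))
```

Implicit signals such as clicks or views could be mapped onto the same structure by assigning each signal type a numeric weight; how to weight them is a design choice the slides leave open.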
14. Collaborative filtering: Similarity
(1)
● Find the relationships and determine the
similarity
– The similarity between two items is
measured over all the users who
have interacted with (rated) both items
● E.g.: find a group of movies similar to
the set of movies that we know user X likes
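Restricting the comparison to users who rated both items can be sketched as below. This is an illustrative Python sketch, not code from the talk: the `invert` and `co_rated` helpers and the sample users u1 to u3 are hypothetical.

```python
def invert(prefs):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    by_item = {}
    for user, ratings in prefs.items():
        for item, rating in ratings.items():
            by_item.setdefault(item, {})[user] = rating
    return by_item


def co_rated(by_item, a, b):
    """Return two aligned rating vectors over the users who rated both a and b."""
    common = sorted(set(by_item.get(a, {})) & set(by_item.get(b, {})))
    return (
        [by_item[a][u] for u in common],
        [by_item[b][u] for u in common],
    )


# Hypothetical ratings: u3 rated only item "A", so u3 is left out of the pair.
prefs = {
    "u1": {"A": 5, "B": 5},
    "u2": {"A": 2, "B": 5},
    "u3": {"A": 1},
}
by_item = invert(prefs)
vec_a, vec_b = co_rated(by_item, "A", "B")
print(vec_a, vec_b)
```

The two returned vectors have the same length and user order, so they can be passed directly to any of the similarity measures on the following slides.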
15. Collaborative filtering: Similarity
(2)
● Manhattan distance: |x1 – x2| + |y1 – y2|
● User(x, y), plotted on axes X and Y
– Amy(5, 5)
– Bill(2, 5)
– Jim(1, 4)
● Item(x1, x2, x3) → Ratings
– Snow Crash(5, 2, 1)
– Girl with the Dragon Tattoo(5, 5, 1)
● Manhattan distance
– Amy – Bill: |5 – 2| + |5 – 5| = 3
– Snow Crash – Girl with the Dragon Tattoo: |5 – 5| + |2 – 5| + |1 – 1| = 3
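The same arithmetic as a small Python sketch. The function name `manhattan` is chosen here for illustration; the vectors are the ones given on the slide.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


# User vectors and item vectors as given on the slide.
amy, bill = (5, 5), (2, 5)
snow_crash = (5, 2, 1)
dragon_tattoo = (5, 5, 1)

print(manhattan(amy, bill))                  # 3
print(manhattan(snow_crash, dragon_tattoo))  # 3
```

A smaller distance means the two users (or items) are closer, so a distance has to be inverted or negated before it is used as a similarity score.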
16. Collaborative filtering: Similarity
(3)
● Cosine similarity: the cosine of the angle between two
rating vectors. Value: -1 (opposite) through 0 (not related) to 1
(same direction); with non-negative ratings it stays between 0 and 1
Item(x1, x2, x3) → Ratings
Snow Crash(5, 2, 1)
Girl with the Dragon Tattoo(5, 5, 1)
Cosine similarity
→ Snow Crash – Girl with the Dragon Tattoo:
(5x5 + 2x5 + 1x1) / (√(5x5 + 2x2 + 1x1) x √(5x5 + 5x5 + 1x1)) = 36 / (√30 x √51) ≈ 0.92
PHP: https://github.com/aoiaoi/CosineSimilarity/blob/master/CosineSimilarity.php
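For comparison with the linked PHP class, here is a minimal Python sketch of the standard definition: the dot product divided by the product of the two vector lengths (the square roots of the sums of squares). The function name and the zero-vector fallback of 0.0 are assumptions of this sketch.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an all-zero vector as "not related"
    return dot / (norm_a * norm_b)


snow_crash = (5, 2, 1)
dragon_tattoo = (5, 5, 1)
print(round(cosine_similarity(snow_crash, dragon_tattoo), 2))  # 0.92
```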
17. Collaborative filtering: Similarity
(4)
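● Pearson correlation coefficient: from -1 (perfect
negative correlation) through 0 (not related) to +1
(perfect positive correlation)
● Measures how much the ratings by common users for a
pair of items deviate together from the average ratings for
those items
● Correlation is basically the average product of the two
standardized (z-score) rating vectors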
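A minimal Python sketch of the textbook Pearson formula: the sum of products of deviations from the mean, divided by the square root of the product of the two sums of squared deviations. The function name and the choice to return 0.0 for a constant vector are assumptions of this sketch.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length rating vectors."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Co-deviation of the two vectors from their own averages.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant vector carries no correlation signal
    return cov / math.sqrt(var_a * var_b)


print(pearson([1, 2, 3], [2, 4, 6]))   # perfectly correlated
print(pearson([1, 2, 3], [3, 2, 1]))   # perfectly anti-correlated
print(round(pearson((5, 2, 1), (5, 5, 1)), 2))
```

Because each vector is centered on its own mean, Pearson correlation is less sensitive than cosine similarity to users who rate everything high or everything low.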
18. Collaborative filtering: Similarity
(5)
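● Euclidean distance: the "ordinary" straight-line distance
between two points: √((x1 – x2)² + (y1 – y2)²)
● The distance itself ranges from 0 (identical) upward;
converted to a similarity, e.g. 1 / (1 + distance), values run
from near 0 (not related) to 1 (identical)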
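A short Python sketch of the distance and of one common way to map it into a bounded similarity score. The 1 / (1 + d) conversion is an assumption here; the slide gives the value range but does not say which conversion was used.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def euclidean_similarity(a, b):
    """Map a distance in [0, inf) to a similarity in (0, 1]; 1 means identical."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


snow_crash = (5, 2, 1)
dragon_tattoo = (5, 5, 1)
print(euclidean_distance(snow_crash, dragon_tattoo))    # 3.0
print(euclidean_similarity(snow_crash, dragon_tattoo))  # 0.25
```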
19. Collaborative filtering: Similarity
(6)
● Spearman distance: the squared Euclidean distance between two
rank vectors; 0 means the two rankings are identical.
● Spearman rank correlation: ranges from -1 (perfect negative
correlation, the rankings are reversed) to +1 (perfect positive
correlation, the rankings agree exactly).
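The link between the two Spearman quantities can be sketched in Python as follows. The helper names are hypothetical, and the correlation uses the classic shortcut 1 - 6D / (n(n² - 1)), where D is the Spearman distance; that shortcut is only exact when neither vector contains tied values.

```python
def ranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_distance(a, b):
    """Sum of squared differences between the two rank vectors."""
    return sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))


def spearman_correlation(a, b):
    """Shortcut formula; exact only when there are no ties (assumption)."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    return 1 - 6 * spearman_distance(a, b) / (n * (n * n - 1))


print(spearman_correlation([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(spearman_correlation([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
```

When ratings contain many ties, which is common on a 1 to 5 scale, computing the Pearson correlation of the two rank vectors is the safer route.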
21. Collaborative filtering:
Recommendation computations
● Calculate the similarity between each item that user X
has watched/bought/liked and the items that user X has not
watched/bought/liked
● Score all the candidate items (e.g. apply a weighting
algorithm such as a similarity-weighted average of user X's existing ratings)
● Sort by score
● Return the top-N items
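The four steps above can be sketched end to end in Python. This is one possible reading, not the speaker's implementation: it assumes cosine similarity over co-rating users and a similarity-weighted average as the scoring rule, and the `recommend` function and sample ratings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recommend(prefs, user, top_n=3):
    """Score items the user has not rated; return the top-N (item, score) pairs."""
    # Invert {user: {item: rating}} into {item: {user: rating}}.
    by_item = {}
    for u, ratings in prefs.items():
        for item, r in ratings.items():
            by_item.setdefault(item, {})[u] = r

    seen = prefs[user]
    scores = {}
    for cand in by_item:
        if cand in seen:
            continue
        num = den = 0.0
        for item, rating in seen.items():
            # Step 1: similarity over the users who rated both items.
            common = [u for u in by_item[cand] if u in by_item[item]]
            if not common:
                continue
            sim = cosine([by_item[cand][u] for u in common],
                         [by_item[item][u] for u in common])
            if sim <= 0:
                continue
            # Step 2: weight the user's own rating by that similarity.
            num += sim * rating
            den += sim
        if den > 0:
            scores[cand] = num / den
    # Steps 3 and 4: sort by score (ties broken by item name), keep top-N.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return ranked[:top_n]


# Hypothetical ratings: X has not rated items C and D.
prefs = {
    "X": {"A": 5, "B": 1},
    "Y": {"A": 5, "B": 1, "C": 5, "D": 1},
    "Z": {"A": 4, "B": 2, "C": 4, "D": 2},
    "W": {"A": 1, "B": 5, "C": 1, "D": 5},
}
print(recommend(prefs, "X", top_n=2))
```

In this toy data item C is rated exactly like item A, which X rated highly, so C should rank above D. In production the item-item similarities are usually precomputed offline instead of being recalculated per request.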
22. Collaborative filtering: Other
issues
● Accuracy of predicted ratings. To evaluate
accuracy when predicting ratings of unrated items for the active
user, use Mean Absolute Error (MAE): the average absolute
difference between predicted and actual ratings.
● Accuracy of recommendations. To evaluate the
accuracy of recommendations, use Mean Average
Precision (MAP), defined as the mean of the
Average Precision (AP) values over a set of queries (in a
recommender system, a query is a user's request for
recommended items).
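Both metrics fit in a few lines of Python. This is a sketch under stated assumptions: AP is normalized by the number of relevant items (other normalizations exist), each recommendation list is assumed to contain no duplicates, and the function names are chosen here for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def average_precision(recommended, relevant):
    """AP for one ranked list: precision at each hit, averaged over relevant items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # precision at cut-off k
    return total / len(relevant)


def mean_average_precision(queries):
    """Mean of AP over (recommended, relevant) pairs, one pair per user."""
    if not queries:
        raise ValueError("need at least one query")
    return sum(average_precision(rec, rel) for rec, rel in queries) / len(queries)


print(mean_absolute_error([4, 3, 5], [5, 3, 3]))  # 1.0
queries = [
    (["a", "b", "c"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),            # AP = (1/2) / 1
]
print(mean_average_precision(queries))
```

MAE judges the predicted rating values themselves, while MAP only looks at whether relevant items appear near the top of the list, so the two can disagree about which model is better.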