2. Why recommendation systems?
Provide a better experience to your users.
Understand the behavior and patterns of users.
Create opportunities to re-engage inactive users.
Boost sales.
Surface items users would not find through search alone.
5. A simple recommendation system
Consider the following scenario:
A library has books and has members.
Members can have books issued to them.
The library wants to build a recommender system to recommend books to its members.
6. Scoring Matrices
         Book 1  Book 2  Book 3  Book 4
User 1     X       X
User 2     X
User 3     X               X
User 4     X               X       X
User 5             X               X

         Book 1  Book 2  Book 3  Book 4
Book 1     4       1       2       1
Book 2     1       2       0       1
Book 3     2       0       2       1
Book 4     1       1       1       2
7. Using the scoring matrices
If a user has read Book 1 recommend Book 3, 2, 4.
If a user has read Book 2 recommend Book 1, 4, 3.
If a user has read Book 3 recommend Book 1, 4, 2.
If a user has read Book 4 recommend Book 1, 2, 3.
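The two steps above, counting co-occurrences and ranking by them, can be sketched in Python. This is a minimal sketch, assuming the user–book reads are held in memory as plain dicts; the `reads` data and the `recommend` helper are illustrative, chosen to be consistent with the matrices on the previous slide:

```python
# Sketch of the co-occurrence recommender described above.
# reads[u] is the set of books user u has borrowed (illustrative data,
# consistent with the scoring matrices on the previous slide).
reads = {
    "User 1": {1, 2},
    "User 2": {1},
    "User 3": {1, 3},
    "User 4": {1, 3, 4},
    "User 5": {2, 4},
}

books = sorted({b for borrowed in reads.values() for b in borrowed})

# Build the book-by-book co-occurrence ("scoring") matrix.
score = {(a, b): 0 for a in books for b in books}
for borrowed in reads.values():
    for a in borrowed:
        for b in borrowed:
            score[(a, b)] += 1

def recommend(book, top_n=3):
    """Rank every other book by how often it was read together with `book`."""
    others = [b for b in books if b != book]
    return sorted(others, key=lambda b: score[(book, b)], reverse=True)[:top_n]

print(recommend(1))  # → [3, 2, 4]
```

With this data the output reproduces the orderings listed above (e.g. a reader of Book 1 gets Books 3, 2, 4).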
8. Advantages
Very simple to understand and implement.
Works well when a single user action (e.g. one borrowed book) is enough context to recommend further items.
9. Disadvantages
Cannot work for a new user with no history.
In a real-world scenario with thousands of books and thousands of members, there are bound to be too many zeroes (a sparse matrix).
Bases each recommendation on a single item rather than the user's full history.
10. Another Try
Our Books records might look like this:
BookId  Title                   Genre          Writer              Language
1       The Great Gatsby        Classic        F Scott Fitzgerald  English
2       Nine Stories            Short Stories  J D Salinger        English
3       The Sun Also Rises      Classic        Ernest Hemingway    English
4       The Hunger Games        Action         Suzanne Collins     English
5       The Ambler Warning      Thriller       Robert Ludlum       English
6       The Catcher in the Rye  Classic        J D Salinger        English
7       To Kill a Mockingbird   Classic        Harper Lee          English
11. Create an Item Similarity Matrix
         Book 1  Book 2  Book 3  Book 4  Book 5  Book 6  Book 7
Book 1     3       1       2       1       1       2       2
Book 2     1       3       1       1       1       2       1
Book 3     2       1       3       1       1       2       2
Book 4     1       1       1       3       1       1       1
Book 5     1       1       1       1       3       1       1
Book 6     2       2       2       1       1       3       2
Book 7     2       1       2       1       1       2       3
• This will always be a square (n x n) matrix.
• Each cell holds the count of shared attributes (excluding unique attributes such as BookId and Title).
• In general, any similarity measure can be used here.
12. To Recommend
Look at what a user has previously read.
Use the values from the similarity matrix to recommend books based on how similar each is to the books the user has already read.
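The steps above can be sketched as a small lookup over the book records from the earlier slide. This is a minimal sketch: only the non-unique attributes (Genre, Writer, Language) are compared, and `recommend_similar` is a hypothetical helper name:

```python
# Sketch: attribute-overlap similarity between books.
# Title and BookId are unique, so only Genre, Writer and Language are compared.
books = {
    1: ("Classic", "F Scott Fitzgerald", "English"),
    2: ("Short Stories", "J D Salinger", "English"),
    3: ("Classic", "Ernest Hemingway", "English"),
    4: ("Action", "Suzanne Collins", "English"),
    5: ("Thriller", "Robert Ludlum", "English"),
    6: ("Classic", "J D Salinger", "English"),
    7: ("Classic", "Harper Lee", "English"),
}

def similarity(a, b):
    """Count attributes (genre, writer, language) shared by books a and b."""
    return sum(x == y for x, y in zip(books[a], books[b]))

def recommend_similar(book, top_n=3):
    """Rank every other book by attribute overlap with `book`."""
    others = [b for b in books if b != book]
    return sorted(others, key=lambda b: similarity(book, b), reverse=True)[:top_n]

print(recommend_similar(3))  # → [1, 6, 7]
```

With these records the lookup reproduces the similarity matrix on the previous slide, e.g. Books 1, 6 and 7 are the most similar to Book 3.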
13. Advantages
Recommendations can be pre-computed for a very large item base.
Fast lookups can be built to perform recommendations.
For example, if a user is viewing the page for Book 3, you may want to recommend Books 1, 6 and 7.
Would work for new/non-registered users.
15. Another Approach - The Users
Our Users records might look like this:
UserId  Gender  Age  Location
1       Male    34   Pakistan
2       Female  28   Pakistan
3       Male    38   India
4       Male    32   India
5       Female  21   Pakistan
6       Female  24   Pakistan
17. Transforming User Borrowing
        User 1  User 2  User 3  User 4  User 5  User 6
Book 1            X
Book 2            X                               X
Book 3                                    X
Book 4                                    X
Book 5                                            X
Book 6    X       X
Book 7    X               X       X       X
• Issue with too many zero values.
• Any solutions?
18. Transform the Users Records
Consider Age as a discrete column with ranges like {0-10, 11-20, 21-30, 31-40, …} so that we can create some partitions like this:

PartitionId  Gender  AgeGroup  Location
1            Male    31-40     Pakistan
2            Female  21-30     Pakistan
3            Male    31-40     India
19. Recreate User Borrowing using Partition Information
Fewer zero-valued cells (11/21 compared to 30/42 previously).
Far fewer columns than we previously had!
The notation has been changed from ‘X’ to a count.

        Partition 1  Partition 2  Partition 3
Book 1                    1
Book 2                    2
Book 3                    1
Book 4                    1
Book 5                    1
Book 6       1            1
Book 7       1            1            2
20. To Recommend
See what partition a user belongs to.
Look at the column of that partition and sort
the books in descending order based on their
frequency count.
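This lookup can be sketched as follows. It is a minimal sketch with assumed names: `age_group` implements the discrete ranges from slide 18 (the 0-10 edge case aside), and the `borrow_counts` numbers are illustrative, not a real dataset:

```python
# Sketch of the partition-based recommender described above.

def age_group(age):
    """Map an age to a discrete range such as 21-30 or 31-40 (as on slide 18)."""
    lo = (age - 1) // 10 * 10 + 1
    return f"{lo}-{lo + 9}"

# borrow_counts[partition][book] = borrows by members of that partition
# (illustrative numbers).
borrow_counts = {
    ("Male", "31-40", "Pakistan"): {6: 1, 7: 1},
    ("Female", "21-30", "Pakistan"): {1: 1, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1},
    ("Male", "31-40", "India"): {7: 2},
}

def recommend_for(gender, age, location, top_n=3):
    """Find the user's partition, then rank its books by borrow count."""
    counts = borrow_counts.get((gender, age_group(age), location), {})
    return sorted(counts, key=counts.get, reverse=True)[:top_n]

print(recommend_for("Female", 28, "Pakistan"))  # Book 2 ranks first
```

Note that an unseen partition simply returns no recommendations here, which mirrors the seed-data disadvantage discussed on the next slides.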
21. Advantages
Continues to improve over time.
More partitions can be added over time.
Instead of using a collective scoring, the
technique partitions the user base into
‘similar’ users.
The technique can easily be extended on the
item side and rather than having books as
rows, we can have book clusters.
22. Disadvantages
Needs some seed data to start.
Requires some transformations.
Can become very complex as the number of users/items grows.
23. Evaluating Performance (Metrics)
Almost any Information Retrieval metric can
be used.
Three interesting ones:
Accuracy
Coverage
Normalized Distance Based Performance Measure
(NDPM)
24. Accuracy
• Takes into account the order in which recommendations are shown to users and how they responded to them.
• For rank position = 1:
• Acc(1) = # of positive responses with rank ≤ 1 / total recommendations with rank ≤ 1
• Therefore, Acc(1) = 1 / 3 = 33.33%
• Similarly, Acc(2) = 2 / 6 = 33.33%

UserId  BookId  Rank  Response
1       3       1     Yes
1       2       2     No
2       7       1     No
2       5       2     Yes
3       3       1     No
3       7       2     No
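The calculation above can be reproduced directly from the table. A minimal sketch, assuming the responses are stored as (user, book, rank, response) tuples:

```python
# Sketch of the accuracy metric: Acc(k) = positive responses among
# recommendations at rank <= k, divided by all recommendations at rank <= k.
rows = [
    (1, 3, 1, "Yes"), (1, 2, 2, "No"),
    (2, 7, 1, "No"),  (2, 5, 2, "Yes"),
    (3, 3, 1, "No"),  (3, 7, 2, "No"),
]

def accuracy(k):
    """Hit rate over every recommendation shown at rank <= k, across all users."""
    shown = [r for r in rows if r[2] <= k]
    positives = [r for r in shown if r[3] == "Yes"]
    return len(positives) / len(shown)

print(f"Acc(1) = {accuracy(1):.2%}")  # 33.33%
print(f"Acc(2) = {accuracy(2):.2%}")  # 33.33%
```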
25. Coverage
Shows the coverage of items that appear in the recommendations for all users.
For rank position = 1:
Cov(1) = unique items in recommendations with rank ≤ 1 / total items
Therefore, Cov(1) = 2 / 7 = 28.57%
Similarly, Cov(2) = 4 / 7 = 57.14%

UserId  BookId  Rank  Response
1       3       1     Yes
1       2       2     No
2       7       1     No
2       5       2     Yes
3       3       1     No
3       7       2     No
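A sketch of the same calculation, using the same (user, book, rank, response) tuples as before; `TOTAL_ITEMS` is the library's 7 books:

```python
# Sketch of the coverage metric: Cov(k) = unique items appearing in any
# user's recommendations at rank <= k, divided by the total number of items.
rows = [
    (1, 3, 1, "Yes"), (1, 2, 2, "No"),
    (2, 7, 1, "No"),  (2, 5, 2, "Yes"),
    (3, 3, 1, "No"),  (3, 7, 2, "No"),
]
TOTAL_ITEMS = 7

def coverage(k):
    """Fraction of the catalogue that was recommended to someone at rank <= k."""
    unique_items = {book for (_, book, rank, _) in rows if rank <= k}
    return len(unique_items) / TOTAL_ITEMS

print(f"Cov(1) = {coverage(1):.2%}")  # 28.57%
print(f"Cov(2) = {coverage(2):.2%}")  # 57.14%
```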
26. Normalized Distance Based Performance Measure (NDPM)
Assesses the quality of a recommendation system, taking into account the order in which items are shown.
NDPM = (C- + 0.5 x C+) / Cu
C- is the number of recommended item pairs where the user responded (No, Yes): the higher-ranked item got No and the lower-ranked one got Yes.
C+ is the number of recommended item pairs where the user responded (Yes, No).
Cu is the number of all item pairs where the user’s responses were not the same.
In our example (computed per user, then averaged):
C-(1) = 2, C+(1) = 2 and Cu(1) = 4 => NDPM(1) = (2 + 0.5 x 2) / 4 = 75%
C-(2) = 0, C+(2) = 1 and Cu(2) = 1 => NDPM(2) = (0 + 0.5 x 1) / 1 = 50%
NDPM = (0.75 + 0.5) / 2 = 62.5%

UserId  BookId  Rank  Response
1       3       1     Yes
1       2       2     No
1       7       3     No
1       5       4     Yes
2       3       1     Yes
2       7       2     No
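The per-user calculation above can be sketched as follows, applying the slide's definition to each user's responses ordered by rank:

```python
# Sketch of NDPM per the definition above. For each pair of recommendations
# (earlier rank first) with differing responses:
#   C-: higher-ranked item got "No", lower-ranked got "Yes"
#   C+: higher-ranked item got "Yes", lower-ranked got "No"
# NDPM for a user = (C- + 0.5 * C+) / Cu, then averaged across users.
from itertools import combinations

def ndpm_for_user(responses):
    """responses: list of "Yes"/"No" ordered by rank (rank 1 first)."""
    c_minus = c_plus = c_u = 0
    for hi, lo in combinations(responses, 2):
        if hi == lo:
            continue          # same response: pair not counted in Cu
        c_u += 1
        if hi == "No":        # (No, Yes) pair
            c_minus += 1
        else:                 # (Yes, No) pair
            c_plus += 1
    return (c_minus + 0.5 * c_plus) / c_u

user1 = ["Yes", "No", "No", "Yes"]  # table rows for user 1, by rank
user2 = ["Yes", "No"]               # table rows for user 2, by rank
scores = [ndpm_for_user(user1), ndpm_for_user(user2)]
print(sum(scores) / len(scores))  # → 0.625
```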
27. How to improve results
Maintain a list of recommendations each user has already seen, and don’t show them again for some time.
Provide some mechanism for users to indicate what they’re looking for.
Infer the above from user searches.
28. Some standard algorithms
Item Hierarchy
You bought a printer, you will also need ink.
Attribute-based recommendations
You like reading classics written by Salinger, so you might like “The Catcher in the Rye”.
Collaborative Filtering – User-User Similarity
People like you who read “The Hunger Games” also read “The Ambler
Warning”.
Collaborative Filtering – Item-Item Similarity
You like “Catcher in the Rye” so you will like “Nine Stories”.
Social + Interest Graph Based
Your friends like “The Great Gatsby” so you will like “The Great Gatsby”
too.
Model Based
Training models such as SVM, LDA, or SVD on implicit features.