Usman Sharif

RECOMMENDATION SYSTEMS
Why recommendation systems?

• Provide a better experience to your users.
• Understand the behavior and patterns of users.
• Create an opportunity to re-engage inactive users.
• Boost sales.
• Better than a search feature.
How some companies are using Recommendation Systems - Amazon
How some companies are using Recommendation Systems - Gmail
A simple recommendation system

• Consider the following scenario:
   • A library has books and members.
   • Members can have books issued to them.
   • The library wants to build a recommender system to recommend books to its members.
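Scoring Matrices
User-book matrix (X = the user has read the book):
         Book 1   Book 2   Book 3   Book 4
User 1   X                 X
User 2   X
User 3            X                 X
User 4   X                 X        X
User 5   X        X

Book-book matrix (each cell = number of users who read both books; the diagonal = number of readers of that book):
         Book 1   Book 2   Book 3   Book 4
Book 1   4        1        2        1
Book 2   1        2        0        1
Book 3   2        0        2        1
Book 4   1        1        1        2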
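
The book-book matrix can be derived from the user-book matrix by counting, for each pair of books, the users who read both. The sketch below is one minimal way to do that in plain Python; the names `history` and `cooccurrence` are illustrative and not from the slides.

```python
from itertools import product

# Borrowing history from the user-book matrix: user -> set of books read.
history = {
    "User 1": {"Book 1", "Book 3"},
    "User 2": {"Book 1"},
    "User 3": {"Book 2", "Book 4"},
    "User 4": {"Book 1", "Book 3", "Book 4"},
    "User 5": {"Book 1", "Book 2"},
}
books = ["Book 1", "Book 2", "Book 3", "Book 4"]


def cooccurrence(history, books):
    """Count, for every ordered pair of books, how many users read both."""
    counts = {(a, b): 0 for a, b in product(books, repeat=2)}
    for read in history.values():
        # Pairing a book with itself gives the diagonal (readers of that book).
        for a, b in product(read, repeat=2):
            counts[(a, b)] += 1
    return counts


scores = cooccurrence(history, books)
for a in books:
    print(a, [scores[(a, b)] for b in books])
```

Each printed row should correspond to one row of the book-book matrix.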
Using the scoring matrices

Rank the other books by their score in the row of the book already read:
• If a user has read Book 1, recommend Books 3, 2, 4.
• If a user has read Book 2, recommend Books 1, 4, 3.
• If a user has read Book 3, recommend Books 1, 4, 2.
• If a user has read Book 4, recommend Books 1, 2, 3.
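
The orderings above follow from sorting each row of the book-book matrix by score. A small sketch, assuming ties are broken by book order (the slides do not state a tie-break rule) and using a hypothetical `recommend` helper:

```python
# Book-to-book scores copied from the second scoring matrix.
books = ["Book 1", "Book 2", "Book 3", "Book 4"]
scores = [
    [4, 1, 2, 1],
    [1, 2, 0, 1],
    [2, 0, 2, 1],
    [1, 1, 1, 2],
]


def recommend(read_book):
    """Return the other books, highest co-occurrence score first."""
    row = scores[books.index(read_book)]
    candidates = [
        (score, title) for title, score in zip(books, row) if title != read_book
    ]
    # Assumption: ties are broken by the book's position in the catalogue.
    candidates.sort(key=lambda pair: (-pair[0], books.index(pair[1])))
    return [title for _, title in candidates]


for book in books:
    print(book, "->", recommend(book))
```

Note that, as on the slide, books with a score of zero still appear at the end of the list; a real system would probably drop them.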
Advantages

• Very simple to understand and implement.
• Works well when a single user activity (one book read) is enough to drive the next recommendation.
Disadvantages

• Cannot work for a new user with no history.
• In a real-world scenario with thousands of books and thousands of members, most cells are bound to be zero (a sparse matrix).
• Considers only one item at a time, not everything the user has read.
Another Try
• Our book records might look like this:
BookId  Title                    Genre          Writer               Language
1       The Great Gatsby         Classic        F Scott Fitzgerald   English
2       Nine Stories             Short Stories  J D Salinger         English
3       The Sun Also Rises       Classic        Ernest Hemingway     English
4       The Hunger Games         Action         Suzanne Collins      English
5       The Ambler Warning       Thriller       Robert Ludlum        English
6       The Catcher in the Rye   Classic        J D Salinger         English
7       To Kill a Mockingbird    Classic        Harper Lee           English
Create an Item Similarity Matrix
            Book 1     Book 2      Book 3     Book 4      Book 5     Book 6      Book 7
Book 1      3          1           2          1           1          2           2
Book 2      1          3           1          1           1          2           1
Book 3      2          1           3          1           1          2           2
Book 4      1          1           1          3           1          1           1
Book 5      1          1           1          1           3          1           1
Book 6      2          2           2          1           1          3           2
Book 7      2          1           2          1           1          2           3
• This is always a square (n x n) matrix.
• Each cell holds the number of attributes the two books share (Genre, Writer, Language); unique attributes such as BookId and Title are excluded.
• In general, any similarity measure can be used here.
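
The matrix above can be reproduced by comparing the Genre, Writer and Language of every pair of books. A minimal sketch, with the hypothetical helper `similarity` standing in for whatever measure is chosen:

```python
# BookId -> (Genre, Writer, Language), copied from the book records.
books = {
    1: ("Classic", "F Scott Fitzgerald", "English"),
    2: ("Short Stories", "J D Salinger", "English"),
    3: ("Classic", "Ernest Hemingway", "English"),
    4: ("Action", "Suzanne Collins", "English"),
    5: ("Thriller", "Robert Ludlum", "English"),
    6: ("Classic", "J D Salinger", "English"),
    7: ("Classic", "Harper Lee", "English"),
}


def similarity(a, b):
    """Number of attributes on which books a and b agree."""
    return sum(x == y for x, y in zip(books[a], books[b]))


# Square n x n matrix, rows and columns ordered by BookId.
matrix = [[similarity(a, b) for b in books] for a in books]
for book_id, row in zip(books, matrix):
    print(f"Book {book_id}", row)
```

Swapping `similarity` for another measure (for example a weighted count) leaves the rest of the approach unchanged.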
To Recommend

• Look at what a user has previously read.
• Use the values from the similarity matrix to recommend the books most similar to the one the user has already read.
Advantages

• Recommendations can be pre-computed, even for a very large item base.
• Fast lookups can be built to serve recommendations.
• For example, if a user is viewing the page of Book 3, you may want to recommend Books 1, 6 and 7.
• Works for new and non-registered users.
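
As a sketch of the pre-computation idea, the snippet below builds a lookup table of the top three similar books for every book in the similarity matrix. The value k = 3 and the tie-break by BookId are assumptions made for illustration.

```python
# Rows of the item similarity matrix, keyed by BookId (columns are Books 1-7).
similarity = {
    1: [3, 1, 2, 1, 1, 2, 2],
    2: [1, 3, 1, 1, 1, 2, 1],
    3: [2, 1, 3, 1, 1, 2, 2],
    4: [1, 1, 1, 3, 1, 1, 1],
    5: [1, 1, 1, 1, 3, 1, 1],
    6: [2, 2, 2, 1, 1, 3, 2],
    7: [2, 1, 2, 1, 1, 2, 3],
}


def top_similar(book_id, k=3):
    """Return the k books most similar to book_id, excluding itself."""
    row = similarity[book_id]
    others = [
        (score, other)
        for other, score in enumerate(row, start=1)
        if other != book_id
    ]
    # Assumption: ties are broken by the lower BookId.
    others.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in others[:k]]


# Computed once, offline; serving a page is then a dictionary lookup.
lookup = {book_id: top_similar(book_id) for book_id in similarity}
print(lookup[3])
```

Because the table depends only on item attributes, it can be served to anonymous visitors as well as to members.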
Disadvantage

• Does not consider the user’s history.
• Instead, it looks at a collective trend.
Another Approach - The Users

• Our user records might look like this:
 UserId     Gender    Age        Location
 1          Male      34         Pakistan
 2          Female    28         Pakistan
 3          Male      38         India
 4          Male      32         India
 5          Female    21         Pakistan
 6          Female    24         Pakistan
The User Borrowing
  UserId   BookId
  1        3
  1        7
  2        2
  3        1
  3        5
  3        7
  4        6
  4        7
  5        2
  6        4
  6        6
  6        7
Transforming User Borrowing
             User 1     User 2       User 3   User 4   User 5   User 6
   Book 1                            X
   Book 2               X                              X
   Book 3    X
   Book 4                                                       X
   Book 5                            X
   Book 6                                     X                 X
   Book 7    X                       X        X                 X


• Problem: too many zero values (blank cells).
• Any solutions?
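
To make the sparsity concrete, the borrowing records can be pivoted into a book-by-user matrix and the empty cells counted. A minimal sketch; the variable names are illustrative.

```python
# (UserId, BookId) pairs from the borrowing table.
borrowings = [
    (1, 3), (1, 7), (2, 2), (3, 1), (3, 5), (3, 7),
    (4, 6), (4, 7), (5, 2), (6, 4), (6, 6), (6, 7),
]
users = range(1, 7)
books = range(1, 8)

borrowed = set(borrowings)
# One row per book, one column per user; 1 marks an 'X' in the table.
matrix = [[1 if (u, b) in borrowed else 0 for u in users] for b in books]

zeros = sum(row.count(0) for row in matrix)
total = len(books) * len(users)
print(f"{zeros}/{total} cells are zero")
```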
Transform the User Records

• Treat Age as a discrete column with ranges like {0-10, 11-20, 21-30, 31-40, …} so that users fall into partitions like this:
  PartitionId   Gender   AgeGroup   Location
  1             Male     31-40      Pakistan
  2             Female   21-30      Pakistan
  3             Male     31-40      India
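
The partitioning step can be sketched as follows: bucket each age, then give every distinct (Gender, AgeGroup, Location) combination its own partition id. Numbering partitions in order of first appearance is an assumption that happens to match the table above.

```python
# (UserId, Gender, Age, Location) from the user records.
users = [
    (1, "Male", 34, "Pakistan"),
    (2, "Female", 28, "Pakistan"),
    (3, "Male", 38, "India"),
    (4, "Male", 32, "India"),
    (5, "Female", 21, "Pakistan"),
    (6, "Female", 24, "Pakistan"),
]


def age_group(age):
    """Map an age to the ranges 0-10, 11-20, 21-30, 31-40, ..."""
    if age <= 10:
        return "0-10"
    lower = (age - 1) // 10 * 10 + 1
    return f"{lower}-{lower + 9}"


partitions = {}      # (gender, age group, location) -> PartitionId
user_partition = {}  # UserId -> PartitionId
for user_id, gender, age, location in users:
    key = (gender, age_group(age), location)
    # Assumption: ids are handed out in order of first appearance.
    user_partition[user_id] = partitions.setdefault(key, len(partitions) + 1)

print(partitions)
print(user_partition)
```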
Recreate User Borrowing using Partition Information

• Fewer zero-valued cells (11/21 compared to 30/42 previously).
• Far fewer columns than we previously had!
• The notation has been changed from ‘X’ to a count.

         Partition 1   Partition 2   Partition 3
Book 1                               1
Book 2                 2
Book 3   1
Book 4                 1
Book 5                               1
Book 6                 1             1
Book 7   1             1             2
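
The partition counts can be rebuilt from the borrowing records once each user is mapped to a partition. A minimal sketch, assuming the user-to-partition mapping implied by the user and partition tables:

```python
from collections import Counter

# UserId -> PartitionId, as implied by the user and partition tables.
user_partition = {1: 1, 2: 2, 3: 3, 4: 3, 5: 2, 6: 2}
# (UserId, BookId) pairs from the borrowing table.
borrowings = [
    (1, 3), (1, 7), (2, 2), (3, 1), (3, 5), (3, 7),
    (4, 6), (4, 7), (5, 2), (6, 4), (6, 6), (6, 7),
]

# Count borrowings per (book, partition); missing keys count as 0.
counts = Counter((book, user_partition[user]) for user, book in borrowings)
matrix = [[counts[(book, p)] for p in (1, 2, 3)] for book in range(1, 8)]

zeros = sum(row.count(0) for row in matrix)
for book, row in enumerate(matrix, start=1):
    print(f"Book {book}", row)
print(f"{zeros}/21 cells are zero")
```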
To Recommend

• See which partition a user belongs to.
• Look at the column of that partition and sort the books in descending order of their frequency count.
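
A sketch of that lookup, using the partition columns from the table above. The `already_read` filter and the tie-break by BookId are additions for illustration and are not part of the slides.

```python
# PartitionId -> {BookId: borrow count}, copied from the partition table.
partition_counts = {
    1: {3: 1, 7: 1},
    2: {2: 2, 4: 1, 6: 1, 7: 1},
    3: {1: 1, 5: 1, 6: 1, 7: 2},
}


def recommend(partition_id, already_read=()):
    """Books borrowed within the partition, most frequent first."""
    column = partition_counts[partition_id]
    # Assumption: ties are broken by the lower BookId.
    ranked = sorted(column, key=lambda book: (-column[book], book))
    # Assumption: books the user has already read are skipped.
    return [book for book in ranked if book not in already_read]


print(recommend(3))
print(recommend(2, already_read={2}))
```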
Advantages

• Continues to improve over time as more borrowing data arrives.
• More partitions can be added over time.
• Instead of using one collective score, the technique partitions the user base into ‘similar’ users.
• The technique can easily be extended to the item side: rather than having books as rows, we can have book clusters.
Disadvantages

• Needs some seed data to start.
• Requires some transformations.
• Can become very complex as the number of users and items grows.
Evaluating Performance (Metrics)
• Almost any Information Retrieval metric can be used.
• Three interesting ones:
   • Accuracy
   • Coverage
   • Normalized Distance Based Performance Measure (NDPM)
Accuracy
• Takes into account the order in which recommendations are
  shown to users and how they responded to them.
• For rank position = 1:
   • Acc(1) = # of Positive responses with rank less than or
      equal to 1 / total recommendations with rank less than or
      equal to 1
   • Therefore, Acc(1) = 1 / 3 = 33.33%
• Similarly, Acc(2) = 2 / 6 = 33.33%
                        UserId     BookId    Rank       Response
                        1          3         1          Yes
                        1          2         2          No
                        2          7         1          No
                        2          5         2          Yes
                        3          3         1          No
                        3          7         2          No
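
The two accuracy figures can be checked with a few lines of Python. The sketch below encodes the response table as tuples; the function name `accuracy_at` is illustrative.

```python
# (UserId, BookId, Rank, positive response?) from the table above.
log = [
    (1, 3, 1, True),
    (1, 2, 2, False),
    (2, 7, 1, False),
    (2, 5, 2, True),
    (3, 3, 1, False),
    (3, 7, 2, False),
]


def accuracy_at(k):
    """Share of positive responses among recommendations with rank <= k."""
    shown = [response for _, _, rank, response in log if rank <= k]
    return sum(shown) / len(shown)


print(f"Acc(1) = {accuracy_at(1):.2%}")
print(f"Acc(2) = {accuracy_at(2):.2%}")
```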
Coverage
• Shows the share of catalogue items that appear in the recommendations across all users.
• For rank position = 1:
   • Cov(1) = Unique items in recommendations with rank less than or equal to 1 / total items.
   • Therefore, Cov(1) = 2 / 7 = 28.57%
• Similarly, Cov(2) = 4 / 7 = 57.14%
                      UserId     BookId   Rank      Response
                      1          3        1         Yes
                      1          2        2         No
                      2          7        1         No
                      2          5        2         Yes
                      3          3        1         No
                      3          7        2         No
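
Coverage can be computed the same way by collecting the distinct books recommended up to a given rank. A minimal sketch, assuming the catalogue holds the seven books from the earlier book records:

```python
# (UserId, BookId, Rank) from the table above; responses are not needed here.
log = [
    (1, 3, 1),
    (1, 2, 2),
    (2, 7, 1),
    (2, 5, 2),
    (3, 3, 1),
    (3, 7, 2),
]
TOTAL_ITEMS = 7  # assumption: the seven books listed in the book records


def coverage_at(k):
    """Share of the catalogue that appears in recommendations with rank <= k."""
    unique_items = {book for _, book, rank in log if rank <= k}
    return len(unique_items) / TOTAL_ITEMS


print(f"Cov(1) = {coverage_at(1):.2%}")
print(f"Cov(2) = {coverage_at(2):.2%}")
```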
Normalized Distance Based Performance Measure (NDPM)
• Assesses the quality of a recommendation system, taking into account the order in which items are shown.
• NDPM = (C- + 0.5 x C+) / Cu
• C- is the number of recommended item pairs where the user responded (No, Yes), in rank order.
• C+ is the number of recommended item pairs where the user responded (Yes, No), in rank order.
• Cu is the number of all item pairs where the user’s two responses differ.
• In our example (the number in parentheses is the UserId):
   • C-(1) = 2, C+(1) = 2 and Cu(1) = 4 => NDPM(1) = (2 + 0.5 x 2) / 4 = 75%
   • C-(2) = 0, C+(2) = 1 and Cu(2) = 1 => NDPM(2) = (0 + 0.5 x 1) / 1 = 50%
   • NDPM = (0.75 + 0.5) / 2 = 62.5%
UserId     BookId    Rank       Response
1          3         1          Yes
1          2         2          No
1          7         3          No
1          5         4          Yes
2          3         1          Yes
2          7         2          No
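
The per-user figures can be reproduced by enumerating every pair of recommended items in rank order. The sketch below follows the formula exactly as written on this slide; NDPM as usually defined in the literature gives the 0.5 weight to tied pairs rather than to (Yes, No) pairs, so treat this as the deck's variant rather than a reference implementation.

```python
from itertools import combinations

# UserId -> responses listed in rank order (True = Yes), from the table above.
responses = {
    1: [True, False, False, True],
    2: [True, False],
}


def ndpm(ranked_responses):
    """(C- + 0.5 * C+) / Cu over all item pairs, higher-ranked item first."""
    c_plus = c_minus = 0
    for higher, lower in combinations(ranked_responses, 2):
        if higher and not lower:    # (Yes, No)
            c_plus += 1
        elif lower and not higher:  # (No, Yes)
            c_minus += 1
    c_u = c_plus + c_minus
    if c_u == 0:
        return None  # undefined when every response is the same
    return (c_minus + 0.5 * c_plus) / c_u


per_user = {user: ndpm(r) for user, r in responses.items()}
overall = sum(per_user.values()) / len(per_user)
print(per_user, overall)
```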
How to improve results

• Keep a list of the recommendations each user has already seen, and don’t show them again for some time.
• Give users a way to tell you what they’re looking for.
• Alternatively, infer what they’re looking for from their searches.
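
The first point can be sketched as a small filter that remembers when each book was last shown to each user. Everything here is hypothetical: the class name `SeenFilter`, its method, and the seven-day cooldown are illustrative choices, since the slides only say "for some time".

```python
from datetime import datetime, timedelta

COOLDOWN = timedelta(days=7)  # assumed value, not taken from the slides


class SeenFilter:
    """Suppress recommendations a user has seen within the cooldown window."""

    def __init__(self):
        self._last_shown = {}  # (user_id, book_id) -> time last shown

    def filter(self, user_id, candidates, now):
        """Return the candidates not shown recently and mark them as shown."""
        fresh = [
            book
            for book in candidates
            if now - self._last_shown.get((user_id, book), datetime.min) >= COOLDOWN
        ]
        for book in fresh:
            self._last_shown[(user_id, book)] = now
        return fresh


seen = SeenFilter()
start = datetime(2024, 1, 1)
print(seen.filter(1, [3, 7], start))
print(seen.filter(1, [3, 7, 5], start + timedelta(days=1)))
```

In this run the second call should return only Book 5, because Books 3 and 7 were shown to the same user a day earlier.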
Some standard algorithms
• Item Hierarchy
   • You bought a printer, so you will also need ink.
• Attribute-based recommendations
   • You like reading classics and books written by Salinger, so you might like “Catcher in the Rye”.
• Collaborative Filtering – User-User Similarity
   • People like you who read “The Hunger Games” also read “The Ambler Warning”.
• Collaborative Filtering – Item-Item Similarity
   • You like “Catcher in the Rye”, so you will like “Nine Stories”.
• Social + Interest Graph Based
   • Your friends like “The Great Gatsby”, so you will like “The Great Gatsby” too.
• Model Based
   • Training SVM, LDA, SVD for implicit features.
Some Tools

• Apache Mahout (Java)
• Crab (Python)
• Easyrec (RESTful API)
Questions?
Thank you!

            www.usman-sharif.com
                  @sharif_usman
