SlideShare a Scribd company logo
1 of 40
BUILDING A
PREDICTIVE MODEL
AN EXAMPLE OF A PRODUCT
RECOMMENDATION ENGINE

Alex Lin
Senior Architect
Intelligent Mining
alin@intelligentmining.com
Outline
  Predictivemodeling methodology
  k-Nearest Neighbor (kNN) algorithm
  Singular value decomposition (SVD)
   method for dimensionality reduction
  Using a synthetic data set to test and
   improve your model
  Experiment and results



                                            2
The Business Problem
  Design
        product recommender solution that will
 increase revenue.




               $$
                                            3
How Do We Increase Revenue?



               Increase
              Conversion
  Increase
  Revenue                      Increase
                              Unit Price
             Increase Avg.
              Order Value
                               Increase
                             Units / Order




                                             4
Example
  Is   this recommendation effective?


                        Increase
                       Unit Price



                        Increase
                      Units / Order




                                         5
What am I
going to do?




               6
Predictive Model
   Framework



                                    ML                  Prediction
     Data          Features
                                 Algorithm               Output



What data?      What feature?   Which Algorithm ?   Cross-sell & Up-sell
                                                    Recommendation




                                                                     7
What Data to Use?
  Explicit   data
       Ratings
       Comments
  Implicit   data
       Order history / Return history
       Cart events
       Page views
       Click-thru
       Search log
  In
    today’s talk we only use Order history and Cart
  events
                                                      8
Predictive Model


                                    ML                  Prediction
     Data          Features
                                 Algorithm               Output



Order History   What feature?   Which Algorithm ?   Cross-sell & Up-sell
Cart Events                                         Recommendation




                                                                     9
What Features to Use?
  We know that a given product tends to get
   purchased by customers with similar tastes or
   needs.
  Use user engagement data to describe a product.


                                  users
                1   2   3     4   5   6     7   8   9   10    …   n
    item




           17   1       .25           .25       1       .25


                         user engagement vector



                                                                      10
Data Representation / Features
  When we merge every item’s user engagement
 vector, we got a m x n item-user matrix
                                        users
              1   2     3     4     5   6   7     8     9   10    …   n

          1   1         .25             1                   .25

          2                                 .25
  items




          3   1               .25                 1
          4       .25               1             .25   1

                        1                   1
          …




          m



                                                                          11
Data Normalization
  Ensurethe magnitudes of the entries in the
 dataset matrix are appropriate
                                             users
               1     2     3     4     5     6     7     8     9     10    …   n

           1   1
               .5          1
                           .9                 1
                                             .92                      1
                                                                     .49

           2                                        1
                                                   .79
   items




           3    1
               .67                1
                                 .46                      1
                                                         .73

           4          1
                     .39                1
                                       .82                1
                                                         .76    1
                                                               .69

                            1                      1
           …
           …




                           .52                     .8

           m


  Removecolumn average – so frequent buyers
                                                                                   12
 don’t dominate the model
Data Normalization
  Differentengagement data points (Order / Cart /
   Page View) should have different weights
  Common normalization strategies:
      Remove column average
      Remove row average
      Remove global mean
      Z-score
      Fill-in the null values




                                                     13
Predictive Model


                                         ML                  Prediction
     Data          Features
                                      Algorithm               Output



Order History   User engagement      Which Algorithm ?   Cross-sell & Up-sell
Cart Events     vector                                   Recommendation



                Data Normalization




                                                                        14
Which Algorithm?
  How
     do we find the items that have similar user
 engagement data?
                                       users
               1   2   3     4   5      6   7   8     9   10    …   n

          1    1       .25              1                 1
          2                                 1
  items




          17   1             1          1       .25       .25

          18       1             .25    1       1     1

                                            1
          …




                       .25

          m


  We
    can find the items that have similar user
                                                                        15
 engagement vectors with kNN algorithm
k-Nearest Neighbor (kNN)
  Find
      the k items that have the most similar user
  engagement vectors
                                      users
              1     2   3    4   5     6   7    8   9    10   …   n

          1   .5        1              1                 1

          2         1                      .5            1
  items




          3   1              1                  1   1

          4         1            .5        1        1

                        .5                 1
          …




          m                  1                      .5



  Nearest         Neighbors of Item 4 = [2,3,1]                      16
Similarity Measure for kNN
                                                      users
                   1    2     3        4        5          6            7         8       9   10   …    n
       items




               2        1                                           .5                        1

               4        1                       .5                      1                 1

    Jaccard coefficient:
                                                                   (1+ 1)
                             sim(a,b) =
                                                     (1+ 1+ 1) + (1+ 1+ 1+ 1) − (1+ 1)
    Cosine similarity:
                                  a•b                                                         (1*1+ 0.5 *1)
           sim(a,b) = cos(a,b) =                                    =
               €                 a ∗ b                                          (12 + 0.5 2 + 12 ) * (12 + 0.5 2 + 12 + 12 )
                                  2                            2

    Pearson Correlation:

€         corr(a,b) =
                             ∑ (r − r )(r − r )
                                  i    ai       a     bi            b
                                                                                      =
                                                                                                   m∑ aibi − ∑ ai ∑ bi

                            ∑ (r − r ) ∑ (r − r )
                              i   ai        a
                                                2
                                                       i       bi           b
                                                                                 2
                                                                                          m∑ ai2 − (∑ ai ) 2 m∑ bi2 − (∑ bi ) 2
                                                                                                                                  17
                             match _ cols * Dotprod(a,b) − sum(a) * sum(b)
          =
               match _ cols * sum(a 2 ) − (sum(a)) 2 match _ cols * sum(b 2 ) − (sum(b)) 2
k-Nearest Neighbor (kNN)
                                                       Item
feature
  space                                                Similarity Measure
                                                       (cosine similarity)
                                           7
                          9


              8                   2
          1
                                      4
                      6


                                               5

                  3




                              kNN k=5
                              Nearest Neighbors(8) = [9,6,3,1,2]       18
Predictive Model
   Ver.    1: kNN

                                              ML                   Prediction
     Data               Features
                                           Algorithm                Output



Order History        User engagement      k-Nearest Neighbor   Cross-sell & Up-sell
Cart Events          vector               (kNN)                Recommendation



                     Data Normalization




                                                                              19
Cosine Similarity – Code fragment
 long i_cnt = 100000; // number of items 100K
 long u_cnt = 2000000; // number of users 2M
 double data[i_cnt][u_cnt]; // 100K by 2M dataset matrix (in reality, it needs to be malloc allocation)
 double norm[i_cnt];

 // assume data matrix is loaded
 ……
 // calculate vector norm for each user engagement vector
 for (i=0; i<i_cnt; i++) {
     norm[i] = 0;
     for (f=0; f<u_cnt; f++) {
        norm[i] += data[i][f] * data [i][f];
     }                               1. 100K rows x 100K rows x 2M features --> scalability problem
     norm[i] = sqrt(norm[i]);              kd-tree, Locality sensitive hashing,
 }
                                        MapReduce/Hadoop, Multicore/Threading, Stream Processors
 // cosine similarity calculation 2. data[i] is high-dimensional and sparse, similarity measures
 for (i=0; i<i_cnt; i++) { // loop thru 100Knot reliable --> accuracy problem
                                        are
  for (j=0; j<i_cnt; j++) { // loop thru 100K
     dot_product = 0;
                                         This leads us to The SVD dimensionality reduction !
     for (f=0; f<u_cnt; f++) { // loop thru entire user space 2M
         dot_product += data[i][f] * data[j][f];
     }
     printf(“%d %d %lfn”, i, j, dot_product/(norm[i] * norm[j]));
   }                                                                                                      20
 // find the Top K nearest neighbors here
 …….
Singular Value Decomposition
            (SVD)
            A = U × S ×VT
                  A                             U                     S                     VT
             m x n matrix                   m x r matrix         r x r matrix           r x n matrix




€
    items




                                items



                                                                rank = k
                                                                k<r


               users                                                                      users
                                                                                            users
             Ak = U k × Sk × VkT
           Low rank approx. Item profile is        U k * Sk


                                                                                items
           Low rank approx. User profile is         S k *VkT                                          21

€          Low rank approx. Item-User matrix is
                                 €                         U k * Sk * Sk *VkT

                                        €
Reduced SVD
        Ak = U k × Sk × VkT
               Ak                          Uk                Sk                VkT
         100K x 2M matrix         100K x 3 matrix      3 x 3 matrix        3 x 2M matrix

                                                         7   0    0

                                                         0   3    0
items




                                   items
                                                         0   0    1              users
                                                         rank = 3
                                                                      Descending
                                                                      Singular Values


             users

       Low rank approx. Item profile is    U k * Sk

                                                                                           22

                              €
SVD Factor Interpretation                                  S
                                                     3 x 3 matrix

  Singular        values plot (rank=512)              7   0   0

                                                       0   3   0

                                                       0   0   1

                                                                    Descending
                                                                    Singular Values




                                                                                23
More Significant        Latent Factors   Noises + Others             Less Significant
SVD Dimensionality Reduction

                                  U k * Sk
        <----- latent factors ----->                     # of users


                        €
items




         3
               rank
                            Need to find the most optimal low rank !!
                10


                                                                        24
Missing values

    Difference between “0” and “unknown”
    Missing values do NOT appear randomly.
    Value = (Preference Factors) + (Availability) – (Purchased
     elsewhere) – (Navigation inefficiency) – etc.
    Approx. Value = (Preference Factors) +/- (Noise)
    Modeling missing values correctly will help us make good
     recommendations, especially when working with an extremely
     sparse data set




                                                                  25
Singular Value Decomposition
(SVD)
  Use SVD to reduce dimensionality, so neighborhood
   formation happens in reduced user space
  SVD helps model to find the low rank approx. dataset
   matrix, while retaining the critical latent factors and
   ignoring noise.
  Optimal low rank needs to be tuned
  SVD is computationally expensive


    SVD Libraries:
         Matlab [U, S, V] = svds(A,256);
         SVDPACKC http://www.netlib.org/svdpack/
         SVDLIBC http://tedlab.mit.edu/~dr/SVDLIBC/
         GHAPACK http://www.dcs.shef.ac.uk/~genevieve/ml.html   26
Predictive Model
   Ver.    2: SVD+kNN

                                           ML                   Prediction
     Data           Features
                                        Algorithm                Output



Order History     User engagement     k-Nearest Neighbors   Cross-sell & Up-sell
Cart Events            vector         (kNN) in reduced      Recommendation
                                      space


                 Data Normalization

                       SVD


                                                                           27
Synthetic Data Set
  Why   do we use synthetic data set?




  Sowe can test our new model in a controlled
  environment
                                                 28
Synthetic Data Set
  16latent factors synthetic e-commerce
  data set
      Dimension: 1,000 (items) by 20,000 (users)
      16 user preference factors
      16 item property factors (non-negative)
      Txn Set: n = 55,360 sparsity = 99.72 %
      Txn+Cart Set: n = 192,985 sparsity = 99.03%
      Download: http://www.IntelligentMining.com/dataset/

       user_id   item_id   type
       10        42        0.25
       10        997       0.25
       10        950       0.25                              29
       11        836       0.25
       11        225       1
Synthetic Data Set
Item property      User preference             Purchase Likelihood score
                                                           1K x 20K matrix
   factors             factors
 1K x 16 matrix      16 x 20K matrix
                                                    X11   X12   X13   X14   X15   X16
                     x
                                                    X21   X22   X12   X24   X25   X26
                     y




                                            items
                                                    X31   X32   X33   X34   X35   X36
 a    b     c        z
                                                    X41   X42   X43   X44   X45   X46

                                                    X51   X52   X53   X54   X55   X56

                                                                 users

X32 = (a, b, c) . (x, y, z) = a * x + b * y + c * z

X32 = Likelihood of Item 3 being purchased by User 2
                                                                                        30
Synthetic Data Set
X11                        X31                                   X51
                                 Based on the distribution,
                                 pre-determine # of items
X21                        X41   purchased by an user            X41
                                 (# of item=2)
X31     Sort by Purchase   X21   From the top, select and skip
                                                                 X31
        likelihood Score         certain items to create data
X41                        X51   sparsity.                       X21

X51                        X11                                   X11



      User 1 purchased Item 4 and Item 1




                                                                       31
Experiment Setup
  Each model (Random / kNN / SVD+kNN) will
   generate top 20 recommendations for each item.
  Compare model output to the actual top 20
   provided by synthetic data set
  Evaluation Metrics :
      Precision %: Overlapping of the top 20 between model
       output and actual (higher the better)
                     {Found _ Top20 _ items} ∩ {Actual _ Top20 _ items}
       Precision =
                                 {Found _ Top20 _ items}

      Quality metric: Average of the actual ranking in the
       model output (lower the better)
       €                                                                              32

         1   2   30      47   50   21              1     2   368   62     900   510
Experimental Result
     kNN            vs. Random (Control)

Precision %                           Quality
(higher is better)                    (Lower is better)




                                                          33
Experimental Result
    Precision       % of SVD+kNN
Recall %
(higher is better)




                                    Improvement




                                                     34
                                          SVD Rank
Experimental Result
      Quality      of SVD+kNN
Quality
(Lower is better)




                                 Improvement




                                                    35
                                         SVD Rank
Experimental Result
    The       effect of using Cart data
Precision %
(higher is better)




                                                      36
                                           SVD Rank
Experimental Result
  The       effect of using Cart data
Quality
(Lower is better)




                                                    37
                                         SVD Rank
Outline
  Predictivemodeling methodology
  k-Nearest Neighbor (kNN) algorithm
  Singular value decomposition (SVD)
   method for dimensionality reduction
  Using a synthetic data set to test and
   improve your model
  Experiment and results



                                            38
References
    J.S. Breese, D. Heckerman and C. Kadie, "Empirical Analysis of
     Predictive Algorithms for Collaborative Filtering," in Proceedings of the
     Fourteenth Conference on Uncertainity in Artificial Intelligence (UAI
     1998), 1998.
    B. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Item-based collaborative
     filtering recommendation algorithms," in Proceedings of the Tenth
     International Conference on the World Wide Web (WWW 10), pp. 285-295,
     2001.
    B. Sarwar, G. Karypis, J. Konstan, and J. Riedl "Application of
     Dimensionality Reduction in Recommender System A Case Study" In
     ACM WebKDD 2000 Web Mining for E-Commerce Workshop
    Apache Lucene Mahout http://lucene.apache.org/mahout/
    Cofi: A Java-Based Collaborative Filtering Library
     http://www.nongnu.org/cofi/


                                                                                 39
Thank you
  Any   question or comment?




                                40

More Related Content

What's hot

Introduction to Recommendation Systems
Introduction to Recommendation SystemsIntroduction to Recommendation Systems
Introduction to Recommendation SystemsTrieu Nguyen
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemMilind Gokhale
 
Recommendation system
Recommendation systemRecommendation system
Recommendation systemAkshat Thakar
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System ExplainedCrossing Minds
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Xavier Amatriain
 
Movie Recommendation engine
Movie Recommendation engineMovie Recommendation engine
Movie Recommendation engineJayesh Lahori
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introductionLiang Xiang
 
A Hybrid Recommendation system
A Hybrid Recommendation systemA Hybrid Recommendation system
A Hybrid Recommendation systemPranav Prakash
 
Recommendation System
Recommendation SystemRecommendation System
Recommendation SystemAnamta Sayyed
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectiveXavier Amatriain
 
Movie Recommendation System - MovieLens Dataset
Movie Recommendation System - MovieLens DatasetMovie Recommendation System - MovieLens Dataset
Movie Recommendation System - MovieLens DatasetJagruti Joshi
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation enginesGeorgian Micsa
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringViet-Trung TRAN
 
Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewYONG ZHENG
 
[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systemsFalitokiniaina Rabearison
 
Movie recommendation system using collaborative filtering system
Movie recommendation system using collaborative filtering system Movie recommendation system using collaborative filtering system
Movie recommendation system using collaborative filtering system Mauryasuraj98
 

What's hot (20)

Introduction to Recommendation Systems
Introduction to Recommendation SystemsIntroduction to Recommendation Systems
Introduction to Recommendation Systems
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System Explained
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
 
Movie Recommendation engine
Movie Recommendation engineMovie Recommendation engine
Movie Recommendation engine
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introduction
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
A Hybrid Recommendation system
A Hybrid Recommendation systemA Hybrid Recommendation system
A Hybrid Recommendation system
 
Recommendation System
Recommendation SystemRecommendation System
Recommendation System
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
 
Movie Recommendation System - MovieLens Dataset
Movie Recommendation System - MovieLens DatasetMovie Recommendation System - MovieLens Dataset
Movie Recommendation System - MovieLens Dataset
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation engines
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 
Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick View
 
[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems
 
Movie recommendation system using collaborative filtering system
Movie recommendation system using collaborative filtering system Movie recommendation system using collaborative filtering system
Movie recommendation system using collaborative filtering system
 
Entity2rec recsys
Entity2rec recsysEntity2rec recsys
Entity2rec recsys
 

Viewers also liked

Recommender Systems
Recommender SystemsRecommender Systems
Recommender SystemsT212
 
Recommendation system
Recommendation system Recommendation system
Recommendation system Vikrant Arya
 
Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYC
Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYCJeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYC
Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYCMLconf
 
A Data Scientist in the Music Industry
A Data Scientist in the Music IndustryA Data Scientist in the Music Industry
A Data Scientist in the Music IndustryData Science London
 
Past present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectivePast present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectiveXavier Amatriain
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Xavier Amatriain
 
How to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on SparkHow to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on SparkCaserta
 
RecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at CriteoRecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at CriteoRomain Lerallut
 
Collaborative Filtering with Spark
Collaborative Filtering with SparkCollaborative Filtering with Spark
Collaborative Filtering with SparkChris Johnson
 
Intro to Factorization Machines
Intro to Factorization MachinesIntro to Factorization Machines
Intro to Factorization MachinesPavel Kalaidin
 
Lecture 6 lu factorization & determinants - section 2-5 2-7 3-1 and 3-2
Lecture 6   lu factorization & determinants - section 2-5 2-7 3-1 and 3-2Lecture 6   lu factorization & determinants - section 2-5 2-7 3-1 and 3-2
Lecture 6 lu factorization & determinants - section 2-5 2-7 3-1 and 3-2njit-ronbrown
 
Neighbor methods vs matrix factorization - case studies of real-life recommen...
Neighbor methods vs matrix factorization - case studies of real-life recommen...Neighbor methods vs matrix factorization - case studies of real-life recommen...
Neighbor methods vs matrix factorization - case studies of real-life recommen...Domonkos Tikk
 
Matrix factorization
Matrix factorizationMatrix factorization
Matrix factorizationrubyyc
 
آموزش محاسبات عددی - بخش دوم
آموزش محاسبات عددی - بخش دومآموزش محاسبات عددی - بخش دوم
آموزش محاسبات عددی - بخش دومfaradars
 
Nonnegative Matrix Factorization
Nonnegative Matrix FactorizationNonnegative Matrix Factorization
Nonnegative Matrix FactorizationTatsuya Yokota
 
Factorization Machines with libFM
Factorization Machines with libFMFactorization Machines with libFM
Factorization Machines with libFMLiangjie Hong
 
Matrix Factorization Technique for Recommender Systems
Matrix Factorization Technique for Recommender SystemsMatrix Factorization Technique for Recommender Systems
Matrix Factorization Technique for Recommender SystemsAladejubelo Oluwashina
 
Collaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro AnalyticsCollaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro AnalyticsNavisro Analytics
 
Introduction to Matrix Factorization Methods Collaborative Filtering
Introduction to Matrix Factorization Methods Collaborative FilteringIntroduction to Matrix Factorization Methods Collaborative Filtering
Introduction to Matrix Factorization Methods Collaborative FilteringDKALab
 
Beginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBeginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBenjamin Bengfort
 

Viewers also liked (20)

Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommendation system
Recommendation system Recommendation system
Recommendation system
 
Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYC
Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYCJeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYC
Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYC
 
A Data Scientist in the Music Industry
A Data Scientist in the Music IndustryA Data Scientist in the Music Industry
A Data Scientist in the Music Industry
 
Past present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectivePast present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry Perspective
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
 
How to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on SparkHow to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on Spark
 
RecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at CriteoRecSys 2015: Large-scale real-time product recommendation at Criteo
RecSys 2015: Large-scale real-time product recommendation at Criteo
 
Collaborative Filtering with Spark
Collaborative Filtering with SparkCollaborative Filtering with Spark
Collaborative Filtering with Spark
 
Intro to Factorization Machines
Intro to Factorization MachinesIntro to Factorization Machines
Intro to Factorization Machines
 
Lecture 6 lu factorization & determinants - section 2-5 2-7 3-1 and 3-2
Lecture 6   lu factorization & determinants - section 2-5 2-7 3-1 and 3-2Lecture 6   lu factorization & determinants - section 2-5 2-7 3-1 and 3-2
Lecture 6 lu factorization & determinants - section 2-5 2-7 3-1 and 3-2
 
Neighbor methods vs matrix factorization - case studies of real-life recommen...
Neighbor methods vs matrix factorization - case studies of real-life recommen...Neighbor methods vs matrix factorization - case studies of real-life recommen...
Neighbor methods vs matrix factorization - case studies of real-life recommen...
 
Matrix factorization
Matrix factorizationMatrix factorization
Matrix factorization
 
آموزش محاسبات عددی - بخش دوم
آموزش محاسبات عددی - بخش دومآموزش محاسبات عددی - بخش دوم
آموزش محاسبات عددی - بخش دوم
 
Nonnegative Matrix Factorization
Nonnegative Matrix FactorizationNonnegative Matrix Factorization
Nonnegative Matrix Factorization
 
Factorization Machines with libFM
Factorization Machines with libFMFactorization Machines with libFM
Factorization Machines with libFM
 
Matrix Factorization Technique for Recommender Systems
Matrix Factorization Technique for Recommender SystemsMatrix Factorization Technique for Recommender Systems
Matrix Factorization Technique for Recommender Systems
 
Collaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro AnalyticsCollaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro Analytics
 
Introduction to Matrix Factorization Methods Collaborative Filtering
Introduction to Matrix Factorization Methods Collaborative FilteringIntroduction to Matrix Factorization Methods Collaborative Filtering
Introduction to Matrix Factorization Methods Collaborative Filtering
 
Beginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBeginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix Factorization
 

Similar to Building a Recommendation Engine - An example of a product recommendation engine

2013 01-23 when analytics projects go wrong
2013 01-23 when analytics projects go wrong2013 01-23 when analytics projects go wrong
2013 01-23 when analytics projects go wrongJulien Coquet
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptopRising Media, Inc.
 
Black_Friday_Sales_Trushita
Black_Friday_Sales_TrushitaBlack_Friday_Sales_Trushita
Black_Friday_Sales_TrushitaTrushita Redij
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Kun Le
 
Agile Workshop: Agile Metrics
Agile Workshop: Agile MetricsAgile Workshop: Agile Metrics
Agile Workshop: Agile MetricsSiddhi
 
Scikit Learn: Data Normalization Techniques That Work
Scikit Learn: Data Normalization Techniques That WorkScikit Learn: Data Normalization Techniques That Work
Scikit Learn: Data Normalization Techniques That WorkDamian R. Mingle, MBA
 
Know How to Create and Visualize a Decision Tree with Python.pdf
Know How to Create and Visualize a Decision Tree with Python.pdfKnow How to Create and Visualize a Decision Tree with Python.pdf
Know How to Create and Visualize a Decision Tree with Python.pdfData Science Council of America
 
Citizen Data Science Training using KNIME
Citizen Data Science Training using KNIMECitizen Data Science Training using KNIME
Citizen Data Science Training using KNIMEAli Raza Anjum
 
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Benjamin Bengfort
 
Fast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareFast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareTigerGraph
 
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and Tricks
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and TricksIBM Cognos 10 Framework Manager Metadata Modeling: Tips and Tricks
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and TricksSenturus
 
Deep-Dive: Predicting Customer Behavior with Apigee Insights
Deep-Dive: Predicting Customer Behavior with Apigee InsightsDeep-Dive: Predicting Customer Behavior with Apigee Insights
Deep-Dive: Predicting Customer Behavior with Apigee InsightsApigee | Google Cloud
 
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon Web Services
 
Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Turi, Inc.
 
Key projects in AI, ML and Generative AI
Key projects in AI, ML and Generative AIKey projects in AI, ML and Generative AI
Key projects in AI, ML and Generative AIVijayananda Mohire
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...Egyptian Engineers Association
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learningbutest
 

Similar to Building a Recommendation Engine - An example of a product recommendation engine (20)

2013 01-23 when analytics projects go wrong
2013 01-23 when analytics projects go wrong2013 01-23 when analytics projects go wrong
2013 01-23 when analytics projects go wrong
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
 
Black_Friday_Sales_Trushita
Black_Friday_Sales_TrushitaBlack_Friday_Sales_Trushita
Black_Friday_Sales_Trushita
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...
 
Agile Workshop: Agile Metrics
Agile Workshop: Agile MetricsAgile Workshop: Agile Metrics
Agile Workshop: Agile Metrics
 
Scikit Learn: Data Normalization Techniques That Work
Scikit Learn: Data Normalization Techniques That WorkScikit Learn: Data Normalization Techniques That Work
Scikit Learn: Data Normalization Techniques That Work
 
Know How to Create and Visualize a Decision Tree with Python.pdf
Know How to Create and Visualize a Decision Tree with Python.pdfKnow How to Create and Visualize a Decision Tree with Python.pdf
Know How to Create and Visualize a Decision Tree with Python.pdf
 
E miner
E minerE miner
E miner
 
Citizen Data Science Training using KNIME
Citizen Data Science Training using KNIMECitizen Data Science Training using KNIME
Citizen Data Science Training using KNIME
 
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
 
Fast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareFast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA Hardware
 
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and Tricks
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and TricksIBM Cognos 10 Framework Manager Metadata Modeling: Tips and Tricks
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and Tricks
 
Z suzanne van_den_bosch
Z suzanne van_den_boschZ suzanne van_den_bosch
Z suzanne van_den_bosch
 
Deep-Dive: Predicting Customer Behavior with Apigee Insights
Deep-Dive: Predicting Customer Behavior with Apigee InsightsDeep-Dive: Predicting Customer Behavior with Apigee Insights
Deep-Dive: Predicting Customer Behavior with Apigee Insights
 
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)
 
Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015
 
Lean startup
Lean startupLean startup
Lean startup
 
Key projects in AI, ML and Generative AI
Key projects in AI, ML and Generative AIKey projects in AI, ML and Generative AI
Key projects in AI, ML and Generative AI
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
 

More from NYC Predictive Analytics

Graph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media AnalyticsGraph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media AnalyticsNYC Predictive Analytics
 
The caret Package: A Unified Interface for Predictive Models
The caret Package: A Unified Interface for Predictive ModelsThe caret Package: A Unified Interface for Predictive Models
The caret Package: A Unified Interface for Predictive ModelsNYC Predictive Analytics
 
Intro to Classification: Logistic Regression & SVM
Intro to Classification: Logistic Regression & SVMIntro to Classification: Logistic Regression & SVM
Intro to Classification: Logistic Regression & SVMNYC Predictive Analytics
 
Introduction to R Package Recommendation System Competition
Introduction to R Package Recommendation System CompetitionIntroduction to R Package Recommendation System Competition
Introduction to R Package Recommendation System CompetitionNYC Predictive Analytics
 
Optimization: A Framework for Predictive Analytics
Optimization: A Framework for Predictive AnalyticsOptimization: A Framework for Predictive Analytics
Optimization: A Framework for Predictive AnalyticsNYC Predictive Analytics
 
An Introduction to Multilevel Regression Modeling for Prediction
An Introduction to Multilevel Regression Modeling for PredictionAn Introduction to Multilevel Regression Modeling for Prediction
An Introduction to Multilevel Regression Modeling for PredictionNYC Predictive Analytics
 
How OMGPOP Uses Predictive Analytics to Drive Change
How OMGPOP Uses Predictive Analytics to Drive ChangeHow OMGPOP Uses Predictive Analytics to Drive Change
How OMGPOP Uses Predictive Analytics to Drive ChangeNYC Predictive Analytics
 
Introduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic AnalysisIntroduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic AnalysisNYC Predictive Analytics
 

More from NYC Predictive Analytics (10)

Graph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media AnalyticsGraph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media Analytics
 
The caret Package: A Unified Interface for Predictive Models
The caret Package: A Unified Interface for Predictive ModelsThe caret Package: A Unified Interface for Predictive Models
The caret Package: A Unified Interface for Predictive Models
 
Intro to Classification: Logistic Regression & SVM
Intro to Classification: Logistic Regression & SVMIntro to Classification: Logistic Regression & SVM
Intro to Classification: Logistic Regression & SVM
 
Introduction to R Package Recommendation System Competition
Introduction to R Package Recommendation System CompetitionIntroduction to R Package Recommendation System Competition
Introduction to R Package Recommendation System Competition
 
R package Recommendation Engine
R package Recommendation EngineR package Recommendation Engine
R package Recommendation Engine
 
Optimization: A Framework for Predictive Analytics
Optimization: A Framework for Predictive AnalyticsOptimization: A Framework for Predictive Analytics
Optimization: A Framework for Predictive Analytics
 
An Introduction to Multilevel Regression Modeling for Prediction
An Introduction to Multilevel Regression Modeling for PredictionAn Introduction to Multilevel Regression Modeling for Prediction
An Introduction to Multilevel Regression Modeling for Prediction
 
How OMGPOP Uses Predictive Analytics to Drive Change
How OMGPOP Uses Predictive Analytics to Drive ChangeHow OMGPOP Uses Predictive Analytics to Drive Change
How OMGPOP Uses Predictive Analytics to Drive Change
 
Introduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic AnalysisIntroduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic Analysis
 
Recommendation Engine Demystified
Recommendation Engine DemystifiedRecommendation Engine Demystified
Recommendation Engine Demystified
 

Recently uploaded

Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 

Recently uploaded (20)

Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 

Building a Recommendation Engine - An example of a product recommendation engine

  • 1. BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining alin@intelligentmining.com
  • 2. Outline   Predictivemodeling methodology   k-Nearest Neighbor (kNN) algorithm   Singular value decomposition (SVD) method for dimensionality reduction   Using a synthetic data set to test and improve your model   Experiment and results 2
  • 3. The Business Problem   Design product recommender solution that will increase revenue. $$ 3
  • 4. How Do We Increase Revenue? Increase Conversion Increase Revenue Increase Unit Price Increase Avg. Order Value Increase Units / Order 4
  • 5. Example   Is this recommendation effective? Increase Unit Price Increase Units / Order 5
  • 6. What am I going to do? 6
  • 7. Predictive Model   Framework ML Prediction Data Features Algorithm Output What data? What feature? Which Algorithm ? Cross-sell & Up-sell Recommendation 7
  • 8. What Data to Use?   Explicit data   Ratings   Comments   Implicit data   Order history / Return history   Cart events   Page views   Click-thru   Search log   In today’s talk we only use Order history and Cart events 8
  • 9. Predictive Model ML Prediction Data Features Algorithm Output Order History What feature? Which Algorithm ? Cross-sell & Up-sell Cart Events Recommendation 9
  • 10. What Features to Use?   We know that a given product tends to get purchased by customers with similar tastes or needs.   Use user engagement data to describe a product. users 1 2 3 4 5 6 7 8 9 10 … n item 17 1 .25 .25 1 .25 user engagement vector 10
  • 11. Data Representation / Features   When we merge every item’s user engagement vector, we got a m x n item-user matrix users 1 2 3 4 5 6 7 8 9 10 … n 1 1 .25 1 .25 2 .25 items 3 1 .25 1 4 .25 1 .25 1 1 1 … m 11
  • 12. Data Normalization   Ensurethe magnitudes of the entries in the dataset matrix are appropriate users 1 2 3 4 5 6 7 8 9 10 … n 1 1 .5 1 .9 1 .92 1 .49 2 1 .79 items 3 1 .67 1 .46 1 .73 4 1 .39 1 .82 1 .76 1 .69 1 1 … … .52 .8 m   Removecolumn average – so frequent buyers 12 don’t dominate the model
  • 13. Data Normalization   Differentengagement data points (Order / Cart / Page View) should have different weights   Common normalization strategies:   Remove column average   Remove row average   Remove global mean   Z-score   Fill-in the null values 13
  • 14. Predictive Model ML Prediction Data Features Algorithm Output Order History User engagement Which Algorithm ? Cross-sell & Up-sell Cart Events vector Recommendation Data Normalization 14
  • 15. Which Algorithm?   How do we find the items that have similar user engagement data? users 1 2 3 4 5 6 7 8 9 10 … n 1 1 .25 1 1 2 1 items 17 1 1 1 .25 .25 18 1 .25 1 1 1 1 … .25 m   We can find the items that have similar user 15 engagement vectors with kNN algorithm
  • 16. k-Nearest Neighbor (kNN)   Find the k items that have the most similar user engagement vectors users 1 2 3 4 5 6 7 8 9 10 … n 1 .5 1 1 1 2 1 .5 1 items 3 1 1 1 1 4 1 .5 1 1 .5 1 … m 1 .5   Nearest Neighbors of Item 4 = [2,3,1] 16
  • 17. Similarity Measure for kNN users 1 2 3 4 5 6 7 8 9 10 … n items 2 1 .5 1 4 1 .5 1 1   Jaccard coefficient: (1+ 1) sim(a,b) = (1+ 1+ 1) + (1+ 1+ 1+ 1) − (1+ 1)   Cosine similarity: a•b (1*1+ 0.5 *1) sim(a,b) = cos(a,b) = = € a ∗ b (12 + 0.5 2 + 12 ) * (12 + 0.5 2 + 12 + 12 ) 2 2   Pearson Correlation: € corr(a,b) = ∑ (r − r )(r − r ) i ai a bi b = m∑ aibi − ∑ ai ∑ bi ∑ (r − r ) ∑ (r − r ) i ai a 2 i bi b 2 m∑ ai2 − (∑ ai ) 2 m∑ bi2 − (∑ bi ) 2 17 match _ cols * Dotprod(a,b) − sum(a) * sum(b) = match _ cols * sum(a 2 ) − (sum(a)) 2 match _ cols * sum(b 2 ) − (sum(b)) 2
  • 18. k-Nearest Neighbor (kNN) Item feature space Similarity Measure (cosine similarity) 7 9 8 2 1 4 6 5 3 kNN k=5 Nearest Neighbors(8) = [9,6,3,1,2] 18
  • 19. Predictive Model   Ver. 1: kNN ML Prediction Data Features Algorithm Output Order History User engagement k-Nearest Neighbor Cross-sell & Up-sell Cart Events vector (kNN) Recommendation Data Normalization 19
  • 20. Cosine Similarity – Code fragment long i_cnt = 100000; // number of items 100K long u_cnt = 2000000; // number of users 2M double data[i_cnt][u_cnt]; // 100K by 2M dataset matrix (in reality, it needs to be malloc allocation) double norm[i_cnt]; // assume data matrix is loaded …… // calculate vector norm for each user engagement vector for (i=0; i<i_cnt; i++) { norm[i] = 0; for (f=0; f<u_cnt; f++) { norm[i] += data[i][f] * data [i][f]; } 1. 100K rows x 100K rows x 2M features --> scalability problem norm[i] = sqrt(norm[i]); kd-tree, Locality sensitive hashing, } MapReduce/Hadoop, Multicore/Threading, Stream Processors // cosine similarity calculation 2. data[i] is high-dimensional and sparse, similarity measures for (i=0; i<i_cnt; i++) { // loop thru 100Knot reliable --> accuracy problem are for (j=0; j<i_cnt; j++) { // loop thru 100K dot_product = 0; This leads us to The SVD dimensionality reduction ! for (f=0; f<u_cnt; f++) { // loop thru entire user space 2M dot_product += data[i][f] * data[j][f]; } printf(“%d %d %lfn”, i, j, dot_product/(norm[i] * norm[j])); } 20 // find the Top K nearest neighbors here …….
  • 21. Singular Value Decomposition (SVD) A = U × S ×VT A U S VT m x n matrix m x r matrix r x r matrix r x n matrix € items items rank = k k<r users users users Ak = U k × Sk × VkT   Low rank approx. Item profile is U k * Sk items   Low rank approx. User profile is S k *VkT 21 €   Low rank approx. Item-User matrix is € U k * Sk * Sk *VkT €
  • 22. Reduced SVD Ak = U k × Sk × VkT Ak Uk Sk VkT 100K x 2M matrix 100K x 3 matrix 3 x 3 matrix 3 x 2M matrix 7 0 0 0 3 0 items items 0 0 1 users rank = 3 Descending Singular Values users   Low rank approx. Item profile is U k * Sk 22 €
  • 23. SVD Factor Interpretation S 3 x 3 matrix   Singular values plot (rank=512) 7 0 0 0 3 0 0 0 1 Descending Singular Values 23 More Significant Latent Factors Noises + Others Less Significant
  • 24. SVD Dimensionality Reduction U k * Sk <----- latent factors -----> # of users € items 3 rank Need to find the most optimal low rank !! 10 24
  • 25. Missing values   Difference between “0” and “unknown”   Missing values do NOT appear randomly.   Value = (Preference Factors) + (Availability) – (Purchased elsewhere) – (Navigation inefficiency) – etc.   Approx. Value = (Preference Factors) +/- (Noise)   Modeling missing values correctly will help us make good recommendations, especially when working with an extremely sparse data set 25
  • 26. Singular Value Decomposition (SVD)   Use SVD to reduce dimensionality, so neighborhood formation happens in reduced user space   SVD helps model to find the low rank approx. dataset matrix, while retaining the critical latent factors and ignoring noise.   Optimal low rank needs to be tuned   SVD is computationally expensive   SVD Libraries:   Matlab [U, S, V] = svds(A,256);   SVDPACKC http://www.netlib.org/svdpack/   SVDLIBC http://tedlab.mit.edu/~dr/SVDLIBC/   GHAPACK http://www.dcs.shef.ac.uk/~genevieve/ml.html 26
  • 27. Predictive Model   Ver. 2: SVD+kNN ML Prediction Data Features Algorithm Output Order History User engagement k-Nearest Neighbors Cross-sell & Up-sell Cart Events vector (kNN) in reduced Recommendation space Data Normalization SVD 27
  • 28. Synthetic Data Set   Why do we use synthetic data set?   Sowe can test our new model in a controlled environment 28
  • 29. Synthetic Data Set   16latent factors synthetic e-commerce data set   Dimension: 1,000 (items) by 20,000 (users)   16 user preference factors   16 item property factors (non-negative)   Txn Set: n = 55,360 sparsity = 99.72 %   Txn+Cart Set: n = 192,985 sparsity = 99.03%   Download: http://www.IntelligentMining.com/dataset/ user_id item_id type 10 42 0.25 10 997 0.25 10 950 0.25 29 11 836 0.25 11 225 1
  • 30. Synthetic Data Set Item property User preference Purchase Likelihood score 1K x 20K matrix factors factors 1K x 16 matrix 16 x 20K matrix X11 X12 X13 X14 X15 X16 x X21 X22 X12 X24 X25 X26 y items X31 X32 X33 X34 X35 X36 a b c z X41 X42 X43 X44 X45 X46 X51 X52 X53 X54 X55 X56 users X32 = (a, b, c) . (x, y, z) = a * x + b * y + c * z X32 = Likelihood of Item 3 being purchased by User 2 30
  • 31. Synthetic Data Set X11 X31 X51 Based on the distribution, pre-determine # of items X21 X41 purchased by an user X41 (# of item=2) X31 Sort by Purchase X21 From the top, select and skip X31 likelihood Score certain items to create data X41 X51 sparsity. X21 X51 X11 X11   User 1 purchased Item 4 and Item 1 31
  • 32. Experiment Setup   Each model (Random / kNN / SVD+kNN) will generate top 20 recommendations for each item.   Compare model output to the actual top 20 provided by synthetic data set   Evaluation Metrics :   Precision %: Overlapping of the top 20 between model output and actual (higher the better) {Found _ Top20 _ items} ∩ {Actual _ Top20 _ items} Precision = {Found _ Top20 _ items}   Quality metric: Average of the actual ranking in the model output (lower the better) € 32 1 2 30 47 50 21 1 2 368 62 900 510
  • 33. Experimental Result   kNN vs. Random (Control) Precision % Quality (higher is better) (Lower is better) 33
  • 34. Experimental Result   Precision % of SVD+kNN Recall % (higher is better) Improvement 34 SVD Rank
  • 35. Experimental Result   Quality of SVD+kNN Quality (Lower is better) Improvement 35 SVD Rank
  • 36. Experimental Result   The effect of using Cart data Precision % (higher is better) 36 SVD Rank
  • 37. Experimental Result   The effect of using Cart data Quality (Lower is better) 37 SVD Rank
  • 38. Outline   Predictivemodeling methodology   k-Nearest Neighbor (kNN) algorithm   Singular value decomposition (SVD) method for dimensionality reduction   Using a synthetic data set to test and improve your model   Experiment and results 38
  • 39. References   J.S. Breese, D. Heckerman and C. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," in Proceedings of the Fourteenth Conference on Uncertainity in Artificial Intelligence (UAI 1998), 1998.   B. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Item-based collaborative filtering recommendation algorithms," in Proceedings of the Tenth International Conference on the World Wide Web (WWW 10), pp. 285-295, 2001.   B. Sarwar, G. Karypis, J. Konstan, and J. Riedl "Application of Dimensionality Reduction in Recommender System A Case Study" In ACM WebKDD 2000 Web Mining for E-Commerce Workshop   Apache Lucene Mahout http://lucene.apache.org/mahout/   Cofi: A Java-Based Collaborative Filtering Library http://www.nongnu.org/cofi/ 39
  • 40. Thank you   Any question or comment? 40