People who liked this talk also liked …
Building Recommendation Systems
             Using Ruby

            Ryan Weald, @rweald
             LA RubyConf 2013




                                          1
Who is this guy?

 What does he know
about recommendation
       systems?

                       2
Data Scientist @Sharethrough




 Native advertising
     platform
                               3
Outline
1) What is a recommendation system?
2) Collaborative filtering based
   recommendations
3) Content based recommendations
4) Hybrid systems - the best of both worlds
5) Evaluating your recommendation system
6) Resources & existing libraries


                                              5
What this Talk is Not
• Everything there is to know about
  recommendation systems.
• Bleeding edge machine learning
• How to use a specific library




                                      6
What is a
recommendation system?



                         7
A program that predicts
a user’s preferences using information
 about the user, other users, and the
         items in your system.




                                         8
LinkedIn




           9
Netflix




         10
Spotify




          11
Amazon




         12
How do I build
recommendations?



                   13
Two Main Categories of Algorithm



1. Collaborative Filtering (CF)

2. Content Based - Classification




                                   14
Collaborative Filtering


Fill in missing user preferences using
         similar users or items




                                         15
Two Types of CF
1. Memory Based - Uses similarity
between users or items. Dataset
usually kept in memory

2. Model Based - Model generated
to “explain” observed ratings


                                    16
User Based CF


 (User x Item) Matrix + Similarity
Function = Top-K most similar users




                                      17
Collaborative Filtering
         Video 1    Video 2   Video 3      Video 4   Video 5
User 1      0          1          0           5         0
User 2      1          2          1           0         5
User 3      2          5          0           0         2
User 4      5          4          4           1         1
User 5      2          4          ?           ?         2

                   * 0 denotes not rated

                                                               18
Similarity Functions

• Pearson Correlation Coefficient
• Cosine Similarity




                                   19
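A minimal sketch of cosine similarity in plain Ruby, for comparing two users' rating rows. The method name `cosine_similarity` and the choice to return 0.0 for an all-zero row are assumptions for illustration, not something shown in the talk.

```ruby
# Cosine similarity between two equal-length rating vectors.
# Returns a Float; for non-negative ratings the result is in 0.0..1.0.
def cosine_similarity(a, b)
  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |y| y * y })
  # Assumption: a user with no ratings is similar to nobody.
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Rows for User 3 and User 5 from the ratings matrix (0 = not rated).
puts cosine_similarity([2, 5, 0, 0, 2], [2, 4, 0, 0, 2])
```

Note that treating "not rated" as a literal 0 pulls the score down whenever one user rated an item and the other did not, which may or may not be what you want.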
Pearson Correlation Coefficient




                                 20
Calculating PCC




                  21
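The calculation can be sketched in plain Ruby roughly as follows. This is one common way to compute PCC for ratings, not necessarily the exact steps from the slides: it assumes 0 means "not rated" (as in the matrix slide) and only compares items both users have rated. The method name `pearson` is hypothetical.

```ruby
# Pearson correlation between two users' rating rows.
# Assumption: 0 means "not rated", so only co-rated items are compared.
def pearson(a, b)
  pairs = a.zip(b).reject { |x, y| x.zero? || y.zero? }
  n = pairs.size
  return 0.0 if n < 2 # not enough overlap to measure correlation

  mean_a = pairs.sum { |x, _| x } / n.to_f
  mean_b = pairs.sum { |_, y| y } / n.to_f

  cov   = pairs.sum { |x, y| (x - mean_a) * (y - mean_b) }
  var_a = pairs.sum { |x, _| (x - mean_a)**2 }
  var_b = pairs.sum { |_, y| (y - mean_b)**2 }

  denom = Math.sqrt(var_a * var_b)
  # Assumption: a user who gave identical ratings everywhere correlates with nobody.
  denom.zero? ? 0.0 : cov / denom
end

# User 5 vs. User 3 from the ratings matrix.
puts pearson([2, 4, 0, 0, 2], [2, 5, 0, 0, 2])
```

Unlike cosine similarity, PCC subtracts each user's mean first, so a harsh rater and a generous rater with the same taste can still come out as highly similar.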
Using similarity to
recommend items



                      28
Collaborative Filtering
         Video 1    Video 2   Video 3      Video 4   Video 5
User 1      0          1          0           5         0
User 2      1          2          1           0         5
User 3      2          5          0           0         2
User 4      5          4          4           1         1
User 5      2          4          ?           ?         2

                   * 0 denotes not rated

                                                               29
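One way to fill in the two "?" cells for User 5 is a similarity-weighted average over the K most similar users who rated the item. The sketch below assumes cosine similarity, K = 2, and 0 as "not rated"; the names `RATINGS` and `predict_rating` are made up for this example.

```ruby
# Ratings matrix from the slide; 0 means "not rated".
RATINGS = {
  "User 1" => [0, 1, 0, 5, 0],
  "User 2" => [1, 2, 1, 0, 5],
  "User 3" => [2, 5, 0, 0, 2],
  "User 4" => [5, 4, 4, 1, 1],
  "User 5" => [2, 4, 0, 0, 2]
}.freeze

def cosine_similarity(a, b)
  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |y| y * y })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Predict `user`'s rating for `item` (0-based column index) as the
# similarity-weighted average over the k most similar users who rated it.
# Returns nil when no usable neighbor exists.
def predict_rating(ratings, user, item, k: 2)
  target = ratings.fetch(user)
  neighbors = ratings
    .reject { |name, row| name == user || row[item].zero? }
    .map { |_, row| [cosine_similarity(target, row), row[item]] }
    .max_by(k) { |sim, _| sim }

  total_weight = neighbors.sum { |sim, _| sim }
  return nil if total_weight.zero?

  neighbors.sum { |sim, rating| sim * rating } / total_weight
end

[2, 3].each do |item|
  puts "Video #{item + 1}: #{predict_rating(RATINGS, 'User 5', item)}"
end
```

Swapping in a different similarity function (for example PCC) only changes the one call inside `predict_rating`.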
Problems With CF

• Cold Start
• Data Sparsity
• Resource expensive



                        31
Doesn’t the video
content matter for
recommendations?


                     32
Content Based Recommendations


  Classify items based on features of
   the item. Pick other items from
      same class to recommend.




                                        33
Content Based Algorithms
• K-means clustering
• Random Forest
• Support Vector Machines
• ...
• Insert your favorite ML algorithm

                                      34
Content Based Algorithms
          Type of content   Duration   Maturity rating
Video 1   comedy               60         G
Video 2   action              120         G
Video 3   comedy               34         PG-13
Video 4   romantic             15         R
Video 5   sports              120         G




                                           35
K-means Clustering


  Group items into K clusters.
Assign new item to a cluster and
  pick items from that cluster




                                   36
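A bare-bones k-means sketch in plain Ruby, to make the idea concrete. It is deliberately simplified: it seeds centroids with the first K points (real implementations usually pick them randomly), runs a fixed number of iterations, and clusters only on the Duration column from the content table. The method names and these choices are assumptions for illustration.

```ruby
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Returns an array with the cluster index (0...k) assigned to each point.
def kmeans(points, k, iterations: 10)
  # Simplification: seed centroids with the first k points.
  centroids = points.first(k).map { |point| point.map(&:to_f) }
  assignments = []

  iterations.times do
    # Assignment step: each point goes to its nearest centroid.
    assignments = points.map do |point|
      (0...k).min_by { |i| squared_distance(point, centroids[i]) }
    end

    # Update step: move each centroid to the mean of its members.
    centroids = centroids.each_with_index.map do |old, i|
      members = points.each_index
                      .select { |j| assignments[j] == i }
                      .map { |j| points[j] }
      next old if members.empty?

      members.transpose.map { |col| col.sum / col.size.to_f }
    end
  end

  assignments
end

# Duration of Video 1..5 from the content table, one feature per video.
durations = [[60], [120], [34], [15], [120]]
p kmeans(durations, 2)
```

To recommend for a new item, assign it to the nearest centroid and pick other items with the same cluster index. Categorical features such as type of content would need a numeric encoding first.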
Problems With Content Based
      Recommendations

• Unsupervised Learning is hard
• Training data limited or expensive
• Doesn’t take user into account
• Limited by features of content

                                       38
Hybrid Recommendations


Combine collaborative filtering with
a content based algorithm to achieve
          better results




                                      39
Hybrid Recommendations

Input
           CF Based
         Recommender

                         Combiner   Reco


Input
         Content Based
         Recommender




                                           40
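The combiner in this layout could be as simple as a weighted sum of the two recommenders' scores. The sketch below assumes both recommenders return a Hash of item to score on a comparable scale; the method name `combine` and the `cf_weight` parameter are hypothetical.

```ruby
# Blend per-item scores from two recommenders.
# Items missing from one recommender are treated as scoring 0.0 there.
def combine(cf_scores, content_scores, cf_weight: 0.5)
  items = cf_scores.keys | content_scores.keys
  items.to_h do |item|
    score = cf_weight * cf_scores.fetch(item, 0.0) +
            (1 - cf_weight) * content_scores.fetch(item, 0.0)
    [item, score]
  end
end

cf      = { "Video 3" => 0.8, "Video 4" => 0.2 }
content = { "Video 3" => 0.4, "Video 1" => 0.9 }

ranked = combine(cf, content, cf_weight: 0.7).sort_by { |_, score| -score }
ranked.each { |item, score| puts "#{item}: #{score.round(2)}" }
```

The weight is something you would tune against whatever evaluation metric you care about.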
Hybrid Recommendations



            Content         CF
Input                                 Reco
          Recommender   Recommender




                                             42
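This chained layout can be read as a pipeline: the first recommender narrows the candidates and the second one works on what is left. A rough sketch, where both stages are stand-in lambdas and the field names `:type` and `:predicted_rating` are invented for the example:

```ruby
# Run the input through each recommender stage in order.
def cascade(input, stages)
  stages.reduce(input) { |candidates, stage| stage.call(candidates) }
end

# Stand-in content stage: keep only videos of the type the user is watching.
content_stage = ->(videos) { videos.select { |v| v[:type] == "comedy" } }
# Stand-in CF stage: rank the survivors by a precomputed predicted rating.
cf_stage = ->(videos) { videos.sort_by { |v| -v[:predicted_rating] } }

videos = [
  { name: "Video 1", type: "comedy", predicted_rating: 2.0 },
  { name: "Video 2", type: "action", predicted_rating: 4.5 },
  { name: "Video 3", type: "comedy", predicted_rating: 3.5 }
]

cascade(videos, [content_stage, cf_stage]).each { |v| puts v[:name] }
```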
Hybrid Recommendations


            CF
        Recommender
Input                        Reco
          Content
        Recommender




                                    43
Evaluating Recommendation Quality


• Precision vs. Recall
• Clicks
• Click through rate
• Direct user feedback


                                    44
Precision vs. Recall




                       45
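Precision is the fraction of recommended items that were relevant; recall is the fraction of relevant items that were recommended. A small sketch, assuming you already know which items the user actually liked (the `relevant` list); the method name is hypothetical.

```ruby
# Returns [precision, recall] for a list of recommended items
# against the list of items known to be relevant for the user.
def precision_recall(recommended, relevant)
  recommended = recommended.uniq
  relevant    = relevant.uniq
  hits = (recommended & relevant).size.to_f

  precision = recommended.empty? ? 0.0 : hits / recommended.size
  recall    = relevant.empty?    ? 0.0 : hits / relevant.size
  [precision, recall]
end

precision, recall = precision_recall(
  ["Video 1", "Video 2", "Video 3", "Video 4"], # what we recommended
  ["Video 1", "Video 3", "Video 5"]             # what the user liked
)
puts "precision=#{precision} recall=#{recall.round(2)}"
```

Recommending more items tends to raise recall and lower precision, which is why the two are usually reported together.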
Summary of What We’ve Learned


 • Collaborative Filtering using similar users
 • Content clustering using k-means
 • Combining 2 algorithms to boost quality
 • How to evaluate your recommender


                                                 47
Don’t Reinvent the Wheel

• Apache Mahout
• JRuby mahout gem
• SciRuby
• Recommenderlab for R


                             48
Resources & Further Reading
• Recommender Systems: An Introduction
• Linden, Greg, Brent Smith, and Jeremy York.
"Amazon.com recommendations: Item-to-item
collaborative filtering."
• Resnick, Paul, et al. "GroupLens: an open architecture
for collaborative filtering of netnews."
• ACM RecSys Conference Proceedings


                                                           49
We’re Hiring
http://bit.ly/str-engineering




                                50
Thanks!
        Twitter: @rweald
Email: ryan@sharethrough.com




                               51
