Kris Jack and Maya Hristakeva
16/12/2014
Modern Perspectives on
Recommender Systems and their
Applications in Mendeley
Kris Jack, Chief Data Scientist
http://www.mendeley.com/profiles/kris-jack/
Maya Hristakeva, Senior Data Scientist
http://www.mendeley.com/profiles/maya-hristakeva/
Phil Gooch, Senior Data Scientist
http://www.mendeley.com/profiles/phil-gooch/
Overview
• The what and why of recommenders
• Evolution of the recommender problem
• Recommender algorithms
• Evaluating a recommender
• Recommender systems @ Mendeley
What is a recommender?
A recommendation system (recommender) is a push system that presents
users with the most relevant content for their context and needs
• helps users to deal with information overload
• recommenders are complementary to search
• search engine: pull; the user issues a request (Information Retrieval)
• recommendation engine: push; infers context and needs (Information Filtering)
Recommenders @ LinkedIn
50% of LinkedIn connections are from recommendations
Recommenders @ Netflix
Stopping 1% of users from cancelling their subscription is worth $500M/year
Netflix invests $150M/year (300 people) in their content recommendation team
Recommenders @ ResearchGate
Why recommenders?
• Search and recommendations are complementary: a product needs both, like arms and legs!
• Higher usability, user satisfaction and engagement
• Increase product stickiness
• Monetise them
...and in the context of research...
Help researchers keep up-to-date with latest research, connect with
researchers in their field, contextualise their work within the global body of
research (articles, researchers, conferences, research groups, etc.)
Evolution of recommender problem
Problem: We have a massive collection of items (e.g. > 1 million).
We want to recommend 5 items that the user will like.
Evolution of recommender problem
At first, this was seen as a ratings prediction problem: given some knowledge
of the user, estimate how much they will appreciate each item on a scale of 1-5.
choose top 5 items with highest predicted ratings (e.g. 4.9, 4.7, 4.7, 4.6, 4.5)
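A minimal sketch of the "choose top 5" step, assuming the predicted ratings are already available as a dictionary. The item names, the ratings and the helper `top_n` are hypothetical and only illustrate the idea.

```python
import heapq

def top_n(predicted_ratings, n=5):
    """Return the n (item, rating) pairs with the highest predicted rating."""
    return heapq.nlargest(n, predicted_ratings.items(), key=lambda pair: pair[1])

# Hypothetical predicted ratings on the 1-5 scale for a single user.
predicted = {"a": 4.9, "b": 3.2, "c": 4.7, "d": 4.6, "e": 2.1, "f": 4.5, "g": 4.7}
top5 = top_n(predicted)
```

With a catalogue of over a million items, a heap-based selection like this avoids sorting the whole collection just to keep five entries.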
Evolution of recommender problem
But do predicted ratings give the best order? Improve the recommender by
reranking a selection of items with high predicted ratings.
rerank items that are highly predicted (e.g. new order: 4.7, 4.9, 4.6, 4.6, 4.8)
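One possible way to rerank, sketched under assumptions: blend each candidate's predicted rating with a secondary signal. The `freshness` scores, the `weight` value and the function `rerank` are hypothetical; the slides do not say which reranking method is used.

```python
def rerank(candidates, secondary, weight=0.5):
    """Reorder candidates by predicted rating plus a weighted secondary score."""
    def blended(item):
        return candidates[item] + weight * secondary.get(item, 0.0)
    return sorted(candidates, key=blended, reverse=True)

# Hypothetical candidates (already highly rated) and freshness scores in [0, 1].
candidates = {"a": 4.9, "c": 4.7, "g": 4.7, "d": 4.6, "f": 4.5}
freshness = {"a": 0.1, "c": 0.9, "g": 0.2, "d": 0.3, "f": 1.0}
order = rerank(candidates, freshness)
```

The final order no longer follows predicted rating alone, which is the point made on this slide.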
Evolution of recommender problem
Let’s improve the recommendations by optimizing the page in which they
appear.
deliver them in style
Evolution of recommender problem
Take the user’s context into account.
new to this topic? yes / no
Evolution of recommender problem
Actively researching how to take other properties into account in context:
trustworthiness; freshness; diversity; serendipity; novelty; recency.
at work? yes / no
Evolution of recommender problem
Over time:
• Rating prediction
• Reranking
• Page optimisation
• Context-aware
• Future: trustworthiness; freshness; diversity; serendipity; novelty; recency.
How to make recommendations?
On to the algorithms...
Recommender algorithms
A recommender processes information and transforms it into actionable
knowledge. Here we’ll focus on the algorithms that make this possible.
information flow (components often built in parallel)
Recommender algorithms
• Collaborative filtering (similarity and model-based)
• Content-based filtering
• Hybrid
• Non-traditional
Collaborative filtering
Similarity-based CF
• User-based CF finds users who have similar appreciations for items as
you and recommends new items based on what they like.
• Item-based CF finds items that are similar to the ones you like.
Similarity is based on item co-occurrences (e.g. the users who bought x
also bought y).
Formal representation
• $t_i$: rating of user $x_i$ for item $y_i$.
• Infer prediction function
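A small sketch of item-based CF from co-occurrences ("users who bought x also bought y"), assuming binary usage data and cosine similarity as the similarity measure. The basket data and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def item_similarities(baskets):
    """Cosine similarity between items, computed from binary co-occurrences."""
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in baskets.values():
        unique = sorted(set(items))
        for item in unique:
            item_count[item] += 1
        for x, y in combinations(unique, 2):
            pair_count[(x, y)] += 1
    sims = defaultdict(dict)
    for (x, y), both in pair_count.items():
        score = both / sqrt(item_count[x] * item_count[y])
        sims[x][y] = score
        sims[y][x] = score
    return sims

def recommend(user, baskets, sims, n=5):
    """Score unseen items by summed similarity to the items the user already has."""
    owned = set(baskets[user])
    scores = defaultdict(float)
    for item in owned:
        for other, score in sims[item].items():
            if other not in owned:
                scores[other] += score
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]

# Hypothetical usage data: user -> items they interacted with.
baskets = {
    "u1": ["x", "y"],
    "u2": ["x", "y", "z"],
    "u3": ["x"],
    "u4": ["z", "w"],
}
sims = item_similarities(baskets)
recs = recommend("u3", baskets, sims)
```

User-based CF follows the same pattern with the roles swapped: compute similarities between users, then recommend what the most similar users liked.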
Collaborative filtering
Model-based CF
• Matrix Factorisation (SVD++)
• Clustering (K-means to LDA)
• LSH (Locality sensitive hashing)
• Restricted Boltzmann Machines
Formal representation of MF
• X: user-item ratings matrix
• U: user-latent factors matrix
• S: latent factor diagonal matrix
• V: latent factor-item matrix
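The matrices X, U, S and V listed on this slide suggest the usual SVD-style factorisation. The equation itself did not survive in the transcript, so the form below is a reconstruction from the definitions; the dimension names m (users), n (items) and k (latent factors) are assumptions. SVD++ extends this basic form with further terms that are not shown here.

```latex
\[
X \approx U S V, \qquad
X \in \mathbb{R}^{m \times n},\;
U \in \mathbb{R}^{m \times k},\;
S \in \mathbb{R}^{k \times k} \text{ (diagonal)},\;
V \in \mathbb{R}^{k \times n}
\]
\[
\hat{x}_{ui} = \sum_{f=1}^{k} U_{uf}\, S_{ff}\, V_{fi}
\]
```

The second line is the predicted rating of user u for item i: the user's factor weights and the item's factor weights are multiplied factor by factor, scaled by the diagonal of S, and summed.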
Collaborative filtering
• User-based CF
• Item-based CF
• Model-based CF
Pros
• Minimal domain knowledge required
• User and item features are not required
• Produces good enough results in most cases
Cons
• Cold start problem
• Requires high user:item ratio (1:10)
• Needs standardised products
• Popularity bias (doesn’t play well with the long tail)
Content-based filtering
• Determine item similarity based on item content not usage data
• Recommend items similar to those that a user is known to like
• The user model:
• explicitly provided features/keywords of interest
• can be a classifier (e.g. Naive Bayes, SVM, Decision trees)
Formal representation
• $t_i$: rating of user $x_i$ for item $y_i$, where $x_i$ and $y_i$ are feature vectors
• Infer prediction function
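A sketch of content-based filtering with keyword feature vectors, assuming the user model is simply the sum of the feature vectors of liked items and that cosine similarity stands in for the prediction function. The papers, the keywords and the function names are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profile(liked_items, item_features):
    """Sum the feature vectors of liked items into a single user profile."""
    profile = {}
    for item in liked_items:
        for feature, weight in item_features[item].items():
            profile[feature] = profile.get(feature, 0.0) + weight
    return profile

def recommend(liked_items, item_features, n=5):
    """Rank items the user has not liked yet by similarity to their profile."""
    profile = build_profile(liked_items, item_features)
    candidates = [item for item in item_features if item not in liked_items]
    return sorted(
        candidates,
        key=lambda item: cosine(profile, item_features[item]),
        reverse=True,
    )[:n]

# Hypothetical items described by keyword features (content, not usage data).
item_features = {
    "paper1": {"recommender": 1.0, "matrix": 1.0},
    "paper2": {"recommender": 1.0, "evaluation": 1.0},
    "paper3": {"genomics": 1.0, "sequencing": 1.0},
    "paper4": {"matrix": 1.0, "factorisation": 1.0},
}
recs = recommend(["paper1", "paper2"], item_features, n=2)
```

No usage data from other users is involved: the ranking depends only on the item content and on what this one user is known to like.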
Content-based filtering
Pros
• No cold start problem
• No need for usage data
• No popularity bias, can recommend items with rare features
Cons
• Item content needs to be machine readable and meaningful
• Easy to pigeonhole the user
• Difficult to implement serendipity
• Difficult to combine multiple item’s features together
Hybrid approaches
Method: Description
• Weighted: Outputs from several techniques (in the form of scores or votes) are combined with different degrees of importance to offer final recommendations
• Switching: Depending on situation, the system changes from one technique to another
• Mixed: Recommendations from several techniques are presented at the same time
• Feature combination: Features from different recommendation sources are combined as input to a single technique
• Cascade: The output from one technique is used as input of another that refines the result
• Feature augmentation: The output from one technique is used as input features to another
• Meta-level: The model learned by one recommender is used as input to another
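As an illustration of the "Weighted" method only, a minimal sketch that combines scores from two techniques with fixed weights. The technique names, the scores and the weights are hypothetical, and the scores are assumed to be on a comparable scale already.

```python
def weighted_hybrid(score_sets, weights):
    """Combine per-technique item scores using one weight per technique."""
    combined = {}
    for name, scores in score_sets.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    # Highest combined score first; ties broken by item name.
    return sorted(combined, key=lambda item: (-combined[item], item))

# Hypothetical scores from a collaborative and a content-based recommender.
scores = {
    "cf": {"a": 0.9, "b": 0.4, "c": 0.1},
    "content": {"a": 0.2, "b": 0.8, "d": 0.6},
}
ranking = weighted_hybrid(scores, {"cf": 0.7, "content": 0.3})
```

An item missing from one technique (here "c" and "d") simply contributes nothing for that technique, which is one of several reasonable choices.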
Hybrid approaches
Combining user and item features and usage to benefit from both
Pros
• Often outperforms CF and CB
alone
Cons
• Can be a lot of work to get the
right balance
Non-traditional approaches
• Deep learning
• Social recommendations
• Learning to rank
• ...
Pros
• Good for eking out those final performance percentage points
• You can say you’re working with cutting-edge approaches ;)
Cons
• Less well understood
• Less supported in recommendation toolkits
• Not recommended approaches for your first recommender
Algorithms
• Typically employ collaborative filtering
• May need to use content-based filtering, particularly to bootstrap
• Go advanced with a hybrid
• Do all of that before getting adventurous with state-of-the-art
Is your recommender doing well?
You don’t really know unless you evaluate it...
Evaluating a recommender
• Offline testing
• Online testing (A/B testing)
Offline testing
• Test offline before deploying
• Parameter sweep is quick
• Doesn’t offend real users
• n-fold cross validation:
  • Take the users, items and relationships between them (e.g. clicked on, bought)
  • Split into n folds, for training (n-1) and testing (1)
  • Attempt to predict the testing data based on the training data
• Popularity as baseline
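A sketch of n-fold cross validation with a popularity baseline, assuming interactions are (user, item) pairs. The synthetic data, the fold construction and the simple hit-rate measure are illustrative assumptions, not the evaluation setup used at Mendeley.

```python
import random
from collections import Counter

def n_fold_splits(interactions, n=5, seed=0):
    """Yield (train, test) lists; each fold is held out for testing exactly once."""
    shuffled = interactions[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n] for i in range(n)]
    for held_out in range(n):
        train = [pair for i, fold in enumerate(folds) if i != held_out for pair in fold]
        yield train, folds[held_out]

def popularity_baseline(train, k=5):
    """Recommend the k most frequently used items to every user."""
    counts = Counter(item for _, item in train)
    return [item for item, _ in counts.most_common(k)]

# Hypothetical (user, item) interactions, e.g. "clicked on" or "bought".
interactions = [(f"u{u}", f"i{(u * i) % 7}") for u in range(1, 11) for i in range(1, 4)]

hit_rates = []
for train, test in n_fold_splits(interactions, n=5):
    top = popularity_baseline(train, k=3)
    hits = sum(1 for _, item in test if item in top)
    hit_rates.append(hits / len(test))
```

Any candidate recommender can be dropped in where `popularity_baseline` is called; if it cannot beat the baseline offline, it is probably not ready for an online test.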
Metrics
• Precision, recall and f-measure
• Receiver operating characteristic (ROC) curve
• Normalised discounted cumulative gain (NDCG)
• Mean reciprocal rank (MRR)
• Fraction of Concordant Pairs (FCP)
• ...
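Several of these metrics fit in a few lines each. The sketch below assumes binary relevance (an item is either relevant or not) and a single ranked list; the example ranking and relevant set are hypothetical. MRR is the mean of `reciprocal_rank` over all users.

```python
from math import log2

def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall of the first k recommendations."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item, or 0 if there is none."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """Discounted cumulative gain of the ranking, normalised by the ideal ranking."""
    dcg = sum(
        1.0 / log2(position + 1)
        for position, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal = sum(1.0 / log2(position + 1) for position in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0

# Hypothetical ranked recommendations and held-out relevant items for one user.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "d", "z"}
p, r = precision_recall_at_k(ranked, relevant, k=5)
f = f_measure(p, r)
rr = reciprocal_rank(ranked, relevant)
ndcg = ndcg_at_k(ranked, relevant, k=5)
```

Precision and recall ignore order within the top k, while reciprocal rank and NDCG reward putting relevant items near the top.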
Online testing
• Offline performance isn’t a very precise indicator
• Offline test is good sanity check
• Online test gives real performance
• A/B testing
• Deploy your systems that perform ‘well enough’
• Compare them with each other in real world
• Mind the pitfalls
Metrics
• The offline metrics +
• Conversion rate
• Open, view, click through rates
• Usage data (e.g. reordered item, completed reading book)
• Hard to evaluate: trustworthiness; freshness; diversity; serendipity; novelty; recency.
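A sketch of how two variants in an A/B test might be compared on conversion rate. The two-proportion z-test is one common choice and is an assumption here, as are the counts; the slides do not specify a statistical test.

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: conversions and users shown each recommender variant.
z, p = two_proportion_z_test(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
```

One of the pitfalls to keep in mind: checking the p-value repeatedly while the test is still running, and stopping as soon as it looks significant, inflates the false positive rate.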
Evaluating a recommender
• Start with offline testing
• Perform A/B testing but be aware of the common pitfalls
• Hard to evaluate performance in terms of: trustworthiness; freshness; diversity; serendipity; novelty; recency.
How do we use recommenders?
On to a few of our use cases...
Recommenders @ Mendeley
Related research for an article
Recommenders @ Mendeley
Related research for multiple articles
Recommenders @ Mendeley
Mendeley Suggest - personalised batch of recommended reading
Recommenders @ Mendeley
Researchers to follow on Mendeley
Recommenders @ Mendeley
Interesting activity from your social network
Recommenders @ Mendeley
• Recommenders are employed for a number of use cases
• Recommenders deliver different kinds of value depending upon use case
• Can reuse the same underlying recommender system and framework for all
Conclusions
• Recommenders are complementary to search and becoming mainstream
• although arguably can cater for a wider range of use cases
• When building a recommender, it’s common to predict ratings, rerank, optimise the page and then introduce context-awareness
• In building a recommender, start with collaborative filtering if you can, content-based if you need to bootstrap and then explore hybrids
• Open research questions remain as recommenders are used to tackle trustworthiness; freshness; diversity; serendipity; novelty; recency
Thank you
www.mendeley.com

Modern Perspectives on Recommender Systems and their Applications in Mendeley

  • 1. Kris Jack and Maya Hristakeva 16/12/2014 Modern Perspectives on Recommender Systems and their Applications in Mendeley
  • 2. Kris Jack, Chief Data Scientist http://www.mendeley.com/profiles/kris-jack/ Maya Hristakeva, Senior Data Scientist http://www.mendeley.com/profiles/maya-hristakeva/ Phil Gooch, Senior Data Scientist http://www.mendeley.com/profiles/phil-gooch/
  • 3. Overview • The what and why of recommenders • Evolution of the recommender problem • Recommender algorithms • Evaluating a recommender • Recommender systems @ Mendeley
  • 4. Overview • The what and why of recommenders • Evolution of the recommender problem • Recommender algorithms • Evaluating a recommender • Recommender systems @ Mendeley
  • 5. What is a recommender? A recommendation system (recommender) is a push system that presents users with the most relevant content for their context and needs • helps users to deal with information overload • recommenders are complementary to search search engine pull recommendation engine push request infers context and needs Information Retrieval Information Filtering
  • 6. Recommenders @ Linkedin 50% of LinkedIn connections are from recommendations
  • 8. Recommenders @ Netflix Stop 1% of users from cancelling subscription = $500M/year Netflix invests $150M/year (300 people) in their content rec team
  • 10. Why recommenders? • Search and recommendations are complementary, have arms and legs! • Higher usability, user satisfaction and engagement • Increase product stickiness • Monetise them ...and in the context of research... Help researchers keep up-to-date with latest research, connect with researchers in their field, contextualise their work within the global body of research (articles, researchers, conferences, research groups, etc.)
  • 11. Overview • The what and why of recommenders • Evolution of the recommender problem • Recommender algorithms • Evaluating a recommender • Recommender systems @ Mendeley
  • 12. Evolution of recommender problem Problem: We have a massive collection of items (e.g. > 1 million). We want to recommend 5 items that the user will like.
  • 13. Evolution of recommender problem First, seen as a ratings prediction problem. So, given some knowledge of the user, estimate how much they will appreciate each item on scale of 1-5. 4.9 choose top 5 items with highest predicted ratings 4.7 4.7 4.6 4.5
  • 14. Evolution of recommender problem But do predicted ratings give the best order? Improve the recommender by reranking a selection of items with high predicted ratings. rerank items that are highly predicted 4.7 4.9 4.6 4.6 4.8
  • 15. Evolution of recommender problem Let’s improve the recommendations by optimizing the page in which they appear. deliver them in style
  • 16. Evolution of recommender problem Take the user’s context into account. new to this topic? yesno
  • 17. Evolution of recommender problem Actively researching how to take other properties into account in context: trustworthiness; freshness; diversity; serendipity; novelty; recency. at work? yesno
  • 18. Rating prediction Reranking Page optimisation Context-aware Future: trustworthiness; freshness; diversity; serendipity; novelty; recency. How to make recommendations? On to the algorithms... Evolution of recommender problem time
  • 19. Overview • The what and why of recommenders • Evolution of the recommender problem • Recommender algorithms • Evaluating a recommender • Recommender systems @ Mendeley
  • 20. Recommender algorithms A recommender processes information and transforms it into actionable knowledge. Here we’ll focus on the algorithms that make this possible. information flow (components often built in parallel)
  • 21. Recommender algorithms • Collaborative filtering (similarity and model-based) • Content-based filtering • Hybrid • Non-traditional
  • 22. Collaborative filtering Similarity-based CF • User-based CF finds users who have similar appreciations for items as you and recommends new items based on what they like. • Item-based CF finds items that are similar to the ones you like. Similarity is based on item co-occurrences (e.g. the users who bought x also bought y). Formal representation • t_i: rating of user x_i for item y_i • Infer prediction function
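The item-based idea ("users who bought x also bought y") can be sketched as follows. This is only an illustration: the cosine-style normalisation of co-occurrence counts, the `baskets` data and the function names are all assumptions, not something the slides specify.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def item_similarities(baskets):
    """Item-item similarity from co-occurrence counts, cosine-normalised."""
    item_counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for items in baskets.values():
        unique = sorted(set(items))
        for item in unique:
            item_counts[item] += 1
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    sims = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        score = n / sqrt(item_counts[a] * item_counts[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims

def recommend(user, baskets, sims, k=5):
    """Score unseen items by summed similarity to the items the user already has."""
    owned = set(baskets[user])
    scores = defaultdict(float)
    for item in owned:
        for other, score in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += score
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]

# Hypothetical usage data: user -> items they interacted with.
baskets = {
    "u1": ["x", "y"],
    "u2": ["x", "y", "z"],
    "u3": ["x", "y"],
    "u4": ["z", "w"],
    "u5": ["x"],
}
sims = item_similarities(baskets)
print(recommend("u5", baskets, sims))
```

User-based CF follows the same pattern with the roles swapped: compute similarities between users, then recommend what the most similar users liked.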
  • 23. Collaborative filtering Model-based CF • Matrix Factorisation (SVD++) • Clustering (K-means to LDA) • LSH (locality-sensitive hashing) • Restricted Boltzmann Machines Formal representation of MF • X: user-item ratings matrix • U: user-latent factors matrix • S: latent factor diagonal matrix • V: latent factor-item matrix
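To make the matrix factorisation idea concrete, here is a minimal sketch. It is not SVD++: it is a plain regularised factorisation trained by stochastic gradient descent, with the diagonal matrix S folded into the two factor matrices (called `P` and `Q` here). The hyperparameters, data and function names are illustrative assumptions.

```python
import random
from math import sqrt

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def factorise(ratings, n_factors=2, epochs=500, lr=0.02, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that dot(P[u], Q[i]) approximates r."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(0.0, 1.0) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(0.0, 1.0) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularisation.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(ratings, P, Q):
    return sqrt(sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings) / len(ratings))

# Hypothetical (user, item, rating) triples on a 1-5 scale.
ratings = [
    ("u1", "a", 5), ("u1", "b", 4), ("u1", "c", 1),
    ("u2", "a", 4), ("u2", "b", 5), ("u2", "d", 1),
    ("u3", "c", 5), ("u3", "d", 4), ("u3", "a", 1),
    ("u4", "c", 4), ("u4", "d", 5),
]
P0, Q0 = factorise(ratings, epochs=0)   # untrained factors, same seed
P, Q = factorise(ratings)
rmse_before = rmse(ratings, P0, Q0)
rmse_after = rmse(ratings, P, Q)
print(rmse_before, rmse_after)
print(dot(P["u4"], Q["b"]))  # predicted rating for an unseen user-item pair
```

The useful property is the last line: once the factors are learned, a rating can be predicted for any user-item pair, including pairs with no observed rating.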
  • 24. Collaborative filtering (user-based, item-based and model-based CF) Pros • Minimal domain knowledge required • User and item features are not required • Produces good enough results in most cases Cons • Cold start problem • Requires high user:item ratio (1:10) • Needs standardised products • Popularity bias (doesn’t play well with the long tail)
  • 25. Content-based filtering • Determine item similarity based on item content, not usage data • Recommend items similar to those that a user is known to like • The user model: • explicitly provided features/keywords of interest • can be a classifier (e.g. Naive Bayes, SVM, decision trees) Formal representation • t_i: rating of user x_i for item y_i, where x_i and y_i are feature vectors • Infer prediction function
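A minimal sketch of content-based filtering with keyword features, under stated assumptions: items are represented as keyword counts, the user model is simply the sum of the keyword vectors of liked items, and similarity is cosine. The item names and keywords are made up for illustration.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse keyword-count vectors."""
    shared = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def user_profile(liked, item_features):
    """Build the user model by summing the keyword vectors of liked items."""
    profile = Counter()
    for item in liked:
        profile.update(item_features[item])
    return profile

def recommend(liked, item_features, k=5):
    """Rank items the user has not liked yet by similarity to the user model."""
    profile = user_profile(liked, item_features)
    candidates = [item for item in item_features if item not in liked]
    return sorted(candidates, key=lambda i: (-cosine(profile, item_features[i]), i))[:k]

# Hypothetical item content: keywords extracted from each article.
item_features = {
    "paper_a": Counter(["recommender", "collaborative", "filtering"]),
    "paper_b": Counter(["recommender", "matrix", "factorisation"]),
    "paper_c": Counter(["protein", "folding"]),
    "paper_d": Counter(["collaborative", "filtering", "implicit", "feedback"]),
}
print(recommend({"paper_a"}, item_features, k=2))
```

No usage data from other users is involved, which is why this approach can recommend brand-new items as soon as their content is available.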
  • 26. Content-based filtering Pros • No cold start problem • No need for usage data • No popularity bias, can recommend items with rare features Cons • Item content needs to be machine readable and meaningful • Easy to pigeonhole the user • Difficult to implement serendipity • Difficult to combine an item’s multiple features
  • 27. Hybrid approaches (method: description) • Weighted: outputs from several techniques (in the form of scores or votes) are combined with different degrees of importance to offer final recommendations • Switching: depending on the situation, the system changes from one technique to another • Mixed: recommendations from several techniques are presented at the same time • Feature combination: features from different recommendation sources are combined as input to a single technique • Cascade: the output from one technique is used as input to another that refines the result • Feature augmentation: the output from one technique is used as input features to another • Meta-level: the model learned by one recommender is used as input to another
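Of the hybrid methods listed, the weighted one is the simplest to sketch. The example below assumes each technique returns a dictionary of item scores on its own scale, min-max normalises each set, and combines them with fixed weights. The normalisation choice, the weights and all names are assumptions for illustration only.

```python
def weighted_hybrid(score_sets, weights, k=5):
    """Combine per-technique item scores into one ranking using fixed weights."""
    combined = {}
    for name, scores in score_sets.items():
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for item, score in scores.items():
            # Min-max normalise so techniques with different scales are comparable.
            normalised = (score - lo) / span if span else 1.0
            combined[item] = combined.get(item, 0.0) + weights[name] * normalised
    return sorted(combined, key=lambda item: (-combined[item], item))[:k]

# Hypothetical outputs: CF predicts ratings (1-5), CB returns similarities (0-1).
cf_scores = {"a": 4.8, "b": 4.0, "c": 3.0}
cb_scores = {"b": 0.9, "c": 0.8, "d": 0.1}
ranking = weighted_hybrid(
    {"cf": cf_scores, "cb": cb_scores},
    weights={"cf": 0.7, "cb": 0.3},
)
print(ranking)
```

Finding weights that work well is the part that takes effort in practice; here they are fixed by hand.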
  • 28. Hybrid approaches Combining user and item features and usage to benefit from both Pros • Often outperforms CF and CB alone Cons • Can be a lot of work to get the right balance
  • 29. Non-traditional approaches • Deep learning • Social recommendations • Learning to rank • ... Pros • Good for eking out those final performance percentage points • You can say you’re working with cutting-edge approaches ;) Cons • Less well understood • Less supported in recommendation toolkits • Not recommended approaches for your first recommender
  • 30. Algorithms • Typically employ collaborative filtering • May need to use content-based filtering, particularly to bootstrap • Go advanced with a hybrid • Do all of that before getting adventurous with state-of-the-art Is your recommender doing well? You don’t really know unless you evaluate it...
  • 32. Evaluating a recommender • Offline testing • Online testing (A/B testing)
  • 33. Offline testing • Test offline before deploying • Parameter sweep is quick • Doesn’t offend real users • n-fold cross-validation: take the users, items and relationships between them (e.g. clicked on, bought); split into n folds, n-1 for training and 1 for testing; attempt to predict the testing data based on the training data • Popularity as baseline Metrics: • Precision, recall and F-measure • Receiver operating characteristic (ROC) curve • Normalised discounted cumulative gain (NDCG) • Mean reciprocal rank (MRR) • Fraction of Concordant Pairs (FCP) • ...
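Several of the offline metrics above can be computed for a single test user as sketched below. This assumes binary relevance (an item in the held-out fold is either relevant or not); the function names and the example data are illustrative.

```python
from math import log2

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that are in the held-out set."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of the held-out items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item, or 0 if none is recommended."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """Normalised discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / log2(position + 1)
        for position, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

# Hypothetical output for one test user: ranked recommendations vs. held-out items.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "d", "z"}
print(precision_at_k(ranked, relevant, 5))
print(recall_at_k(ranked, relevant, 5))
print(reciprocal_rank(ranked, relevant))
print(ndcg_at_k(ranked, relevant, 5))
```

Averaging `reciprocal_rank` over all test users gives MRR; the other metrics are likewise averaged over users and folds, and compared against the popularity baseline.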
  • 34. Online testing • Offline performance isn’t a very precise indicator • Offline test is a good sanity check • Online test gives real performance • A/B testing: deploy your systems that perform ‘well enough’; compare them with each other in the real world; mind the pitfalls Metrics: • The offline metrics, plus: • Conversion rate • Open, view, click-through rates • Usage data (e.g. reordered item, completed reading book) • Hard to evaluate: trustworthiness; freshness; diversity; serendipity; novelty; recency.
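One common way to compare conversion rates between two variants in an A/B test is a two-proportion z-test, sketched below. The slides do not name a specific test, so this choice, the function name and the numbers are assumptions; it also does not address pitfalls such as peeking at results early or testing many variants at once.

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rate B - A."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)

# Hypothetical experiment: recommender A vs. recommender B, 10,000 users each.
z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(z, p_value)
```

A small p-value suggests the difference in conversion rate is unlikely to be due to chance alone, under the test's assumptions.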
  • 35. Evaluating a recommender • Start with offline testing • Perform A/B testing but be aware of the common pitfalls • Hard to evaluate performance in terms of: trustworthiness; freshness; diversity; serendipity; novelty; recency. How do we use recommenders? On to a few of our use cases...
  • 38. Recommenders @ Mendeley Related research for an article
  • 39. Recommenders @ Mendeley Related research for multiple articles
  • 40. Recommenders @ Mendeley Mendeley Suggest - personalised batch of recommended reading
  • 41. Recommenders @ Mendeley Researchers to follow on Mendeley
  • 42. Recommenders @ Mendeley Interesting activity from your social network
  • 43. Recommenders @ Mendeley • Recommenders are employed for a number of use cases • Recommenders deliver different kinds of value depending upon use case • Can reuse the same underlying recommender system and framework for all
  • 44. Conclusions • Recommenders are complementary to search and becoming mainstream, although arguably they can cater for a wider range of use cases • When building a recommender, it’s common to predict ratings, rerank, optimise the page and then introduce context-awareness • In building a recommender, start with collaborative filtering if you can, content-based if you need to bootstrap and then explore hybrids • Open research questions remain as recommenders are used to tackle trustworthiness; freshness; diversity; serendipity; novelty; recency
  • 45. References • Xavier Amatriain, The Recommender Problem Revisited (http://www.slideshare.net/xamat/recsys-2014-tutorial-the-recommender-problem-revisited) • RecSys 2014 (http://recsys.acm.org/recsys14/)