SlideShare uma empresa Scribd logo
1 de 26
Recommenders, Topics, and Text
Susan Athey and Guido Imbens, Stanford University
NBER Methods Lectures, 2015
Introduction
Recommendation Systems
 Predict whether a user will like an item (or how much: here binary choice for simplicity)
 Scenarios: what is observed?
 Lots of users and lots of items
 Observable item characteristics
 Matrix of users x choices with 1’s (or ratings) for items consumer likes/chose in past
 Economics Literature
 Obvious connection to discrete choice modeling
 Multiple choices per user
 Small literature incorporating multiple choices: voting records, Neilsen TV viewing, school
choice, small supermarket bundles
 ML Literature
 Massive literature with lots of disconnected subliteratures: too many to cite
 Netflix challenge results (and almost all contests): average of hundreds of models
 Big strength: incorporating multiple sparse choice data at scale
 Focus on prediction, not interpretability or counterfactual prediction, but some approaches
deliver interpretability and parameters that look like primitives
 For economic applications:
 ML methods have a lot of promise for large scale choice modeling, but more work to tailor them
to economist’s needs/tastes. One strand of literature builds models with latent variables that
connect closely to structural economic models.
Strands of Recommendation Literature
“Unsupervised”: Item
Similarity
 Suppose observe item
characteristics
 Recommend similar items
to what the user likes
 Use unsupervised
learning techniques to
reduce dimensionality
(see earlier lectures)
“Supervised”: Collaborative
Filtering
 Suppose observe choice
data but limited item
characteristics
 Find other users with
similar tastes
 Recommend items liked
by similar users
 E.g. Matrix decomposition
Hierarchical/Graphical
Models
Bringing in User Choice Data
 It can be useful to model item cluster/topics
SEPARATELY from user data if:
 You want to make predictions when the context changes
 Website may change what types of articles it displays
together; journal may do special issues or article
groupings
 It can be useful to model topics together with user
data if:
 You want to make the best possible recommendations in
a stable context
Collaborative Filtering
0 0 0 0 1 0 0 1
1 0 0 1 1 1 0 0
0 1 0 0 1 1 0 1
0 0 0 1 1 1 0 ?
0 0 1 1 1 0 0 ?
0 1 0 0 1 0 1 ?
0 1 0 0 0 1 0 ?
Items
User
s
Goal: predict probabilities that users choose item
Note: in Netflix challenge data, have ratings 1-5 but many items not rated at
all.
Collaborative Filtering:
Classification problem (e.g. logit, CART)
 Simple, familiar approach:
 Use a logit or other classifier item by item
 Model: Pr(uij=1) = f(indicators for other items user
likes)
 Approach doesn’t work well with lots of items and
few items chosen per user
 Need to combine with other methods to reduce
dimensionality of feature space
 Item by item prediction is costly
K-nearest neighbor (KNN) Prediction
 To predict Pr(uij=1):
 Compute similarity between item j and all other items
 Find the k closest by similarity metric
 Calculate weighted average of user choices of the neighbor items
 Similarity:
 Measured in terms of user preference vectors
 Cosine similarity: the angle between the vectors
 Very popular method! Easy and fast.
 Cons
 With sparse matrix, may predict a lot of zeros
 May not work very well with simple weights
 More complex versions “learn”=estimate weights, but if you want to
estimate a model, other models may work better
Collaborative Filtering:
Matrix Factorization

x
I x J choice matrix I x K matrix of
user factors
K x J matrix
item factors
Analyst picks K. Use various methods (that work at large scale) to find factor
matrices that minimize mean squared error of predicts versus true choice
matrix.
PROS: Easy and fast. Transparent. Results can be inspected and maybe
interpreted.
CONS: Harder to link to a model and extend in various directions. No
Other Approaches
 Iterative clustering
 Can use any of a variety of unsupervised learning methods to
find groups of similar users or items
 Iterate:
 Group items with similar user (category) vectors
 Then group users with similar item (category) vectors
 Structural Models/Hierarchical Models
 Specify latent factors and distribution they are drawn from
 Use Bayesian methods or EM algorithm to estimate the latent
factors
 More principled way to do decomposition and incorporate user
and item characteristics
 Will return to this after brief digression…
Recommendations and Topic
Modeling for Documents and Text
Collaborative Filtering versus Topic Models
0 0 0 0 1 0 0 1
1 0 0 1 1 1 0 0
0 1 0 0 1 1 0 1
0 0 0 1 1 1 0 ?
0 0 1 1 1 0 0 ?
0 1 0 0 1 0 1 ?
0 1 0 0 0 1 0 ?
Items
Words/Phrases
Users
Document
s
Goal: predict probabilities that users choose item by finding latent
factors
Goal: put documents into topics with similar collections of words
Finding topics of articles: Methods for clustering described above can be used as
quick and dirty ways to put documents into clusters. Hierarchical models are more
sophisticated.
Classifying articles (good/bad, left/right): Supervised learning, and particular
classification methods, can be applied if some documents are labelled, e.g using
human judges. Can combine unsupervised learning techniques for dimensionality
Classifying Large Number of Articles
 Take articles, have humans label
as many as you can afford
 Build a classifier to classify
unlabeled articles
 See, e.g., Vowpal Wabbit
 Use unsupervised learning for
feature reduction
 And/or, use method such as
LASSO or other regularized ML
methods with cross-validation to
select model complexity,
specification of features
 Warning: this doesn’t work as well
as you might think w/ limited
training data and many features.
Needs tweaking in feature
reduction.
 Expect to see this used widely in
economics for text.
 A number of applications already
(e.g. political bias, review
sentiment, etc. See
 Possible
alternatives/improvements
 Athey, Mobius, and Pal (in
progress)
 Use unsupervised learning
PRIOR to human labeling
 Put articles into topics, and
then crowd source features of
topics, also of articles within
each topic (stratified)
 For article-specific
characteristics that don’t apply
to topic as a whole, train
classifier (e.g. for left/right
political bias)
 Classifier works much better
within a topic (since
significance of words different
by topic)
 Can avoid asking about
political bias of article about a
sports event
Quick and Dirty Clustering for Topics:
Network Methods
 Another approach to clustering is to interpret the
document – word matrix as a network
 There are network methods for clustering or
“community detection,” some of which are different
from matrix factorization
 Approach based on “modularity” (see e.g. Newman) –
greedy, scalable algorithm to maximize links within
communities while penalizing links outside
 Athey, Mobius, and Pal (in progress) apply this to
news articles
 Strength of link between articles is determined by
common words, as well as links to common Wikipedia
articles
Network Classification Example
Topic Examples in April 2013
Boston Marathon bombings 2013 Moore tornado
2013 Korean crisis Death of Lee Rigby
2012–13 in English football 2012 Benghazi attack
Ariel Castro kidnappings Murder of Travis Alexander
2013 NFL season
List of school shootings in the United
States
2013 in baseball Death and funeral of Margaret Thatcher
2013 in American television
Timeline of the Syrian civil war (May–
August 2012)
2013 NBA Finals 2013 IRS scandal
2013 in film
NCAA Men's Division I Basketball
Championship
2012–13 NHL season Malaysian general election, 2013
2013 Masters Tournament Shooting of Trayvon Martin
2013 NASCAR Sprint Cup Series Phil Mickelson
Hierarchical Models
(Closer to Structural Models)
Many of these slides based on a tutorial by David Blei;
See David Blei’s web page at Columbia for more details
http://www.cs.columbia.edu/~blei/
Latent Dirichlet Allocation (LDA)
LDA As a Graphical Model
Estimation and Inference of LDA
Topics Estimated from Articles in Science
Making Recommendations with User Choice
Data and Content Data
User and Content Data
Complex Model with Latent Variables for
Topics and User Preferences
Grey
circles:
observable
s
Conclusions
 Recommendation systems literature is close to demand literature in
goals
 ML Literature is massive, disconnected, ad hoc, and hard to absorb
 Quick and dirty techniques can be a substitute for complex models
for both recommendation systems and for topic modeling
 Clustering methods
 Matrix decomposition techniques give a sense of what would come out of
a latent variable model with much less work
 Network literature also has some fast and easy methods
 One branch of ML (graphical models) is very close to structural
models with unobserved product characteristics
 Topic models help understand structure underlying documents, and can
be used as part of user choice models
 Open questions for economists include identification and mapping onto
utility framework
 Applications/directions
 My work with Jeno Pal and Markus Mobius looks empirically at role of
aggregators and intermediaries on what articles people choose
 With David Blei, Jake Hofman, and Markus Mobius, looks
methodologically at how to link topic models to structural IO models and
incorporate role of availability and prominence of what is shown

Mais conteúdo relacionado

Mais procurados

Hybrid recommender systems
Hybrid recommender systemsHybrid recommender systems
Hybrid recommender systems
renataghisloti
 
Survey of Recommendation Systems
Survey of Recommendation SystemsSurvey of Recommendation Systems
Survey of Recommendation Systems
youalab
 
General factorization framework for context-aware recommendations
General factorization framework for context-aware recommendationsGeneral factorization framework for context-aware recommendations
General factorization framework for context-aware recommendations
Domonkos Tikk
 
Mining from Open Answers in Questionnaire Data
Mining from Open Answers in Questionnaire DataMining from Open Answers in Questionnaire Data
Mining from Open Answers in Questionnaire Data
feiwin
 
Data Mining and Recommendation Systems
Data Mining and Recommendation SystemsData Mining and Recommendation Systems
Data Mining and Recommendation Systems
Salil Navgire
 

Mais procurados (20)

Hybrid recommender systems
Hybrid recommender systemsHybrid recommender systems
Hybrid recommender systems
 
Selection of Tags for Tag Clouds
Selection of Tags for Tag CloudsSelection of Tags for Tag Clouds
Selection of Tags for Tag Clouds
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
Interactive Analysis of Word Vector Embeddings
Interactive Analysis of Word Vector EmbeddingsInteractive Analysis of Word Vector Embeddings
Interactive Analysis of Word Vector Embeddings
 
Survey of Recommendation Systems
Survey of Recommendation SystemsSurvey of Recommendation Systems
Survey of Recommendation Systems
 
Recommender Systems - A Review and Recent Research Trends
Recommender Systems  -  A Review and Recent Research TrendsRecommender Systems  -  A Review and Recent Research Trends
Recommender Systems - A Review and Recent Research Trends
 
Slides ecir2016
Slides ecir2016Slides ecir2016
Slides ecir2016
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender Systems
 
Eric Smidth
Eric SmidthEric Smidth
Eric Smidth
 
Recommendation System for Design Patterns in Software Development
Recommendation System for Design Patterns in Software DevelopmentRecommendation System for Design Patterns in Software Development
Recommendation System for Design Patterns in Software Development
 
General factorization framework for context-aware recommendations
General factorization framework for context-aware recommendationsGeneral factorization framework for context-aware recommendations
General factorization framework for context-aware recommendations
 
Techniques Machine Learning
Techniques Machine LearningTechniques Machine Learning
Techniques Machine Learning
 
Mining from Open Answers in Questionnaire Data
Mining from Open Answers in Questionnaire DataMining from Open Answers in Questionnaire Data
Mining from Open Answers in Questionnaire Data
 
Probabilistic Information Retrieval
Probabilistic Information RetrievalProbabilistic Information Retrieval
Probabilistic Information Retrieval
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Data Mining and Recommendation Systems
Data Mining and Recommendation SystemsData Mining and Recommendation Systems
Data Mining and Recommendation Systems
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
 
Social Recommender Systems
Social Recommender SystemsSocial Recommender Systems
Social Recommender Systems
 
Unsupervised Word Usage Similarity in Social Media Texts
Unsupervised Word Usage Similarity in Social Media TextsUnsupervised Word Usage Similarity in Social Media Texts
Unsupervised Word Usage Similarity in Social Media Texts
 
Models for Information Retrieval and Recommendation
Models for Information Retrieval and RecommendationModels for Information Retrieval and Recommendation
Models for Information Retrieval and Recommendation
 

Semelhante a Recommenders, Topics, and Text

Lec1-Into
Lec1-IntoLec1-Into
Lec1-Into
butest
 
Digital Trails Dave King 1 5 10 Part 2 D3
Digital Trails   Dave King   1 5 10   Part 2   D3Digital Trails   Dave King   1 5 10   Part 2   D3
Digital Trails Dave King 1 5 10 Part 2 D3
Dave King
 
​​Explainability in AI and Recommender systems: let’s make it interactive!
​​Explainability in AI and Recommender systems: let’s make it interactive!​​Explainability in AI and Recommender systems: let’s make it interactive!
​​Explainability in AI and Recommender systems: let’s make it interactive!
Eindhoven University of Technology / JADS
 
Recommender System in light of Big Data
Recommender System in light of Big DataRecommender System in light of Big Data
Recommender System in light of Big Data
Khadija Atiya
 
Book Recommendations.pptx
Book Recommendations.pptxBook Recommendations.pptx
Book Recommendations.pptx
DishaSharma337110
 

Semelhante a Recommenders, Topics, and Text (20)

Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1
 
Multilayered paper prototyping for user concept modeling
Multilayered paper prototyping for user concept modelingMultilayered paper prototyping for user concept modeling
Multilayered paper prototyping for user concept modeling
 
Multidirectional Product Support System for Decision Making In Textile Indust...
Multidirectional Product Support System for Decision Making In Textile Indust...Multidirectional Product Support System for Decision Making In Textile Indust...
Multidirectional Product Support System for Decision Making In Textile Indust...
 
The state of the art in integrating machine learning into visual analytics
The state of the art in integrating machine learning into visual analyticsThe state of the art in integrating machine learning into visual analytics
The state of the art in integrating machine learning into visual analytics
 
Lec1-Into
Lec1-IntoLec1-Into
Lec1-Into
 
Binary search query classifier
Binary search query classifierBinary search query classifier
Binary search query classifier
 
Recommender Systems In Industry
Recommender Systems In IndustryRecommender Systems In Industry
Recommender Systems In Industry
 
PhD defense
PhD defense PhD defense
PhD defense
 
CS8091_BDA_Unit_III_Content_Based_Recommendation
CS8091_BDA_Unit_III_Content_Based_RecommendationCS8091_BDA_Unit_III_Content_Based_Recommendation
CS8091_BDA_Unit_III_Content_Based_Recommendation
 
Digital Trails Dave King 1 5 10 Part 2 D3
Digital Trails   Dave King   1 5 10   Part 2   D3Digital Trails   Dave King   1 5 10   Part 2   D3
Digital Trails Dave King 1 5 10 Part 2 D3
 
On the benefit of logic-based machine learning to learn pairwise comparisons
On the benefit of logic-based machine learning to learn pairwise comparisonsOn the benefit of logic-based machine learning to learn pairwise comparisons
On the benefit of logic-based machine learning to learn pairwise comparisons
 
[MMIR@MM2023] On Popularity Bias of Multimodal-aware Recommender Systems: A M...
[MMIR@MM2023] On Popularity Bias of Multimodal-aware Recommender Systems: A M...[MMIR@MM2023] On Popularity Bias of Multimodal-aware Recommender Systems: A M...
[MMIR@MM2023] On Popularity Bias of Multimodal-aware Recommender Systems: A M...
 
lms final ppt.pptx
lms final ppt.pptxlms final ppt.pptx
lms final ppt.pptx
 
Effective Cross-Domain Collaborative Filtering using Temporal Domain – A Brie...
Effective Cross-Domain Collaborative Filtering using Temporal Domain – A Brie...Effective Cross-Domain Collaborative Filtering using Temporal Domain – A Brie...
Effective Cross-Domain Collaborative Filtering using Temporal Domain – A Brie...
 
​​Explainability in AI and Recommender systems: let’s make it interactive!
​​Explainability in AI and Recommender systems: let’s make it interactive!​​Explainability in AI and Recommender systems: let’s make it interactive!
​​Explainability in AI and Recommender systems: let’s make it interactive!
 
Recommendation system (1).pptx
Recommendation system (1).pptxRecommendation system (1).pptx
Recommendation system (1).pptx
 
recommendationsystem1-221109055232-c8b46131.pdf
recommendationsystem1-221109055232-c8b46131.pdfrecommendationsystem1-221109055232-c8b46131.pdf
recommendationsystem1-221109055232-c8b46131.pdf
 
Advances In Collaborative Filtering
Advances In Collaborative FilteringAdvances In Collaborative Filtering
Advances In Collaborative Filtering
 
Recommender System in light of Big Data
Recommender System in light of Big DataRecommender System in light of Big Data
Recommender System in light of Big Data
 
Book Recommendations.pptx
Book Recommendations.pptxBook Recommendations.pptx
Book Recommendations.pptx
 

Mais de NBER

FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATESFISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
NBER
 
Business in the United States Who Owns it and How Much Tax They Pay
Business in the United States Who Owns it and How Much Tax They PayBusiness in the United States Who Owns it and How Much Tax They Pay
Business in the United States Who Owns it and How Much Tax They Pay
NBER
 
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
NBER
 
The Distributional E ffects of U.S. Clean Energy Tax Credits
The Distributional Effects of U.S. Clean Energy Tax CreditsThe Distributional Effects of U.S. Clean Energy Tax Credits
The Distributional E ffects of U.S. Clean Energy Tax Credits
NBER
 
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
NBER
 
Jackson nber-slides2014 lecture3
Jackson nber-slides2014 lecture3Jackson nber-slides2014 lecture3
Jackson nber-slides2014 lecture3
NBER
 
Acemoglu lecture2
Acemoglu lecture2Acemoglu lecture2
Acemoglu lecture2
NBER
 
Acemoglu lecture4
Acemoglu lecture4Acemoglu lecture4
Acemoglu lecture4
NBER
 

Mais de NBER (20)

FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATESFISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
FISCAL STIMULUS IN ECONOMIC UNIONS: WHAT ROLE FOR STATES
 
Business in the United States Who Owns it and How Much Tax They Pay
Business in the United States Who Owns it and How Much Tax They PayBusiness in the United States Who Owns it and How Much Tax They Pay
Business in the United States Who Owns it and How Much Tax They Pay
 
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
Redistribution through Minimum Wage Regulation: An Analysis of Program Linkag...
 
The Distributional E ffects of U.S. Clean Energy Tax Credits
The Distributional Effects of U.S. Clean Energy Tax CreditsThe Distributional Effects of U.S. Clean Energy Tax Credits
The Distributional E ffects of U.S. Clean Energy Tax Credits
 
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
An Experimental Evaluation of Strategies to Increase Property Tax Compliance:...
 
Nbe rcausalpredictionv111 lecture2
Nbe rcausalpredictionv111 lecture2Nbe rcausalpredictionv111 lecture2
Nbe rcausalpredictionv111 lecture2
 
Nber slides11 lecture2
Nber slides11 lecture2Nber slides11 lecture2
Nber slides11 lecture2
 
Machine Learning and Causal Inference
Machine Learning and Causal InferenceMachine Learning and Causal Inference
Machine Learning and Causal Inference
 
Introduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsIntroduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and Algorithms
 
Jackson nber-slides2014 lecture3
Jackson nber-slides2014 lecture3Jackson nber-slides2014 lecture3
Jackson nber-slides2014 lecture3
 
Jackson nber-slides2014 lecture1
Jackson nber-slides2014 lecture1Jackson nber-slides2014 lecture1
Jackson nber-slides2014 lecture1
 
Acemoglu lecture2
Acemoglu lecture2Acemoglu lecture2
Acemoglu lecture2
 
Acemoglu lecture4
Acemoglu lecture4Acemoglu lecture4
Acemoglu lecture4
 
The NBER Working Paper Series at 20,000 - Joshua Gans
The NBER Working Paper Series at 20,000 - Joshua GansThe NBER Working Paper Series at 20,000 - Joshua Gans
The NBER Working Paper Series at 20,000 - Joshua Gans
 
The NBER Working Paper Series at 20,000 - Claudia Goldin
The NBER Working Paper Series at 20,000 - Claudia GoldinThe NBER Working Paper Series at 20,000 - Claudia Goldin
The NBER Working Paper Series at 20,000 - Claudia Goldin
 
The NBER Working Paper Series at 20,000 - James Poterba
The NBER Working Paper Series at 20,000 - James PoterbaThe NBER Working Paper Series at 20,000 - James Poterba
The NBER Working Paper Series at 20,000 - James Poterba
 
The NBER Working Paper Series at 20,000 - Scott Stern
The NBER Working Paper Series at 20,000 - Scott SternThe NBER Working Paper Series at 20,000 - Scott Stern
The NBER Working Paper Series at 20,000 - Scott Stern
 
The NBER Working Paper Series at 20,000 - Glenn Ellison
The NBER Working Paper Series at 20,000 - Glenn EllisonThe NBER Working Paper Series at 20,000 - Glenn Ellison
The NBER Working Paper Series at 20,000 - Glenn Ellison
 
L3 1b
L3 1bL3 1b
L3 1b
 
Econometrics of High-Dimensional Sparse Models
Econometrics of High-Dimensional Sparse ModelsEconometrics of High-Dimensional Sparse Models
Econometrics of High-Dimensional Sparse Models
 

Último

2024 asthma jkdjkfjsdklfjsdlkfjskldfgdsgerg
2024 asthma jkdjkfjsdklfjsdlkfjskldfgdsgerg2024 asthma jkdjkfjsdklfjsdlkfjskldfgdsgerg
2024 asthma jkdjkfjsdklfjsdlkfjskldfgdsgerg
MadhuKothuru
 
Competitive Advantage slide deck___.pptx
Competitive Advantage slide deck___.pptxCompetitive Advantage slide deck___.pptx
Competitive Advantage slide deck___.pptx
ScottMeyers35
 
Nagerbazar @ Independent Call Girls Kolkata - 450+ Call Girl Cash Payment 800...
Nagerbazar @ Independent Call Girls Kolkata - 450+ Call Girl Cash Payment 800...Nagerbazar @ Independent Call Girls Kolkata - 450+ Call Girl Cash Payment 800...
Nagerbazar @ Independent Call Girls Kolkata - 450+ Call Girl Cash Payment 800...
HyderabadDolls
 

Último (20)

Private Call Girls Bidar 9332606886Call Girls Advance Cash On Delivery Service
Private Call Girls Bidar  9332606886Call Girls Advance Cash On Delivery ServicePrivate Call Girls Bidar  9332606886Call Girls Advance Cash On Delivery Service
Private Call Girls Bidar 9332606886Call Girls Advance Cash On Delivery Service
 
74th Amendment of India PPT by Piyush(IC).pptx
74th Amendment of India PPT by Piyush(IC).pptx74th Amendment of India PPT by Piyush(IC).pptx
74th Amendment of India PPT by Piyush(IC).pptx
 
NAP Expo - Delivering effective and adequate adaptation.pptx
NAP Expo - Delivering effective and adequate adaptation.pptxNAP Expo - Delivering effective and adequate adaptation.pptx
NAP Expo - Delivering effective and adequate adaptation.pptx
 
The NAP process & South-South peer learning
The NAP process & South-South peer learningThe NAP process & South-South peer learning
The NAP process & South-South peer learning
 
Antisemitism Awareness Act: pénaliser la critique de l'Etat d'Israël
Antisemitism Awareness Act: pénaliser la critique de l'Etat d'IsraëlAntisemitism Awareness Act: pénaliser la critique de l'Etat d'Israël
Antisemitism Awareness Act: pénaliser la critique de l'Etat d'Israël
 
Time, Stress & Work Life Balance for Clerks with Beckie Whitehouse
Time, Stress & Work Life Balance for Clerks with Beckie WhitehouseTime, Stress & Work Life Balance for Clerks with Beckie Whitehouse
Time, Stress & Work Life Balance for Clerks with Beckie Whitehouse
 
Cheap Call Girls In Hyderabad Phone No 📞 9352988975 📞 Elite Escort Service Av...
Cheap Call Girls In Hyderabad Phone No 📞 9352988975 📞 Elite Escort Service Av...Cheap Call Girls In Hyderabad Phone No 📞 9352988975 📞 Elite Escort Service Av...
Cheap Call Girls In Hyderabad Phone No 📞 9352988975 📞 Elite Escort Service Av...
 
2024 asthma jkdjkfjsdklfjsdlkfjskldfgdsgerg
2024 asthma jkdjkfjsdklfjsdlkfjskldfgdsgerg2024 asthma jkdjkfjsdklfjsdlkfjskldfgdsgerg
2024 asthma jkdjkfjsdklfjsdlkfjskldfgdsgerg
 
Pakistani Call girls in Sharjah 0505086370 Sharjah Call girls
Pakistani Call girls in Sharjah 0505086370 Sharjah Call girlsPakistani Call girls in Sharjah 0505086370 Sharjah Call girls
Pakistani Call girls in Sharjah 0505086370 Sharjah Call girls
 
Finance strategies for adaptation. Presentation for CANCC
Finance strategies for adaptation. Presentation for CANCCFinance strategies for adaptation. Presentation for CANCC
Finance strategies for adaptation. Presentation for CANCC
 
Vasai Call Girls In 07506202331, Nalasopara Call Girls In Mumbai
Vasai Call Girls In 07506202331, Nalasopara Call Girls In MumbaiVasai Call Girls In 07506202331, Nalasopara Call Girls In Mumbai
Vasai Call Girls In 07506202331, Nalasopara Call Girls In Mumbai
 
2024 UN Civil Society Conference in Support of the Summit of the Future.
2024 UN Civil Society Conference in Support of the Summit of the Future.2024 UN Civil Society Conference in Support of the Summit of the Future.
2024 UN Civil Society Conference in Support of the Summit of the Future.
 
Kolkata Call Girls Halisahar 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl ...
Kolkata Call Girls Halisahar  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Girl ...Kolkata Call Girls Halisahar  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Girl ...
Kolkata Call Girls Halisahar 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl ...
 
Tuvalu Coastal Adaptation Project (TCAP)
Tuvalu Coastal Adaptation Project (TCAP)Tuvalu Coastal Adaptation Project (TCAP)
Tuvalu Coastal Adaptation Project (TCAP)
 
Competitive Advantage slide deck___.pptx
Competitive Advantage slide deck___.pptxCompetitive Advantage slide deck___.pptx
Competitive Advantage slide deck___.pptx
 
2024: The FAR, Federal Acquisition Regulations, Part 32
2024: The FAR, Federal Acquisition Regulations, Part 322024: The FAR, Federal Acquisition Regulations, Part 32
2024: The FAR, Federal Acquisition Regulations, Part 32
 
Call Girl Service in Korba 9332606886 High Profile Call Girls You Can Get ...
Call Girl Service in Korba   9332606886  High Profile Call Girls You Can Get ...Call Girl Service in Korba   9332606886  High Profile Call Girls You Can Get ...
Call Girl Service in Korba 9332606886 High Profile Call Girls You Can Get ...
 
Nagerbazar @ Independent Call Girls Kolkata - 450+ Call Girl Cash Payment 800...
Nagerbazar @ Independent Call Girls Kolkata - 450+ Call Girl Cash Payment 800...Nagerbazar @ Independent Call Girls Kolkata - 450+ Call Girl Cash Payment 800...
Nagerbazar @ Independent Call Girls Kolkata - 450+ Call Girl Cash Payment 800...
 
NGO working for orphan children’s education
NGO working for orphan children’s educationNGO working for orphan children’s education
NGO working for orphan children’s education
 
Call Girls Koregaon Park - 8250092165 Our call girls are sure to provide you ...
Call Girls Koregaon Park - 8250092165 Our call girls are sure to provide you ...Call Girls Koregaon Park - 8250092165 Our call girls are sure to provide you ...
Call Girls Koregaon Park - 8250092165 Our call girls are sure to provide you ...
 

Recommenders, Topics, and Text

  • 1. Recommenders, Topics, and Text Susan Athey and Guido Imbens, Stanford University NBER Methods Lectures, 2015
  • 3.
  • 4. Recommendation Systems  Predict whether a user will like an item (or how much: here binary choice for simplicity)  Scenarios: what is observed?  Lots of users and lots of items  Observable item characteristics  Matrix of users x choices with 1’s (or ratings) for items consumer likes/chose in past  Economics Literature  Obvious connection to discrete choice modeling  Multiple choices per user  Small literature incorporating multiple choices: voting records, Neilsen TV viewing, school choice, small supermarket bundles  ML Literature  Massive literature with lots of disconnected subliteratures: too many to cite  Netflix challenge results (and almost all contests): average of hundreds of models  Big strength: incorporating multiple sparse choice data at scale  Focus on prediction, not interpretability or counterfactual prediction, but some approaches deliver interpretability and parameters that look like primitives  For economic applications:  ML methods have a lot of promise for large scale choice modeling, but more work to tailor them to economist’s needs/tastes. One strand of literature builds models with latent variables that connect closely to structural economic models.
  • 5. Strands of Recommendation Literature “Unsupervised”: Item Similarity  Suppose observe item characteristics  Recommend similar items to what the user likes  Use unsupervised learning techniques to reduce dimensionality (see earlier lectures) “Supervised”: Collaborative Filtering  Suppose observe choice data but limited item characteristics  Find other users with similar tastes  Recommend items liked by similar users  E.g. Matrix decomposition Hierarchical/Graphical Models
  • 6. Bringing in User Choice Data  It can be useful to model item cluster/topics SEPARATELY from user data if:  You want to make predictions when the context changes  Website may change what types of articles it displays together; journal may do special issues or article groupings  It can be useful to model topics together with user data if:  You want to make the best possible recommendations in a stable context
  • 7. Collaborative Filtering 0 0 0 0 1 0 0 1 1 0 0 1 1 1 0 0 0 1 0 0 1 1 0 1 0 0 0 1 1 1 0 ? 0 0 1 1 1 0 0 ? 0 1 0 0 1 0 1 ? 0 1 0 0 0 1 0 ? Items User s Goal: predict probabilities that users choose item Note: in Netflix challenge data, have ratings 1-5 but many items not rated at all.
  • 8. Collaborative Filtering: Classification problem (e.g. logit, CART)  Simple, familiar approach:  Use a logit or other classifier item by item  Model: Pr(uij=1) = f(indicators for other items user likes)  Approach doesn’t work well with lots of items and few items chosen per user  Need to combine with other methods to reduce dimensionality of feature space  Item by item prediction is costly
  • 9. K-nearest neighbor (KNN) Prediction  To predict Pr(uij=1):  Compute similarity between item j and all other items  Find the k closest by similarity metric  Calculate weighted average of user choices of the neighbor items  Similarity:  Measured in terms of user preference vectors  Cosine similarity: the angle between the vectors  Very popular method! Easy and fast.  Cons  With sparse matrix, may predict a lot of zeros  May not work very well with simple weights  More complex versions “learn”=estimate weights, but if you want to estimate a model, other models may work better
  • 10. Collaborative Filtering: Matrix Factorization  x I x J choice matrix I x K matrix of user factors K x J matrix item factors Analyst picks K. Use various methods (that work at large scale) to find factor matrices that minimize mean squared error of predicts versus true choice matrix. PROS: Easy and fast. Transparent. Results can be inspected and maybe interpreted. CONS: Harder to link to a model and extend in various directions. No
  • 11. Other Approaches  Iterative clustering  Can use any of a variety of unsupervised learning methods to find groups of similar users or items  Iterate:  Group items with similar user (category) vectors  Then group users with similar item (category) vectors  Structural Models/Hierarchical Models  Specify latent factors and distribution they are drawn from  Use Bayesian methods or EM algorithm to estimate the latent factors  More principled way to do decomposition and incorporate user and item characteristics  Will return to this after brief digression…
  • 12. Recommendations and Topic Modeling for Documents and Text
  • 13. Collaborative Filtering versus Topic Models 0 0 0 0 1 0 0 1 1 0 0 1 1 1 0 0 0 1 0 0 1 1 0 1 0 0 0 1 1 1 0 ? 0 0 1 1 1 0 0 ? 0 1 0 0 1 0 1 ? 0 1 0 0 0 1 0 ? Items Words/Phrases Users Document s Goal: predict probabilities that users choose item by finding latent factors Goal: put documents into topics with similar collections of words Finding topics of articles: Methods for clustering described above can be used as quick and dirty ways to put documents into clusters. Hierarchical models are more sophisticated. Classifying articles (good/bad, left/right): Supervised learning, and particular classification methods, can be applied if some documents are labelled, e.g using human judges. Can combine unsupervised learning techniques for dimensionality
  • 14. Classifying Large Number of Articles  Take articles, have humans label as many as you can afford  Build a classifier to classify unlabeled articles  See, e.g., Vowpal Wabbit  Use unsupervised learning for feature reduction  And/or, use method such as LASSO or other regularized ML methods with cross-validation to select model complexity, specification of features  Warning: this doesn’t work as well as you might think w/ limited training data and many features. Needs tweaking in feature reduction.  Expect to see this used widely in economics for text.  A number of applications already (e.g. political bias, review sentiment, etc. See  Possible alternatives/improvements  Athey, Mobius, and Pal (in progress)  Use unsupervised learning PRIOR to human labeling  Put articles into topics, and then crowd source features of topics, also of articles within each topic (stratified)  For article-specific characteristics that don’t apply to topic as a whole, train classifier (e.g. for left/right political bias)  Classifier works much better within a topic (since significance of words different by topic)  Can avoid asking about political bias of article about a sports event
  • 15. Quick and Dirty Clustering for Topics: Network Methods  Another approach to clustering is to interpret the document – word matrix as a network  There are network methods for clustering or “community detection,” some of which are different from matrix factorization  Approach based on “modularity” (see e.g. Newman) – greedy, scalable algorithm to maximize links within communities while penalizing links outside  Athey, Mobius, and Pal (in progress) apply this to news articles  Strength of link between articles is determined by common words, as well as links to common Wikipedia articles
  • 17. Topic Examples in April 2013 Boston Marathon bombings 2013 Moore tornado 2013 Korean crisis Death of Lee Rigby 2012–13 in English football 2012 Benghazi attack Ariel Castro kidnappings Murder of Travis Alexander 2013 NFL season List of school shootings in the United States 2013 in baseball Death and funeral of Margaret Thatcher 2013 in American television Timeline of the Syrian civil war (May– August 2012) 2013 NBA Finals 2013 IRS scandal 2013 in film NCAA Men's Division I Basketball Championship 2012–13 NHL season Malaysian general election, 2013 2013 Masters Tournament Shooting of Trayvon Martin 2013 NASCAR Sprint Cup Series Phil Mickelson
  • 18. Hierarchical Models (Closer to Structural Models) Many of these slides based on a tutorial by David Blei; See David Blei’s web page at Columbia for more details http://www.cs.columbia.edu/~blei/
  • 20. LDA As a Graphical Model
  • 22. Topics Estimated from Articles in Science
  • 23. Making Recommendations with User Choice Data and Content Data
  • 25. Complex Model with Latent Variables for Topics and User Preferences Grey circles: observable s
  • 26. Conclusions  Recommendation systems literature is close to demand literature in goals  ML Literature is massive, disconnected, ad hoc, and hard to absorb  Quick and dirty techniques can be a substitute for complex models for both recommendation systems and for topic modeling  Clustering methods  Matrix decomposition techniques give a sense of what would come out of a latent variable model with much less work  Network literature also has some fast and easy methods  One branch of ML (graphical models) is very close to structural models with unobserved product characteristics  Topic models help understand structure underlying documents, and can be used as part of user choice models  Open questions for economists include identification and mapping onto utility framework  Applications/directions  My work with Jeno Pal and Markus Mobius looks empirically at role of aggregators and intermediaries on what articles people choose  With David Blei, Jake Hofman, and Markus Mobius, looks methodologically at how to link topic models to structural IO models and incorporate role of availability and prominence of what is shown