SlideShare uma empresa Scribd logo
1 de 1
Baixar para ler offline
1
Characterization of Fair Experiments
for Recommender System Evaluation – A Formal Analysis
Pablo Castells and Rocío Cañamares
Universidad Autónoma de Madrid
{pablo.castells,rocio.cannamares}@uam.es
Workshop on offline evaluation for recommender systems (REVEAL 2018) at the 12th ACM Conference on Recommender Systems (RecSys 2018)
IRGIRGroup @UAM
Empirical fairness test
Evaluation fairness condition
𝑃 =
𝑅 ∩ ℛ
𝑅
= 𝑝 ℛ 𝑅
𝑅𝑒𝑐𝑎𝑙𝑙 =
𝑅 ∩ ℛ
ℛ
= 𝑝 𝑅 ℛ
𝑅
𝒥𝑡𝑒𝑠𝑡
𝒰 × ℐ
𝒥𝑡𝑟𝑎𝑖𝑛
𝒯
𝑃 =
𝑅 ∩ ℛ ∩ 𝒥𝑡𝑒𝑠𝑡
𝑅
= 𝒑 𝓙 𝒕𝒆𝒔𝒕 𝓡, 𝑹 𝑃
𝑅𝑒𝑐𝑎𝑙𝑙 =
𝑅 ∩ ℛ ∩ 𝒥𝑡𝑒𝑠𝑡
𝑅 ∩ 𝒥𝑡𝑒𝑠𝑡
=
𝒑 𝓙 𝒕𝒆𝒔𝒕 𝓡, 𝑹
𝒑 𝓙 𝒕𝒆𝒔𝒕 𝓡
𝑅𝑒𝑐𝑎𝑙𝑙
Metric definition Metric estimatesElements of an experiment
All user-item pairs
Training sample
Target
pairs
𝑅 (recommendation)
𝑅 ∩ ℛ 𝑅 ∩ ℛ ∩ 𝒥𝑡𝑒𝑠𝑡
𝑅
Fair estimates
Preservation of system comparisons: 𝑃 𝑅1 ≤ 𝑃 𝑅2 ⟺ 𝑃 𝑅1 ≤ 𝑃 𝑅2 – we say 𝑃 ∝ 𝑃
Metric estimate preserves system comparison ⟺ 𝑝 𝒥𝑡𝑒𝑠𝑡 ℛ, 𝑅 is the same for all systems
𝑷 ∝ 𝑷 ⟺ 𝒑 𝓙 𝒕𝒆𝒔𝒕 𝓡, 𝑹 ∼ 𝒑 𝓙 𝒕𝒆𝒔𝒕 𝓡, 𝒯 ∀𝑹 ⊂ 𝒯 (and same for 𝑅𝑒𝑐𝑎𝑙𝑙 ∝ 𝑅𝑒𝑐𝑎𝑙𝑙)
Fair experiment ⟺ test judgments are identically and independently distributed over relevant targets
𝑅 ⊊ 𝒯
𝒥𝑡𝑒𝑠𝑡 ⊆ 𝒯 ⊆ 𝒰 × ℐ ∖ 𝒥𝑡𝑟𝑎𝑖𝑛
𝒥𝑡𝑟𝑎𝑖𝑛 ∩ 𝒥𝑡𝑒𝑠𝑡 = ∅
• •
••
• Take some sample 𝒥 = 𝒥𝑡𝑟𝑎𝑖𝑛 ∪ 𝒥𝑡𝑒𝑠𝑡 of user preferences (ratings, judgments, observed interaction)
• Null hypothesis: Let user preferences ℛ be random, i.e. uniformly and independently distributed over items (the sample 𝒥 may not)
• Run an experiment over 𝒥, ℛ for a set of recommendation algorithms
• Some system is better than random recommendation  Then your experiment is unfair (the data sampling/subsampling, the metric, etc.)
Analysis of common experimental protocols
Random rating split Flat test [1] Popularity strata [1]
2. Randomized (forced)
test judgments [2,3]
1. Free user feedback
Null hypothesis recreation: taking e.g. MovieLens 1M, keep ratings (judgment set 𝒥) but shuffle rating values (ℛ set) over ratings
 User preferences become random, uniform and independent between users
 Judgment distribution retains popularity biases and inter-user (and inter-item) dependencies
Items
Judgments
Items
Judgments
Items
Judgments
Original data (MovieLens 1M) Null hypothesis (randomized MovieLens 1M preferences )
Items
Judgments
𝒥𝑡𝑒𝑠𝑡
𝒥𝑡𝑟𝑎𝑖𝑛
FairNot fair
Conclusions
• We also examine experimental protocols analytically
 Empirical fairness test is consistent with analytical
fairness condition
• Temporal split can be usually expected to be still biased
• Interleaved AB tests should be fair
Not fully fair Not fully fair FairHow fair?
• Only randomized test judgments or 𝒯 ← 𝒥𝑡𝑒𝑠𝑡 ensure fairness
 But 𝒯 ← 𝒥𝑡𝑒𝑠𝑡 is not as realistic as 𝒯 ← 𝒰 × ℐ ∖ 𝒥𝑡𝑟𝑎𝑖𝑛 (plus coverage shortfalls)
 Forced judgments to be handled with some care to be fully fair (see in paper)
• Other protocols are biased to non-random patterns in observations
 Popularity, inter-user dependences, etc. (avg rating would not seem affected though)
𝒯 ← 𝒥𝑡𝑒𝑠𝑡
Random
Popularity
Positive popularity
Average rating
User-based kNN
Matrix factorization
0
0.05
0.1
0.15
0
0.2
0.4
𝒯 ← 𝒥𝑡𝑒𝑠𝑡
0
0.2
0.4
0.6
0
0.1
0.2
0.3
P@10
0
0.05
0.1
0.15
0
0.025
0.05
0.075
0
0.02
0.04
Simulated random
test sample of
random preferences
1. A. Bellogín, P. Castells and I. Cantador. Statistical Biases in Information Retrieval Metrics for Recommender Systems. Information Retrieval 20(6), July 2017, pp. 606-634.
2. R. Cañamares and P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018, Ann Arbor, MI, USA, July 2018, pp. 415-424.
3. B. Marlin and R. Zemel. Collaborative prediction and ranking with nonrandom missing data. RecSys 2009, New York, NY, USA, October 2009, pp. 5-12.
Test
sample
ℛ
Relevant
pairs

Mais conteúdo relacionado

Mais procurados

Mechanical system design
Mechanical system designMechanical system design
Mechanical system designSushil Kuwar
 
Sensitivity analysis
Sensitivity analysisSensitivity analysis
Sensitivity analysissunilgv06
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationDmitry Grapov
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesYoga Setiawan
 
Factor analysis
Factor analysisFactor analysis
Factor analysis緯鈞 沈
 
Some statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysisSome statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysisUC Davis
 
Model validation strategies ftc 2018
Model validation strategies ftc 2018Model validation strategies ftc 2018
Model validation strategies ftc 2018Philip Ramsey
 
Evaluation in Africa RISING
Evaluation in Africa RISINGEvaluation in Africa RISING
Evaluation in Africa RISINGafrica-rising
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesDinul
 
The Use Of Decision Trees For Adaptive Item
The Use Of Decision Trees For Adaptive ItemThe Use Of Decision Trees For Adaptive Item
The Use Of Decision Trees For Adaptive Itembarthriley
 
Errors in chemical analysis
Errors in chemical analysisErrors in chemical analysis
Errors in chemical analysisUMAR ALI
 
Imputation techniques for missing data in clinical trials
Imputation techniques for missing data in clinical trialsImputation techniques for missing data in clinical trials
Imputation techniques for missing data in clinical trialsNitin George
 
Analyzing Responses to Likert Items
Analyzing Responses to Likert ItemsAnalyzing Responses to Likert Items
Analyzing Responses to Likert ItemsSanjay Kairam
 
Harnessing The Proteome With Proteo Iq Quantitative Proteomics Software
Harnessing The Proteome With Proteo Iq Quantitative Proteomics SoftwareHarnessing The Proteome With Proteo Iq Quantitative Proteomics Software
Harnessing The Proteome With Proteo Iq Quantitative Proteomics Softwarejatwood3
 

Mais procurados (20)

Item analysis
Item analysisItem analysis
Item analysis
 
Sensitivity analysis
Sensitivity analysisSensitivity analysis
Sensitivity analysis
 
Mechanical system design
Mechanical system designMechanical system design
Mechanical system design
 
Sensitivity analysis
Sensitivity analysisSensitivity analysis
Sensitivity analysis
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and Visualization
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
poster_Reza
poster_Rezaposter_Reza
poster_Reza
 
Poster
PosterPoster
Poster
 
Matching methods
Matching methodsMatching methods
Matching methods
 
Doe01 intro
Doe01 introDoe01 intro
Doe01 intro
 
Factor analysis
Factor analysisFactor analysis
Factor analysis
 
Some statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysisSome statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysis
 
Model validation strategies ftc 2018
Model validation strategies ftc 2018Model validation strategies ftc 2018
Model validation strategies ftc 2018
 
Evaluation in Africa RISING
Evaluation in Africa RISINGEvaluation in Africa RISING
Evaluation in Africa RISING
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
The Use Of Decision Trees For Adaptive Item
The Use Of Decision Trees For Adaptive ItemThe Use Of Decision Trees For Adaptive Item
The Use Of Decision Trees For Adaptive Item
 
Errors in chemical analysis
Errors in chemical analysisErrors in chemical analysis
Errors in chemical analysis
 
Imputation techniques for missing data in clinical trials
Imputation techniques for missing data in clinical trialsImputation techniques for missing data in clinical trials
Imputation techniques for missing data in clinical trials
 
Analyzing Responses to Likert Items
Analyzing Responses to Likert ItemsAnalyzing Responses to Likert Items
Analyzing Responses to Likert Items
 
Harnessing The Proteome With Proteo Iq Quantitative Proteomics Software
Harnessing The Proteome With Proteo Iq Quantitative Proteomics SoftwareHarnessing The Proteome With Proteo Iq Quantitative Proteomics Software
Harnessing The Proteome With Proteo Iq Quantitative Proteomics Software
 

Semelhante a REVEAL @ RecSys 2018 - Characterization of Fair Experiments for Recommender System Evaluation – A Formal Analysis

Recommender Systems Fairness Evaluation via Generalized Cross Entropy
Recommender Systems Fairness Evaluation via Generalized Cross EntropyRecommender Systems Fairness Evaluation via Generalized Cross Entropy
Recommender Systems Fairness Evaluation via Generalized Cross EntropyVito Walter Anelli
 
SHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPSHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPAlAcademia Tsr
 
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...Alejandro Bellogin
 
آلية التقييم الحسي للأغذية
آلية التقييم الحسي للأغذيةآلية التقييم الحسي للأغذية
آلية التقييم الحسي للأغذيةUniv. of Tripoli
 
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalValidity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalJulián Urbano
 
Causal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationCausal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationScientificRevenue
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceAmit Sharma
 
Experimental Design.pptx
Experimental Design.pptxExperimental Design.pptx
Experimental Design.pptxOnlineWorld4
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3Luis Borbon
 
Computer simulation technique the definitive introduction - harry perros
Computer simulation technique   the definitive introduction - harry perrosComputer simulation technique   the definitive introduction - harry perros
Computer simulation technique the definitive introduction - harry perrosJesmin Rahaman
 
Converting Measurement Systems From Attribute
Converting Measurement Systems From AttributeConverting Measurement Systems From Attribute
Converting Measurement Systems From Attributejdavidgreen007
 
On the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityOn the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityJulián Urbano
 
Using evolutionary testing to improve efficiency and quality
Using evolutionary testing to improve efficiency and qualityUsing evolutionary testing to improve efficiency and quality
Using evolutionary testing to improve efficiency and qualityFaysal Ahmed
 
Practical Tools for Measurement Systems Analysis
Practical Tools for Measurement Systems AnalysisPractical Tools for Measurement Systems Analysis
Practical Tools for Measurement Systems AnalysisGabor Szabo, CQE
 
Evaluating Collaborative Filtering Recommender Systems
Evaluating Collaborative Filtering Recommender SystemsEvaluating Collaborative Filtering Recommender Systems
Evaluating Collaborative Filtering Recommender SystemsMegaVjohnson
 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? HackerEarth
 
Planning of experiment in industrial research
Planning of experiment in industrial researchPlanning of experiment in industrial research
Planning of experiment in industrial researchpbbharate
 

Semelhante a REVEAL @ RecSys 2018 - Characterization of Fair Experiments for Recommender System Evaluation – A Formal Analysis (20)

Recommender Systems Fairness Evaluation via Generalized Cross Entropy
Recommender Systems Fairness Evaluation via Generalized Cross EntropyRecommender Systems Fairness Evaluation via Generalized Cross Entropy
Recommender Systems Fairness Evaluation via Generalized Cross Entropy
 
SHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPSHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLP
 
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
 
آلية التقييم الحسي للأغذية
آلية التقييم الحسي للأغذيةآلية التقييم الحسي للأغذية
آلية التقييم الحسي للأغذية
 
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalValidity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
 
Causal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationCausal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous Optimization
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inference
 
Experimental Design.pptx
Experimental Design.pptxExperimental Design.pptx
Experimental Design.pptx
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3
 
Computer simulation technique the definitive introduction - harry perros
Computer simulation technique   the definitive introduction - harry perrosComputer simulation technique   the definitive introduction - harry perros
Computer simulation technique the definitive introduction - harry perros
 
Converting Measurement Systems From Attribute
Converting Measurement Systems From AttributeConverting Measurement Systems From Attribute
Converting Measurement Systems From Attribute
 
On the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityOn the Measurement of Test Collection Reliability
On the Measurement of Test Collection Reliability
 
Using evolutionary testing to improve efficiency and quality
Using evolutionary testing to improve efficiency and qualityUsing evolutionary testing to improve efficiency and quality
Using evolutionary testing to improve efficiency and quality
 
Practical Tools for Measurement Systems Analysis
Practical Tools for Measurement Systems AnalysisPractical Tools for Measurement Systems Analysis
Practical Tools for Measurement Systems Analysis
 
Evaluating Collaborative Filtering Recommender Systems
Evaluating Collaborative Filtering Recommender SystemsEvaluating Collaborative Filtering Recommender Systems
Evaluating Collaborative Filtering Recommender Systems
 
Dissertation Aaron Tesch
Dissertation Aaron TeschDissertation Aaron Tesch
Dissertation Aaron Tesch
 
Software Testing
Software Testing Software Testing
Software Testing
 
Diss Pres
Diss PresDiss Pres
Diss Pres
 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ?
 
Planning of experiment in industrial research
Planning of experiment in industrial researchPlanning of experiment in industrial research
Planning of experiment in industrial research
 

Mais de Pablo Castells

Rational and irrational bias in recommendation
Rational and irrational bias in recommendationRational and irrational bias in recommendation
Rational and irrational bias in recommendationPablo Castells
 
Bias in recommendation: avoid it or embrace it?
Bias in recommendation: avoid it or embrace it?Bias in recommendation: avoid it or embrace it?
Bias in recommendation: avoid it or embrace it?Pablo Castells
 
RecSys 2020 - On Target Item Sampling in Offline Recommender System Evaluation
RecSys 2020 - On Target Item Sampling in Offline Recommender System EvaluationRecSys 2020 - On Target Item Sampling in Offline Recommender System Evaluation
RecSys 2020 - On Target Item Sampling in Offline Recommender System EvaluationPablo Castells
 
SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...
SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...
SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...Pablo Castells
 
SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...
SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...
SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...Pablo Castells
 
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...Pablo Castells
 
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender Systems
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender SystemsSIGIR 2011 Poster - Intent-Oriented Diversity in Recommender Systems
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender SystemsPablo Castells
 
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrie...
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrie...SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrie...
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrie...Pablo Castells
 
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...Pablo Castells
 

Mais de Pablo Castells (9)

Rational and irrational bias in recommendation
Rational and irrational bias in recommendationRational and irrational bias in recommendation
Rational and irrational bias in recommendation
 
Bias in recommendation: avoid it or embrace it?
Bias in recommendation: avoid it or embrace it?Bias in recommendation: avoid it or embrace it?
Bias in recommendation: avoid it or embrace it?
 
RecSys 2020 - On Target Item Sampling in Offline Recommender System Evaluation
RecSys 2020 - On Target Item Sampling in Offline Recommender System EvaluationRecSys 2020 - On Target Item Sampling in Offline Recommender System Evaluation
RecSys 2020 - On Target Item Sampling in Offline Recommender System Evaluation
 
SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...
SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...
SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...
 
SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...
SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...
SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...
 
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
 
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender Systems
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender SystemsSIGIR 2011 Poster - Intent-Oriented Diversity in Recommender Systems
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender Systems
 
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrie...
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrie...SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrie...
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrie...
 
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
 

Último

꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 

Último (20)

꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 

REVEAL @ RecSys 2018 - Characterization of Fair Experiments for Recommender System Evaluation – A Formal Analysis

  • 1. 1 Characterization of Fair Experiments for Recommender System Evaluation – A Formal Analysis Pablo Castells and Rocío Cañamares Universidad Autónoma de Madrid {pablo.castells,rocio.cannamares}@uam.es Workshop on offline evaluation for recommender systems (REVEAL 2018) at the 12th ACM Conference on Recommender Systems (RecSys 2018) IRGIRGroup @UAM Empirical fairness test Evaluation fairness condition 𝑃 = 𝑅 ∩ ℛ 𝑅 = 𝑝 ℛ 𝑅 𝑅𝑒𝑐𝑎𝑙𝑙 = 𝑅 ∩ ℛ ℛ = 𝑝 𝑅 ℛ 𝑅 𝒥𝑡𝑒𝑠𝑡 𝒰 × ℐ 𝒥𝑡𝑟𝑎𝑖𝑛 𝒯 𝑃 = 𝑅 ∩ ℛ ∩ 𝒥𝑡𝑒𝑠𝑡 𝑅 = 𝒑 𝓙 𝒕𝒆𝒔𝒕 𝓡, 𝑹 𝑃 𝑅𝑒𝑐𝑎𝑙𝑙 = 𝑅 ∩ ℛ ∩ 𝒥𝑡𝑒𝑠𝑡 𝑅 ∩ 𝒥𝑡𝑒𝑠𝑡 = 𝒑 𝓙 𝒕𝒆𝒔𝒕 𝓡, 𝑹 𝒑 𝓙 𝒕𝒆𝒔𝒕 𝓡 𝑅𝑒𝑐𝑎𝑙𝑙 Metric definition Metric estimatesElements of an experiment All user-item pairs Training sample Target pairs 𝑅 (recommendation) 𝑅 ∩ ℛ 𝑅 ∩ ℛ ∩ 𝒥𝑡𝑒𝑠𝑡 𝑅 Fair estimates Preservation of system comparisons: 𝑃 𝑅1 ≤ 𝑃 𝑅2 ⟺ 𝑃 𝑅1 ≤ 𝑃 𝑅2 – we say 𝑃 ∝ 𝑃 Metric estimate preserves system comparison ⟺ 𝑝 𝒥𝑡𝑒𝑠𝑡 ℛ, 𝑅 is the same for all systems 𝑷 ∝ 𝑷 ⟺ 𝒑 𝓙 𝒕𝒆𝒔𝒕 𝓡, 𝑹 ∼ 𝒑 𝓙 𝒕𝒆𝒔𝒕 𝓡, 𝒯 ∀𝑹 ⊂ 𝒯 (and same for 𝑅𝑒𝑐𝑎𝑙𝑙 ∝ 𝑅𝑒𝑐𝑎𝑙𝑙) Fair experiment ⟺ test judgments are identically and independently distributed over relevant targets 𝑅 ⊊ 𝒯 𝒥𝑡𝑒𝑠𝑡 ⊆ 𝒯 ⊆ 𝒰 × ℐ ∖ 𝒥𝑡𝑟𝑎𝑖𝑛 𝒥𝑡𝑟𝑎𝑖𝑛 ∩ 𝒥𝑡𝑒𝑠𝑡 = ∅ • • •• • Take some sample 𝒥 = 𝒥𝑡𝑟𝑎𝑖𝑛 ∪ 𝒥𝑡𝑒𝑠𝑡 of user preferences (ratings, judgments, observed interaction) • Null hypothesis: Let user preferences ℛ be random, i.e. uniformly and independently distributed over items (the sample 𝒥 may not) • Run an experiment over 𝒥, ℛ for a set of recommendation algorithms • Some system is better than random recommendation  Then your experiment is unfair (the data sampling/subsampling, the metric, etc.) Analysis of common experimental protocols Random rating split Flat test [1] Popularity strata [1] 2. Randomized (forced) test judgments [2,3] 1. Free user feedback Null hypothesis recreation: taking e.g. MovieLens 1M, keep ratings (judgment set 𝒥) but shuffle rating values (ℛ set) over ratings  User preferences become random, uniform and independent between users  Judgment distribution retains popularity biases and inter-user (and inter-item) dependencies Items Judgments Items Judgments Items Judgments Original data (MovieLens 1M) Null hypothesis (randomized MovieLens 1M preferences ) Items Judgments 𝒥𝑡𝑒𝑠𝑡 𝒥𝑡𝑟𝑎𝑖𝑛 FairNot fair Conclusions • We also examine experimental protocols analytically  Empirical fairness test is consistent with analytical fairness condition • Temporal split can be usually expected to be still biased • Interleaved AB tests should be fair Not fully fair Not fully fair FairHow fair? • Only randomized test judgments or 𝒯 ← 𝒥𝑡𝑒𝑠𝑡 ensure fairness  But 𝒯 ← 𝒥𝑡𝑒𝑠𝑡 is not as realistic as 𝒯 ← 𝒰 × ℐ ∖ 𝒥𝑡𝑟𝑎𝑖𝑛 (plus coverage shortfalls)  Forced judgments to be handled with some care to be fully fair (see in paper) • Other protocols are biased to non-random patterns in observations  Popularity, inter-user dependences, etc. (avg rating would not seem affected though) 𝒯 ← 𝒥𝑡𝑒𝑠𝑡 Random Popularity Positive popularity Average rating User-based kNN Matrix factorization 0 0.05 0.1 0.15 0 0.2 0.4 𝒯 ← 𝒥𝑡𝑒𝑠𝑡 0 0.2 0.4 0.6 0 0.1 0.2 0.3 P@10 0 0.05 0.1 0.15 0 0.025 0.05 0.075 0 0.02 0.04 Simulated random test sample of random preferences 1. A. Bellogín, P. Castells and I. Cantador. Statistical Biases in Information Retrieval Metrics for Recommender Systems. Information Retrieval 20(6), July 2017, pp. 606-634. 2. R. Cañamares and P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018, Ann Arbor, MI, USA, July 2018, pp. 415-424. 3. B. Marlin and R. Zemel. Collaborative prediction and ranking with nonrandom missing data. RecSys 2009, New York, NY, USA, October 2009, pp. 5-12. Test sample ℛ Relevant pairs