1. Experiments on Generalizability of User-oriented Fairness
in Recommender Systems
Hossein A. (Saeed) Rahmani1, Mohammadmehdi Naghiaei2, Mehdi Dehghan3, Mohammad Aliannejadi4
1 WI Group, University College London, hossein.rahmani.22@ucl.ac.uk
2 DECIDE, University of Southern California, naghiaei@usc.edu
3 Abin's Lab, Shahid Beheshti University, mahdi.dehghan551@gmail.com
4 IRLab, University of Amsterdam, m.aliannejadi@uva.nl
DEFirst Reading Group
Mila x Vector Institute
23 March, 2023
2. Presentation at SIGIR 2022
DEFirst '23 | Mila x Vector Institute | Generalizability of User-oriented Fairness in Recommender Systems
3. Typical Recommender System
[Figure: the feedback loop — User → collection (rate, click, visit, …) → Data → learning → Recommender Engine → recommendation (top-N, rating, …) → User]
❏ Three main components in a typical recommender system
❏ collection: gathering data from users' feedback for learning model parameters
❏ learning: training recommendation models on the collected data
❏ recommendation: providing recommendations (top-N, rating, …) to users
❏ The feedback loop is an essential characteristic of recommender systems
❖ Chen, Jiawei, Hande Dong, Xiang Wang, Fuli Feng, Meng Wang, and Xiangnan He. "Bias and debias in recommender system: A survey and future directions." ACM Transactions on Information Systems 41, no. 3 (2023): 1-39.
❖ Zhang, Shuai, Lina Yao, Aixin Sun, and Yi Tay. "Deep learning based recommender system: A survey and new perspectives." ACM Computing Surveys (CSUR) 52, no. 1 (2019): 1-38.
4. Biases in Recommender System
❏ Various biases exist at each step of the feedback loop
❏ e.g., in the collection phase, users affected by cognitive biases may select popular items / items ranked first
❏ These biases can have serious consequences
❏ e.g., the Matthew Effect: the "rich get richer" phenomenon
[Figure: the feedback loop with bias amplification along the loop — Selection Bias, Exposure Bias, Conformity Bias, Position Bias, Popularity Bias, Ranking Bias, …]
❖ Chen, Jiawei, Hande Dong, Xiang Wang, Fuli Feng, Meng Wang, and Xiangnan He. "Bias and debias in recommender system: A survey and future directions." ACM Transactions on Information Systems 41, no. 3 (2023): 1-39.
❖ Wang, Yifan, Weizhi Ma, Min Zhang, Yiqun Liu, and Shaoping Ma. "A survey on the fairness of recommender systems." ACM Transactions on Information Systems 41, no. 3 (2023): 1-43.
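The bias amplification along the loop can be made concrete with a toy simulation (an illustrative sketch, not from the paper): a purely popularity-driven engine recommends the most-interacted item each round, and the resulting clicks are fed back into the interaction data.

```python
from collections import Counter

def simulate_feedback_loop(initial_counts, n_rounds, top_n=1):
    """Toy Matthew-effect simulation: each round the engine recommends the
    top_n most popular items, users click them, and the clicks are logged
    back into the interaction data, so the rich get richer."""
    counts = Counter(initial_counts)
    for _ in range(n_rounds):
        recommended = [item for item, _ in counts.most_common(top_n)]
        for item in recommended:          # every recommendation gets clicked
            counts[item] += 1             # ...and fed back into the data
    return counts

counts = simulate_feedback_loop({"a": 5, "b": 4, "c": 1}, n_rounds=20)
print(counts["a"], counts["b"])  # → 25 4: item "a" now dominates the data
```

Item "a" starts with barely half the interactions but absorbs every new one, while "b" and "c" are frozen out: exactly the rich-get-richer dynamic the slide describes.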
5. Different Perspectives on Fairness
❏ User Fairness vs. Item Fairness
❏ fair behavior among users
❏ parity or proportional parity in the exposure of items
❏ Group Fairness vs. Individual Fairness
❏ dividing users/items into two or more groups
❏ finding similar items/users and treating these individuals fairly
❏ Single-sided Fairness vs. Multi-sided Fairness
❏ fairness for multiple parties involved
❏ e.g., Uber Eats: consumer, provider, and delivery
[Figure: multi-stakeholder recommendation — providers supply items, the recommendation engine recommends items to consumers, and side stakeholders express preferences]
❖ Abdollahpouri, Himan, and Robin Burke. "Multi-stakeholder recommendation and its connection to multi-sided fairness." arXiv preprint arXiv:1907.13158 (2019).
❖ Rahmani, Hossein A., Yashar Deldjoo, Ali Tourani, and Mohammadmehdi Naghiaei. "The unfairness of active users and popularity bias in point-of-interest recommendation." In Advances in Bias and Fairness in Information Retrieval: Third BIAS International Workshop, 2022.
6. Mitigating Harmful Biases: Strategies
❏ Pre-processing (e.g., data rebalancing): debiasing the training data before feeding it into the recommendation model (Dataset → Debiased Data)
❏ In-processing (e.g., regularization): learning the parameters of the model while considering fairness criteria (Recommendation Engine → Fair Model)
❏ Post-processing (e.g., re-ranking optimisation): working with the generated results to provide fair recommendations (Recommendation Evaluation → Fair Evaluation)
7. Main Research Question!
How generalizable are the proposed systems with respect to different aspects of a recommendation system?
8. User-oriented Fairness Re-ranking (UFR)
❏ Highly active users make up only a small percentage of users in the system, yet receive significantly better recommendations
❏ UFR: post-processing, (group) user fairness, single-sided fairness
[Figure: UFR applied as a post-processing step on top of the recommender system's feedback loop]
By the WISE Lab (Web Intelligent Systems and Economics) at Rutgers University (https://wise.cs.rutgers.edu/)
❖ Li, Yunqi, Hanxiong Chen, Zuohui Fu, Yingqiang Ge, and Yongfeng Zhang. "User-oriented fairness in recommendation." In Proceedings of the Web Conference, 2021.
9. UFR Model
[Figure: pipeline — Dataset (rating matrix) → recommender system (MF, PMF, NeuMF, STAMP) → top-N recommendation lists → re-ranking approach: solver execution (user-fairness) over active-user and inactive-user relevant items → re-ranked top-N recommendations (fair rec. lists)]
10. UFR Re-ranking Method
❏ A re-ranking integer programming method (solver execution for user-fairness)
❏ Objective function
❏ re-ranking the recommendation list of each user provided by a baseline algorithm such as MF
❏ Constraints
❏ minimizing the difference in performance between the groups of users (UGF)
■ UGF: the difference between the model's accuracy for the advantaged and disadvantaged groups
❏ selecting only K items to recommend
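A miniature, hypothetical stand-in for this integer program may help: the sketch below brute-forces the selection instead of calling an ILP solver such as Gurobi or python-mip, maximizing total predicted score while keeping the accuracy gap between the two groups (here precision@K, standing in for the paper's metrics) below ε.

```python
from itertools import combinations

def ufr_rerank(scores, relevant, groups, K, eps):
    """Brute-force stand-in for UFR's integer program: choose K items per
    user to maximize total predicted score, subject to the UGF constraint
    that the gap in mean precision@K between the two groups is at most eps."""
    users = list(scores)
    group_ids = sorted(set(groups.values()))  # expects exactly two groups

    def precision(u, picked):
        return sum(1 for i in picked if i in relevant[u]) / K

    best, best_total = None, float("-inf")

    def search(idx, chosen):
        nonlocal best, best_total
        if idx == len(users):
            means = []
            for g in group_ids:
                vals = [precision(u, chosen[u]) for u in users if groups[u] == g]
                means.append(sum(vals) / len(vals))
            if abs(means[0] - means[1]) <= eps:  # UGF constraint
                total = sum(scores[u][i] for u in users for i in chosen[u])
                if total > best_total:
                    best, best_total = dict(chosen), total
            return
        u = users[idx]
        for pick in combinations(scores[u], K):  # "select only K items"
            chosen[u] = pick
            search(idx + 1, chosen)

    search(0, {})
    return best

scores   = {"u_adv": {"i1": 0.9, "i2": 0.8, "i3": 0.1},
            "u_dis": {"i1": 0.9, "i2": 0.5, "i3": 0.4}}
relevant = {"u_adv": {"i1"}, "u_dis": {"i3"}}
groups   = {"u_adv": "adv", "u_dis": "dis"}
fair = ufr_rerank(scores, relevant, groups, K=2, eps=0.0)
# the disadvantaged user's relevant item i3 is now kept in its top-2
```

Without the constraint, both users would get the highest-scored pair {i1, i2} and only the advantaged user would see a relevant item; with ε = 0, the solver trades a little score to keep i3 in the disadvantaged user's list.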
11. Reproducibility Aspects Overview
❏ Domain: effect of data characteristics (8 different datasets, 6 domains)
❏ Base Rec. Alg.: the ability to propagate bias differs from one algorithm to another (6 deep and shallow models)
❏ User Groups Assumptions: effect of the user grouping method on performance (level of activity, consumption of popular items)
❏ Fairness vs. Effectiveness Metrics: underlying trends and trade-offs between various evaluation metrics (NDCG, UGF, Novelty, GAP)
12. Datasets and Domains
Dataset Users Items Interactions Sparsity Feedback Domain
MovieLens 943 1,349 99,287 92.19% Explicit Movie
Epinion 2,677 2,060 103,567 98.12% Explicit Opinion
LastFM 1,797 1,507 62,376 97.69% Implicit Music
BookCrossing 1,136 1,019 20,522 98.22% Explicit Book
Amazon Toy 2,170 1,733 32,852 99.12% Explicit eCommerce
Amazon Office 2,448 1,596 36,841 99.05% Explicit eCommerce
Gowalla 1,130 1,189 66,245 95.06% Implicit POI
Foursquare 1,568 1,461 42,678 98.13% Implicit POI
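The sparsity column is the share of user-item pairs with no observed interaction; as a quick check against the Epinion row:

```python
def sparsity_pct(n_users, n_items, n_interactions):
    """Percentage of user-item pairs with no observed interaction."""
    return 100 * (1 - n_interactions / (n_users * n_items))

# Epinion row: 2,677 users, 2,060 items, 103,567 interactions
print(round(sparsity_pct(2677, 2060, 103567), 2))  # → 98.12
```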
13. Recommendation Models
❏ Basic
❏ MostPop: a non-personalized method that recommends the most popular items to each user; popularity is measured by the number of interactions an item has received
❏ BPR: uses a pairwise ranking loss whose goal is to optimize personalized ranking
❏ Traditional
❏ PF: weights each user and item latent feature positively and models them using the Poisson distribution
❏ WMF: assigns smaller weights to negative samples and assumes that the latent features of two items are independent
❏ Neural
❏ NeuMF: applies non-linear activations to learn the mapping between user and item features, which are concatenated from MLP and MF layers
❏ VAECF: introduces a generative model with a multinomial likelihood
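BPR's pairwise objective is small enough to spell out; a minimal sketch (illustrative, not the paper's implementation) of the per-triple loss -ln σ(x_ui - x_uj), where i is an item the user interacted with and j one they did not:

```python
import math

def bpr_loss(x_ui, x_uj):
    """BPR loss for one (user u, positive item i, negative item j) triple:
    -ln(sigmoid(x_ui - x_uj)); minimizing it pushes the observed item's
    predicted score above the unobserved item's."""
    return -math.log(1.0 / (1.0 + math.exp(-(x_ui - x_uj))))

# the further the positive item is scored above the negative, the lower the loss
print(bpr_loss(2.0, 0.0) < bpr_loss(0.5, 0.0))  # → True
```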
14. Grouping Methods
❏ User Groups (G1), by the number of interactions / level of activity in the training data: 5% advantaged, 95% disadvantaged
❏ User Groups (G2), by the consumption of popular items in the training data: 20% advantaged, 80% disadvantaged
❏ Item Groups, by the number of interactions they received in the training data: 20% popular (short-head), 80% unpopular (long-tail)
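A minimal sketch of the G1 split (assuming the UFR convention that the top 5% most active users form the advantaged group):

```python
def split_by_activity(train_profiles, adv_ratio=0.05):
    """G1-style grouping: rank users by number of training interactions and
    label the top adv_ratio fraction advantaged, the rest disadvantaged.
    (The G2 grouping would instead rank users by how many popular items
    they consumed.)"""
    ranked = sorted(train_profiles, key=lambda u: len(train_profiles[u]),
                    reverse=True)
    n_adv = max(1, int(len(ranked) * adv_ratio))
    return ranked[:n_adv], ranked[n_adv:]

profiles = {f"u{k}": list(range(k)) for k in range(1, 41)}  # u40 most active
advantaged, disadvantaged = split_by_activity(profiles)
print(advantaged)  # → ['u40', 'u39']
```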
15. Fairness Assumption and Evaluation: Different Stakeholders
User relevance (nDCG)
● All
● Advantaged
● Disadvantaged
● User-oriented Group Fairness (UGF)
○ |Advantaged - Disadvantaged|
● Fairness improvement – Δ%(UGF)
Item exposure
● Novelty
● Coverage
● Short-head recommended items
● Long-tail recommended items
● delta Group Average Popularity (ΔGAP)
○ The difference between the average popularity score of the recommendation list and of the user profile
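UGF from the slide, sketched as code (a minimal illustration assuming per-user nDCG values are already computed):

```python
def ugf(per_user_ndcg, group_of):
    """User-oriented Group Fairness: absolute gap between the mean nDCG of
    the advantaged and the disadvantaged user groups (lower is fairer)."""
    means = {}
    for g in ("adv", "dis"):
        vals = [v for u, v in per_user_ndcg.items() if group_of[u] == g]
        means[g] = sum(vals) / len(vals)
    return abs(means["adv"] - means["dis"])

ndcg   = {"u1": 0.60, "u2": 0.40, "u3": 0.10, "u4": 0.30}
groups = {"u1": "adv", "u2": "adv", "u3": "dis", "u4": "dis"}
print(ugf(ndcg, groups))  # → 0.3 (group means 0.5 vs 0.2)
```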
16. Reproducibility Setup
❏ Implementation Details
❏ Google Colab
❏ Jupyter Notebooks
❏ Cornac (https://cornac.preferred.ai/): recommendation toolkit
❏ MIP (https://www.python-mip.com/) and Gurobi (https://www.gurobi.com/): optimization toolkits
❏ Relevance Estimation
❏ requires access to the relevance judgments
❏ relevance labels are estimated based on the training data
❏ Code and Data
❏ Repository: https://github.com/rahmanidashti/FairRecSys
17. Domain and Datasets
❏ Each box plot shows the variation in the performance of the UFR model over the 6 baseline algorithms
❏ Certain domains exhibit different patterns
❏ There is wide variation among datasets in the fairness improvement achieved before and after applying the models
❏ There is a wider range of variance in UGF improvement on implicit feedback datasets than on explicit feedback datasets
❏ A higher number of interactions per user and item can lead to better performance of the fairness model
[Figures: (a) G1, (b) G1, (c) G2]
18. Recommendation Models
❏ Observations suggest that the user grouping assumption also affects the sensitivity of a model to fairness
❏ WMF demonstrates the most robust performance in mitigating user-oriented unfairness, exhibiting the least variance as well as the best average improvement in terms of UGF
❏ Comparing Figures (a) and (b), we observe two considerably different behaviors of the base ranking models when it comes to variance and UGF improvement
19. Item Exposure
20. Item Exposure
21. Item Exposure
22. Item Exposure
23. Fairness vs. Effectiveness Metrics
❏ Positive correlation between the fairness improvement of a system and its overall accuracy
❏ Positive correlation between the fairness improvement of a system and its overall beyond-accuracy performance
❏ UFR is not only able to mitigate user biases but also improves the overall performance of the system
❏ No meaningful correlation between fairness improvement and ΔGAP (calibration)
[Figures: (1) G1, (2) G2]
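ΔGAP, as defined on the evaluation slide, compares average item popularity in the recommendations against the user profiles; a sketch in its common relative form, (GAP_rec - GAP_profile) / GAP_profile (the exact normalization is an assumption here):

```python
def avg_popularity(item_list, pop):
    """Mean popularity (interaction count) of the items in one list."""
    return sum(pop[i] for i in item_list) / len(item_list)

def delta_gap(user_profiles, rec_lists, pop):
    """ΔGAP for a user group: relative difference between the average item
    popularity of the recommendation lists and of the user profiles.
    Near-zero means the recommendations are calibrated to users' tastes."""
    gap_profile = sum(avg_popularity(p, pop) for p in user_profiles) / len(user_profiles)
    gap_rec = sum(avg_popularity(r, pop) for r in rec_lists) / len(rec_lists)
    return (gap_rec - gap_profile) / gap_profile

pop = {"i1": 100, "i2": 10, "i3": 1}       # item interaction counts
user_profiles = [["i2", "i3"]]             # a niche-taste user...
rec_lists     = [["i1", "i2"]]             # ...who is served a blockbuster
print(delta_gap(user_profiles, rec_lists, pop))  # → 9.0
```

A large positive value like this signals miscalibration: the recommendations are far more popularity-heavy than the user's own history.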
24. User Grouping Methods
[Figures: (a) G1, (b) G2, (c) G1, (d) G2]
25. Conclusion
❏ Our experiment reproducing UFR with 6 base recommendation algorithms shows that UFR is algorithm-agnostic: it improves fairness regardless of the baseline recommendation algorithm, but to different extents.
❏ We found that the user grouping method is one of the most important aspects of the user fairness algorithm, since it directly affects our interpretation of an algorithm's fair behavior.
❏ We believe that our experiments and shared supplementary resources open opportunities in both reproducibility and evaluation in this area.
❏ Methodologies for mitigating unfairness that are less dependent on the data distribution and domain properties should be explored.
26. Hossein A. (Saeed) Rahmani
UCL AI Center
University College London
London, United Kingdom
hossein.rahmani.22@ucl.ac.uk
@srahmanidashti
rahmanidashti.github.io
Any questions?
https://github.com/rahmanidashti/FairRecSys
Thank you! ;)
DEFirst Reading Group
Mila x Vector Institute
23 March, 2023