Slide deck of my lecture at the SIKS Course "Advances in Information Retrieval"
Read more here: https://graus.nu/blog/bias-in-recommendations-lecture-siks-course-on-advances-in-ir/
1. Bias in Recommendations
@ SIKS Course "Advances in Information Retrieval"
David Graus
✉ david.graus@fdmediagroep.nl
🐦 @dvdgrs
2. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
whoami
• 🎓 Academia
• BA Media Studies @ UvA (2008)
• MSc Media Technology @ Universiteit Leiden (2012)
• PhD Information Retrieval @ UvA (2017)
• 🏢 Industry
• Editor radio/online public broadcaster NTR (between BA & MSc)
• Research Intern @ Microsoft Research, US
• Data Scientist @ Company.info (FD Mediagroep)
• Lead Data Scientist @ FD SMART Journalism / BNR SMART Radio
8. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
In what is to follow…
• An introduction of FD Mediagroep
• Personalization & RecSys at FD Mediagroep
• Two flavors of bias in RecSys
• Model/Algorithmic bias
• Perceived bias in personalization
13. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
FD Mediagroup
The leading information provider in the financial-economic domain in the Netherlands
18. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
AI @ FD Mediagroup
20. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Team
Dung Bahadir Anca Philippe
Maya David Feng Li’ao
Klaus Oberon Manon Azamat
25. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
AI @ FDMG: Academia/Industry
30. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
SMART Radio
• (Transcribe)
• Segment
• Tag
• Serve
31. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Transcribe
32. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Segment
• Based on metadata, text, and audio.
34. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Tag
• Simple multilabel text classifier
• Trained on transcripts of segments + associated tags from website
35. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Serve
• iOS/Android app
48. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
SMART Journalism
• Moonshot: personalized summarization
• How to get there:
• Content Understanding
• Content-based Recommender System: <user, article>
• Personalized snippet retrieval: <user, snippet-in-article>
• Snippet-to-summary abstractor (?)
80. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
User Profile
Article: "Qualcomm krijgt bijna €1 mrd boete van Brussel" (Qualcomm gets nearly €1 bn fine from Brussels)
Tags: Boete, Chips, EU, Mededinging
Section (Rubriek): Ondernemen
Stylometry: CharLen=3491, WordLen=635
Entities: Qualcomm, Apple, NXP, Intel, Google
User profile:
Tags: Boete, Chips, EU, Mededinging
Section (Rubriek): Ondernemen
Stylometry: CharLen=3491, WordLen=635
Entities: Qualcomm, Apple, NXP, Intel, Google
87. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Article 1: "Qualcomm krijgt bijna €1 mrd boete van Brussel" (Qualcomm gets nearly €1 bn fine from Brussels)
Article 2: "Topman van softwaremaker Salesforce kraakt grote techbedrijven" (Top executive of software maker Salesforce slams big tech companies)
Tags: Big Data, Blog, Davos, Google, Technologie
Section (Rubriek): Davos
Stylometry: CharLen=2856, WordLen=524
Entities: Google, Apple, Microsoft, Salesforce
User profile after both articles:
Tags: Boete, Chips, EU, Mededinging, Big Data, Blog, Davos, Google, Technologie
Section (Rubriek): Ondernemen, Davos
Stylometry: CharLen=3491, WordLen=635, CharLen=2856, WordLen=524
Entities: Qualcomm, Apple (2), NXP, Intel, Google (2), Microsoft, Salesforce
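The profile build-up shown on these slides can be sketched as a running tally over the metadata of the articles a user has read. This is only an illustration: the dictionary layout and the names `new_profile` and `update_profile` are hypothetical, and the slides do not say how the stylometric lengths are aggregated, so the sketch simply keeps the raw values.

```python
from collections import Counter

def new_profile():
    """Empty profile; the field names are assumptions made for this sketch."""
    return {"tags": Counter(), "sections": Counter(), "entities": Counter(),
            "char_lens": [], "word_lens": []}

def update_profile(profile, article):
    """Fold one read article's metadata into the user's profile (in place)."""
    profile["tags"].update(article["tags"])
    profile["sections"].update([article["section"]])
    profile["entities"].update(article["entities"])
    # Raw length features are kept as-is; their aggregation is not given on the slides.
    profile["char_lens"].append(article["char_len"])
    profile["word_lens"].append(article["word_len"])
    return profile

# The two example articles from the slides.
qualcomm = {"tags": ["Boete", "Chips", "EU", "Mededinging"], "section": "Ondernemen",
            "char_len": 3491, "word_len": 635,
            "entities": ["Qualcomm", "Apple", "NXP", "Intel", "Google"]}
salesforce = {"tags": ["Big Data", "Blog", "Davos", "Google", "Technologie"], "section": "Davos",
              "char_len": 2856, "word_len": 524,
              "entities": ["Google", "Apple", "Microsoft", "Salesforce"]}

profile = new_profile()
for article in (qualcomm, salesforce):
    update_profile(profile, article)

print(profile["entities"])
```

Entities that occur in both articles (Apple, Google) end up with a count of 2, matching the "(2)" annotations on the slide.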
88. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Model
• Content-based RecSys
• Ranking w/ point-wise learning to rank (LTR)
• Features: user, article, user-article features (~14k)
• Labels: implicit feedback
• Clicks (i.e., click = 1, non-click = 0)
• Trained nightly
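A minimal sketch of point-wise learning to rank on click labels, under loud assumptions: the slides do not name the learner, so plain logistic regression is swapped in here, and the two features (tag overlap and entity overlap between user profile and article) are hypothetical stand-ins for the roughly 14k real ones.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Predicted click probability for one <user, article> feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_pointwise(examples, n_features, epochs=200, lr=0.5):
    """Fit logistic regression on (features, clicked) pairs with per-example gradient steps."""
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, y in examples:
            g = predict(w, b, x) - y          # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def rank(candidates, w, b):
    """Point-wise ranking: score every candidate independently, then sort by score."""
    scored = [(predict(w, b, x), name) for name, x in candidates]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical training data: [tag_overlap, entity_overlap], label 1 = click, 0 = non-click.
examples = [([1.0, 1.0], 1), ([0.8, 0.6], 1), ([0.9, 0.1], 1),
            ([0.1, 0.0], 0), ([0.0, 0.2], 0), ([0.2, 0.1], 0)]
w, b = train_pointwise(examples, n_features=2)

candidates = [("a", [0.9, 0.8]), ("b", [0.1, 0.1]), ("c", [0.5, 0.4])]
ranking = rank(candidates, w, b)
print(ranking)
```

Because each article is scored on its own, retraining nightly only requires refitting the scorer on the latest click log; no pairwise or list-wise comparisons are needed.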
89. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Bias?
• “Disproportionate weight in favor of or against an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair.”
90. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Bias in RecSys
“Algorithmic”
I. In Collaborative Filtering methods
II. In implicit feedback/clicks
91. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Collaborative Filtering
100. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Bias in CF
• It is more difficult to predict ratings of infrequently rated items in Collaborative Filtering [1]
• Bias: disproportionate weight in favor of popular items
• “It is generally not useful to recommend very popular items as they are generally already known by the user” [2]
• “A market that suffers from popularity bias will lack opportunities to discover more obscure products and will be, by definition, dominated by a few large brands […]” [3]
• Solution: cluster long-tail items
[1] Park & Tuzhilin. The Long Tail of Recommender Systems and How to Leverage It (RecSys ’08)
[2] Meyer, F. Recommender systems in industrial contexts (2012)
[3] Abdollahpouri et al. The Unfairness of Popularity Bias in Recommendation (RMSE@RecSys ’19)
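One rough way to make "disproportionate weight in favor of popular items" measurable is to check how many recommendation slots go to the most-interacted-with part of the catalogue. The sketch below is an illustration, not a method from the cited papers; `head_items`, `head_share`, and the 20% head cut-off are assumptions.

```python
from collections import Counter

def head_items(interactions, head_fraction=0.2):
    """Items in the top `head_fraction` of the catalogue by interaction count."""
    counts = Counter(item for _, item in interactions)
    n_head = max(1, int(len(counts) * head_fraction))
    return {item for item, _ in counts.most_common(n_head)}

def head_share(recommendations, head):
    """Fraction of all recommended slots that are filled by head (popular) items."""
    slots = [item for recs in recommendations.values() for item in recs]
    return sum(item in head for item in slots) / len(slots)

# Toy (user, item) log: item "a" is popular, the other four items form the long tail.
interactions = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u4", "a"), ("u5", "a"),
                ("u1", "b"), ("u2", "b"), ("u3", "c"), ("u4", "d"), ("u5", "e")]
recommendations = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["a", "d"]}

head = head_items(interactions)
share = head_share(recommendations, head)
print(head, share)
```

Here one item out of five (20% of the catalogue) takes half of all recommendation slots, which is the kind of skew the slide describes.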
104. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Bias in implicit feedback
• Popular items are overrepresented in implicit feedback
• Position/“trust” bias (see Joachims et al., 2005)
• Eye-tracking study + comparison w/ explicit feedback shows:
• Clicks reflect relevance judgments
• Results ranked highly receive more clicks
Joachims et al., Accurately Interpreting Clickthrough Data as Implicit Feedback (SIGIR ’05)
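A quick diagnostic for position bias in one's own click log is the click-through rate per rank position. The sketch assumes a hypothetical log of `(position, clicked)` impressions; it is not the analysis from Joachims et al., only a simple way to see the effect the slide describes.

```python
from collections import defaultdict

def ctr_by_position(log):
    """Click-through rate per rank position from (position, clicked) impressions."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for position, was_clicked in log:
        shown[position] += 1
        clicked[position] += int(was_clicked)
    return {pos: clicked[pos] / shown[pos] for pos in sorted(shown)}

# Toy log: each position was shown 4 times.
log = [(1, True), (1, True), (1, True), (1, False),
       (2, True), (2, False), (2, False), (2, False),
       (3, False), (3, False), (3, False), (3, False)]

ctr = ctr_by_position(log)
print(ctr)
```

A steep drop in click-through rate with rank does not by itself prove bias, since better items may simply be ranked higher; it does signal that raw clicks mix relevance with position and should not be used as labels without care.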
106. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Perceived Bias from RecSys
• A state of intellectual isolation that allegedly can result from personalized searches when a website algorithm selectively guesses what information a user would like to see based on information about the user.
• As a result, users become separated from information that disagrees with their viewpoints.
109. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Measuring personalization
• On average, 11.7% of results show differences due to personalization on Google.
• Varies widely by search query and by result ranking.
• Measurable personalization was found only as a result of searching with a logged-in account and of the IP address of the searching user.
119. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Method
1. 👤
   1. Get 200 volunteers with Google accounts
   2. Have them issue the same set of queries
   3. Compare results
2. 🤖
   1. Construct Google bot accounts
      • Vary aspects such as location, demographics, click behavior, browsing + search history, etc.
   2. Have them issue the same set of queries
   3. Compare results
[Hannák et al., 2013]
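The "compare results" step can be sketched with two simple list comparisons: set overlap (Jaccard index) and the share of rank positions that differ. The slides do not state which measures the study used, so both functions and the placeholder result identifiers below are assumptions for illustration.

```python
def jaccard(results_a, results_b):
    """Overlap between two result lists, ignoring order (1.0 = same set of results)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def changed_positions(results_a, results_b):
    """Fraction of rank positions at which two equally long result lists differ."""
    pairs = list(zip(results_a, results_b))
    return sum(x != y for x, y in pairs) / len(pairs)

# Hypothetical result pages for the same query issued from two accounts.
account_a = ["url1", "url2", "url3", "url4"]
account_b = ["url1", "url3", "url2", "url5"]

overlap = jaccard(account_a, account_b)
changed = changed_positions(account_a, account_b)
print(overlap, changed)
```

The two numbers capture different things: the overlap only drops when a result is replaced, while the position measure also reacts to pure reordering.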
122. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
👤 Findings
• On average, 11.7% of results show differences due to personalization on Google.
• Top ranks tend to be less personalized than bottom ranks.
[Hannák et al., 2013]
125. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
👤 Findings
• ✅ Personalization based on location (e.g., company names)
• ❌ The least personalized results tend to be for factual and health-related queries.
[Hannák et al., 2013]
133. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
🤖 Findings
✅ Logged in vs. “cleared cookies” account
✅ Geolocation
❌ Gender
❌ Age
❌ Search history
❌ Click history
❌ Browsing history
[Hannák et al., 2013]
134. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Diversity to pop the filter bubble
142. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Method
• Split MovieLens users into two groups:
• “Followers”: users who rated movies they were recommended
• “Ignorers”: users who rated movies they were not recommended
• Compare between groups, over time:
• Diversity of recommendations
• Ratings of movies
[Nguyen et al., 2014]
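"Diversity of recommendations" can be made concrete as the average pairwise distance between the items in a list. The sketch below is an assumption-laden illustration: it represents each movie by a hypothetical set of genre labels and uses Jaccard distance, which is not necessarily the item representation or distance used in the study.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - overlap between two non-empty label sets (0 = identical, 1 = disjoint)."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

def intra_list_diversity(items):
    """Average pairwise distance over all item pairs in one recommendation list."""
    pairs = list(combinations(items, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical recommendation lists for one user at two points in time.
early = [{"drama", "crime"}, {"comedy"}, {"sci-fi", "action"}]
late = [{"drama", "crime"}, {"drama"}, {"crime", "drama"}]

early_diversity = intra_list_diversity(early)
late_diversity = intra_list_diversity(late)
print(early_diversity, late_diversity)
```

Tracking this number per user over time, separately for followers and ignorers, gives the kind of comparison the method describes.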
149. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Findings
1. Diversity
• In both groups, diversity decreases over time.
• The effect is lessened for users who consume recommended items (followers)
2. Ratings
• Slight decrease in average ratings for ignorers (3.74 to 3.55).
• Stable average ratings for followers (~3.68).
[Nguyen et al., 2014]
150. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Diversity in RecSys 🤖 vs. humans 👤?
155. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Method
• 🤖 Generate article recommendations for news articles using different RecSys algorithms (collaborative filtering & content-based).
• 👤 Compare to hand-picked article recommendations.
• Measure & compare “diversity” of recommended articles:
• At content level
• At tag level
• At category level
• At sentiment/subjectivity level
[Möller et al., 2018]
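For the tag and category levels, one simple diversity score is the Shannon entropy of the labels in a recommendation list. This is a sketch of one possible measure, not necessarily the one used in the paper; the category names are hypothetical examples.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Entropy in bits of the label distribution (0 = one label only)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Categories of four recommended articles from two hypothetical recommenders.
narrow = ["Ondernemen", "Ondernemen", "Ondernemen", "Ondernemen"]
broad = ["Ondernemen", "Davos", "Politiek", "Beurs"]

narrow_entropy = shannon_entropy(narrow)
broad_entropy = shannon_entropy(broad)
print(narrow_entropy, broad_entropy)
```

The same function applies to tags by passing the flattened list of all tags in a recommendation set; comparing the score for algorithmic and hand-picked lists against the score of the full article supply puts the three on one scale.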
162. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Findings
“Conventional recommendation algorithms at least preserve the topic/sentiment diversity of the article supply.”
[Möller et al., 2018]
163. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
More diversity
165. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Aim
Increase exposure to varied political opinions
with a goal of improving civil discourse
47[Yom-Tov et al. 2014]
166. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Method
• Classify searchers by political leaning (using geo data)
48[Yom-Tov et al. 2014]
171. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Method
• Infer political leaning of news sources from user behavior.
• Identify polarized search queries (with strong political leanings —
in both directions).
49[Yom-Tov et al. 2014]
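One way to picture the two steps above is the sketch below. It assumes a click log of (user leaning, source) pairs, scores each source by the balance of red versus blue users who click it, and flags a query as polarized when its clicks reach strongly leaning sources on both sides. The scoring formula, the threshold, and the function names are hypothetical and are not taken from Yom-Tov et al.

```python
from collections import defaultdict


def source_leaning(clicks):
    """Map each source to a score in [-1, 1]: -1 = only 'blue' users click it, +1 = only 'red'."""
    counts = defaultdict(lambda: {"red": 0, "blue": 0})
    for user_leaning, source in clicks:
        counts[source][user_leaning] += 1
    return {
        source: (c["red"] - c["blue"]) / (c["red"] + c["blue"])
        for source, c in counts.items()
    }


def is_polarized(clicked_sources, leaning, threshold=0.5):
    """Hypothetical rule: clicks for the query hit strongly leaning sources on both sides."""
    scores = [leaning[s] for s in clicked_sources if s in leaning]
    return any(s >= threshold for s in scores) and any(s <= -threshold for s in scores)


# Toy click log: (leaning of the clicking user, clicked source)
clicks = [("red", "a"), ("red", "a"), ("blue", "b"), ("red", "c"), ("blue", "c")]
leaning = source_leaning(clicks)
```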
174. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Method
• Treatment group: Insert red results for blue users, and blue
results for red users
• Control group: Do not adjust results
50[Yom-Tov et al. 2014]
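The treatment/control split can be sketched as follows. How and where the opposing results were actually inserted is not stated on the slide, so the insert position, the candidate pool, and the function name are assumptions for illustration only.

```python
def diversify(results, user_leaning, opposing_pool, group, position=1):
    """Return the result list shown to a user.

    results: ranked list of result IDs.
    opposing_pool: dict mapping a leaning ('red' or 'blue') to candidate
    results from sources with that leaning (hypothetical structure).
    """
    if group != "treatment":
        return list(results)  # control: results are left untouched
    opposite = "blue" if user_leaning == "red" else "red"
    extra = [r for r in opposing_pool.get(opposite, []) if r not in results]
    if not extra:
        return list(results)  # nothing to insert
    # Insert a single opposing result at the chosen rank.
    return results[:position] + extra[:1] + results[position:]
```

For example, a red user in the treatment group with results `["r1", "r2"]` and a blue candidate `"b1"` would see `["r1", "b1", "r2"]`, while the same user in the control group would see the original list.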
179. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Method
1. Short term: Compare clicks/behavior between control &
treatment.
2. Long term: Measure over two weeks, per user:
1. Polarization: Difference between the user’s leaning score and the
average leaning across all sources.
2. Engagement: Average number of queries + average number of
articles read.
51
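The two long-term measures could be computed along these lines. This is a minimal sketch: it assumes a user's leaning score is the mean leaning of the sources they read, with source leanings given as numbers in a dict. The log field names are hypothetical, and the paper's exact definitions may differ.

```python
from statistics import mean


def polarization(read_sources, leaning):
    """Absolute gap between a user's mean source leaning and the mean over all sources."""
    user_score = mean(leaning[s] for s in read_sources)
    overall = mean(leaning.values())
    return abs(user_score - overall)


def engagement(logs):
    """Average number of queries and average number of articles read, per user."""
    return (
        mean(u["queries"] for u in logs),
        mean(u["articles_read"] for u in logs),
    )
```

Comparing these values between the control and treatment groups after the two-week window gives the long-term effect.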
182. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Findings I
• Fewer clicks on inserted opposing sources.
• But:
“Results pages of the opposing viewpoint which had a similarity
higher than the average tended to be clicked 38% more than those
below the average.”
52[Yom-Tov et al. 2014]
189. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Findings II
• Polarization:
• Treatment: Average leaning ‘moves’ ~25% to centre
• Control: Negligible difference (~1%)
• Engagement:
• Treatment: Number of queries: +9% / articles read: +4%
• Control: Small reduction in both (~2.5%)
53[Yom-Tov et al. 2014]
190. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Refs
Algorithmic bias
1. Park & Tuzhilin, The Long Tail of Recommender Systems and How to Leverage It (RecSys ’08)
2. Meyer, Recommender systems in industrial contexts (2012)
3. Abdollahpouri et al., The Unfairness of Popularity Bias in Recommendation (RMSE@RecSys ’19)
4. Joachims et al., Accurately Interpreting Clickthrough Data as Implicit Feedback (SIGIR ’05)
Perceived bias / filter bubbles
5. Hannak et al., Measuring personalization of web search (WWW ’13)
6. Nguyen et al., Exploring the filter bubble: the effect of using recommender systems on content diversity (WWW ’14)
7. Möller et al., Do not blame it on the algorithm: An empirical assessment of multiple recommender systems and their impact on content diversity (Information, Communication & Society ’18)
8. Yom-Tov et al., Promoting Civil Discourse Through Search Engine Diversity (Social Science Computer Review ’14)
54