Feeling Lucky? Multi-armed bandits for Ordering Judgements
in Pooling-based Evaluation
David E. Losada
Javier Parapar, Álvaro Barreiro
ACM SAC, 2016
Evaluation
is crucial
compare retrieval algorithms, design
new search solutions, ...
information retrieval evaluation:
3 main ingredients
docs
queries
relevance judgements
relevance assessments are incomplete
search system 1, search system 2, search system 3, ..., search system n
rankings of docs by estimated relevance (runs):
1. WSJ13, 2. WSJ17, ..., 100. AP567, 101. AP555, ...
1. FT941, 2. WSJ13, ..., 100. WSJ19, 101. AP555, ...
1. ZF207, 2. AP881, ..., 100. FT967, 101. AP555, ...
1. WSJ13, 2. CR93E, ..., 100. AP111, 101. AP555, ...
...
pool depth: 100 (docs from rank 101 on, e.g. AP555, stay out of the pool)
pool: WSJ13, WSJ17, AP567, WSJ19, AP111, CR93E, ZF207, AP881, FT967, ...
human assessments
finding relevant docs is the key
Most productive use of assessors' time
is spent on judging relevant docs
(Sanderson & Zobel, 2005)
Effective adjudication methods
Give priority to pooled docs that are
potentially relevant
Can significantly reduce the num. of
judgements required to identify a given
num. of relevant docs
But most existing methods are ad hoc...
Our main idea...
Cast doc adjudication as a
reinforcement learning problem
Doc judging is an iterative process where
we learn as judgements come in
Doc adjudication as a reinforcement learning problem
Initially we know nothing about the quality of the runs
? ? ? ?...
As judgements
come in...
And we can adapt and allocate more docs for judgement from
the most promising runs
Multi-armed bandits
...
unknown probabilities of giving a prize
play and observe the reward
exploration vs exploitation
exploits current knowledge
spends no time sampling inferior actions
maximizes expected reward
on the next action
explores uncertain actions
gets more info about expected payoffs
may produce greater total reward
in the long run
allocation methods: choose next action (play) based on past plays and obtained rewards
implement different ways to trade off exploration and exploitation
Multi-armed bandits for ordering judgements
...
machines = runs
...
play a machine = select a run and get the next (unjudged) doc
1. WSJ13
2. CR93E
.
.
(binary) reward = relevance/non-relevance of the selected doc
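The mapping above (machines = runs, play = judge the next unjudged doc, reward = binary relevance) can be sketched as follows. This is only an illustrative sketch: the `Run` class, the `play` function and the `qrels` dictionary (doc id to relevance label, standing in for the human assessor) are hypothetical names, not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One 'machine': a ranked list of doc ids returned by a search system."""
    ranking: list
    pointer: int = 0

    def next_unjudged(self, judged):
        """Return the highest-ranked doc not yet judged, or None if exhausted."""
        while self.pointer < len(self.ranking) and self.ranking[self.pointer] in judged:
            self.pointer += 1
        if self.pointer >= len(self.ranking):
            return None
        return self.ranking[self.pointer]


def play(run, judged, qrels):
    """Play a machine: judge the run's next unjudged doc, return (doc, reward)."""
    doc = run.next_unjudged(judged)
    if doc is None:
        return None, None
    judged.add(doc)  # a doc is judged only once, whichever run supplied it
    reward = 1 if qrels.get(doc, 0) > 0 else 0  # binary reward
    return doc, reward
```

Because the set of judged docs is shared by all runs, a doc already judged through one run is skipped when another run is played.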
Allocation methods tested
...
random
ϵn-greedy
with prob 1-ϵ plays the machine with the highest avg reward
with prob ϵ plays a random machine
prob of exploration (ϵ) decreases with the num. of plays
Upper Confidence Bound (UCB)
computes upper confidence bounds for avg rewards
conf. intervals get narrower with the number of plays
selects the machine with the highest optimistic estimate
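The slides do not give the exact formulas, so the sketch below makes two assumptions: a decay schedule of the form ϵ = min(1, cK/(d²n)) for ϵn-greedy and the standard UCB1 index (average reward plus sqrt(2 ln n / plays)) for UCB. The parameter values `c` and `d` and all function names are hypothetical.

```python
import math
import random


def epsilon_n_greedy(plays, rewards, n, c=0.01, d=0.1, rng=random):
    """Return a machine index; the exploration prob shrinks as total plays n grow."""
    k = len(plays)
    eps = min(1.0, (c * k) / (d * d * max(n, 1)))  # assumed decay schedule
    if rng.random() < eps:
        return rng.randrange(k)  # explore: random machine
    means = [r / p if p > 0 else 0.0 for r, p in zip(rewards, plays)]
    return max(range(k), key=lambda i: means[i])  # exploit: best avg reward


def ucb1(plays, rewards, n):
    """Return the machine with the highest optimistic estimate (UCB1 index)."""
    for i, p in enumerate(plays):
        if p == 0:
            return i  # play every machine once before using the bound
    def index(i):
        return rewards[i] / plays[i] + math.sqrt(2 * math.log(n) / plays[i])
    return max(range(len(plays)), key=index)
```

Here `plays[i]` and `rewards[i]` are the number of plays and the summed rewards of machine i, and `n` is the total number of plays so far.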
Allocation methods tested: Bayesian bandits
prior probabilities of giving a relevant doc: Uniform(0,1) ( or, equivalently, Beta(α,β), α,β=1 )
U(0,1) U(0,1) U(0,1) U(0,1)
...
evidence (O ∈ {0,1}) is Bernoulli (or, equivalently, Binomial(1,p) )
posterior probabilities of giving a relevant doc: Beta(α+O, β+1-O) (Beta: conjugate prior for Binomial)
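A minimal sketch of the conjugate update stated above: each binary observation O turns Beta(α, β) into Beta(α+O, β+1-O). The function name is hypothetical.

```python
def update_beta(alpha, beta, outcome):
    """Posterior Beta parameters after one Bernoulli observation (0 or 1)."""
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return alpha + outcome, beta + 1 - outcome


# Worked example: uniform prior Beta(1,1), then two relevant docs and one non-relevant doc.
alpha, beta = 1, 1
for outcome in (1, 1, 0):
    alpha, beta = update_beta(alpha, beta, outcome)
posterior_mean = alpha / (alpha + beta)  # mean of Beta(3,2)
```

After the three observations the posterior is Beta(3,2), whose mean is 3/5.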
Allocation methods tested: Bayesian bandits
...
we iteratively update our estimations using Bayes:
two strategies to select the next machine:
Bayesian Learning Automaton (BLA): draws a sample from each posterior distribution
and selects the machine yielding the highest sample
MaxMean (MM): selects the machine with the highest expectation of the posterior distribution
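The two selection strategies can be sketched as below, assuming each machine's posterior is stored as an (α, β) pair. Function names are hypothetical, and ties are resolved in favour of the first machine, which is a simplification of this sketch rather than something the slides specify.

```python
import random


def select_bla(posteriors, rng=random):
    """BLA: draw one sample per Beta posterior, pick the highest sample."""
    samples = [rng.betavariate(a, b) for a, b in posteriors]
    return max(range(len(samples)), key=samples.__getitem__)


def select_mm(posteriors):
    """MaxMean: pick the machine whose posterior has the highest expectation."""
    means = [a / (a + b) for a, b in posteriors]
    return max(range(len(means)), key=means.__getitem__)
```

BLA keeps exploring because a machine with a wide posterior can still produce the top sample, whereas MM always exploits the current best estimate.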
test different document adjudication strategies in
terms of how quickly they find the relevant
docs in the pool
experiments
# rel docs found at diff. number of
judgements performed
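The measure described above (number of relevant docs found after a given number of judgements) could be computed along these lines. This is a sketch with hypothetical names; `qrels` stands for the known relevance labels and the example labels used with it are invented for illustration.

```python
def relevant_found_curve(judging_order, qrels, checkpoints):
    """Map each checkpoint n to the number of relevant docs among the first n judged."""
    wanted = set(checkpoints)
    found = 0
    curve = {}
    for n, doc in enumerate(judging_order, start=1):
        if qrels.get(doc, 0) > 0:
            found += 1
        if n in wanted:
            curve[n] = found
    return curve
```

Comparing the curves produced by different judging orders over the same pool shows which adjudication strategy reaches the relevant docs sooner.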
experiments: data
experiments: baselines
pool: WSJ13, WSJ17, AP567, WSJ19, AP111, CR93E, ZF207, AP881, FT967, ...
DocId: sorts by Doc Id
AP111, AP881, AP567, CR93E, FT967, WSJ13, ...
experiments: baselines
runs:
1. WSJ13, 2. WSJ17, ..., 100. AP567
1. FT941, 2. WSJ13, ..., 100. WSJ19
1. WSJ13, 2. CR93E, ..., 100. AP111
1. ZF207, 2. AP881, ..., 100. FT967
...
Rank: rank #1 docs go 1st, then rank #2 docs, ...
WSJ13, FT941, ZF207, WSJ17, CR93E, AP881 ...
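The DocId and Rank baselines are simple enough to sketch directly. Function names are hypothetical, and the order in which runs are visited within one rank position is an assumption (here, the order in which the runs are passed in).

```python
def docid_order(pool):
    """DocId baseline: judge pooled docs in doc-id order."""
    return sorted(pool)


def rank_order(runs, depth):
    """Rank baseline: all rank #1 docs first, then rank #2 docs, and so on."""
    order, seen = [], set()
    for r in range(depth):
        for run in runs:
            if r < len(run) and run[r] not in seen:
                seen.add(run[r])  # a doc retrieved by several runs is judged once
                order.append(run[r])
    return order


runs = [
    ["WSJ13", "WSJ17"],
    ["FT941", "WSJ13"],
    ["WSJ13", "CR93E"],
    ["ZF207", "AP881"],
]
print(rank_order(runs, depth=2))
# ['WSJ13', 'FT941', 'ZF207', 'WSJ17', 'CR93E', 'AP881']
```

With the top-2 docs of the four example runs, the sketch yields the same ordering as the one shown for the Rank baseline.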
experiments: baselines
MoveToFront (MTF) (Cormack et al 98)
starts with uniform priorities for all runs (e.g. max priority=100)
selects a random run (from those with max priority)
extracts & judges docs from the selected run
stays in the run until a non-rel doc is found
when a non-rel doc is found, priority is decreased
and we jump again to another max priority run
example:
runs (top 3): 1. WSJ13, 2. WSJ17, 3. AP567 / 1. FT941, 2. WSJ13, 3. WSJ19 / 1. WSJ13, 2. CR93E, 3. AP111 / 1. ZF207, 2. AP881, 3. FT967
initial priorities: 100 100 100 100
selected run: 1. WSJ13, 2. CR93E, 3. AP111
docs judged from it, in order: WSJ13, CR93E, AP111
priority of the selected run: 100, then 99
priorities after the jump: 100 100 99 100
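The MTF steps above can be sketched as a small loop. This is an illustrative reading of the slides, not the original implementation: the function name, the `budget` parameter (max number of judgements) and the `qrels` dictionary standing in for the assessor are all assumptions.

```python
import random


def move_to_front(runs, qrels, budget, max_priority=100, rng=random):
    """Return (judging order, final priorities) for an MTF-style adjudication."""
    priorities = [max_priority] * len(runs)
    pointers = [0] * len(runs)
    judged, order = set(), []

    def next_doc(i):
        # Skip docs already judged through another run.
        while pointers[i] < len(runs[i]) and runs[i][pointers[i]] in judged:
            pointers[i] += 1
        return runs[i][pointers[i]] if pointers[i] < len(runs[i]) else None

    while len(order) < budget:
        active = [i for i in range(len(runs)) if next_doc(i) is not None]
        if not active:
            break  # every run is exhausted
        top = max(priorities[i] for i in active)
        current = rng.choice([i for i in active if priorities[i] == top])
        while len(order) < budget:
            doc = next_doc(current)
            if doc is None:
                break
            judged.add(doc)
            order.append(doc)
            if qrels.get(doc, 0) <= 0:
                priorities[current] -= 1  # non-rel doc: lower priority and jump
                break
    return order, priorities
```

For a single run `["WSJ13", "CR93E", "AP111"]` in which only the first two docs are relevant, the loop judges the three docs in order and leaves the run with priority 99.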
experiments: baselines
Moffat et al.'s method (A) (Moffat et al 2007)
based on rank-biased precision (RBP)
sums a rank-dependent score for each doc
rank 1: score 0.20
rank 2: score 0.16
rank 3: score 0.13
...
all docs are ranked by decreasing accumulated score
and the ranked list defines the order in which docs are judged
WSJ13: 0.20+0.16+0.20+...
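A sketch of the score accumulation in method A. The per-rank scores shown (0.20, 0.16, 0.13) match the RBP weight (1-p)·p^(rank-1) with p = 0.8; that value of p is inferred from those numbers and is an assumption here, as are the function names.

```python
from collections import defaultdict


def rbp_weight(rank, p=0.8):
    """Rank-dependent RBP weight; p=0.8 is inferred from the example scores."""
    return (1 - p) * p ** (rank - 1)


def method_a_order(runs, depth, p=0.8):
    """Sum RBP weights per doc over all runs and sort by decreasing total."""
    scores = defaultdict(float)
    for run in runs:
        for rank, doc in enumerate(run[:depth], start=1):
            scores[doc] += rbp_weight(rank, p)
    # Ties are broken by doc id so that the output is deterministic.
    order = sorted(scores, key=lambda doc: (-scores[doc], doc))
    return order, dict(scores)
```

For the example runs, WSJ13 appears at rank 1 in two runs and at rank 2 in another, so its accumulated score starts as 0.20 + 0.16 + 0.20.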
experiments: baselines
Moffat et al.'s method (B) (Moffat et al 2007)
evolution of method A
considers not only the rank-dependent docs'
contributions but also the runs' residuals
promotes the selection of docs from runs with many
unjudged docs
Moffat et al.'s method (C) (Moffat et al 2007)
evolution of method B
considers the rank-dependent docs' contributions and the runs' residuals
and also promotes the selection of docs from effective runs
experiments: baselines
MTF: best performing baseline
experiments: MTF vs bandit-based models
Random: weakest approach
BLA/UCB/ϵn-greedy are suboptimal
(sophisticated exploration/exploitation trading not needed)
MTF and MM: best performing methods
improved bandit-based models
MTF: forgets quickly about past rewards
(a single non-relevant doc triggers a jump)
non-stationary bandit-based solutions: not all historical rewards count the same
MM-NS and BLA-NS: non-stationary variants of MM and BLA
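stationary bandits
$\mathrm{Beta}(\alpha, \beta)$, $\alpha = \beta = 1$
rel docs add 1 to $\alpha$
non-rel docs add 1 to $\beta$
(after n iterations) $\mathrm{Beta}(\alpha_n, \beta_n)$
$\alpha_n = 1 + \mathit{jrel}_s$
$\beta_n = 1 + \mathit{jret}_s - \mathit{jrel}_s$
$\mathit{jrel}_s$: # judged relevant docs (retrieved by s)
$\mathit{jret}_s$: # judged docs (retrieved by s)
all judged docs count the same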
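non-stationary bandits
$\mathrm{Beta}(\alpha, \beta)$, $\alpha = \beta = 1$
$\mathit{jrel}_s \leftarrow \mathit{rate} \cdot \mathit{jrel}_s + \mathit{rel}_d$
$\mathit{jret}_s \leftarrow \mathit{rate} \cdot \mathit{jret}_s + 1$
(after n iterations) $\mathrm{Beta}(\alpha_n, \beta_n)$
$\alpha_n = 1 + \mathit{jrel}_s$
$\beta_n = 1 + \mathit{jret}_s - \mathit{jrel}_s$
rate>1: weights more early relevant docs
rate<1: weights more late relevant docs
rate=0: only the last judged doc counts (BLA-NS, MM-NS)
rate=1: stationary version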
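The discounted counts can be sketched as follows, with `rel_d` taken to be the binary relevance of the doc just judged (an interpretation of the notation, not stated on the slide). Function names are hypothetical.

```python
def ns_update(jrel, jret, rel_d, rate):
    """Discount the run's counts by `rate`, then add the latest judgement."""
    jrel = rate * jrel + rel_d
    jret = rate * jret + 1
    return jrel, jret


def posterior_params(jrel, jret):
    """Beta parameters derived from the (possibly discounted) counts."""
    return 1 + jrel, 1 + jret - jrel


def replay(judgements, rate):
    """Apply a sequence of binary judgements to one run, starting from zero counts."""
    jrel, jret = 0, 0
    for rel_d in judgements:
        jrel, jret = ns_update(jrel, jret, rel_d, rate)
    return posterior_params(jrel, jret)
```

With judgements 1, 1, 0, rate=1 gives Beta(3,2) as in the stationary case, while rate=0 gives Beta(1,2) because only the last (non-relevant) judgement survives.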
experiments: improved bandit-based models
conclusions
multi-armed bandits: formal & effective framework for
doc adjudication in a pooling-based evaluation
it's not good to increasingly reduce exploration
(UCB, ϵn-greedy)
it's good to react quickly to non-relevant docs
(non-stationary variants)
future work
query-related variabilities
hierarchical bandits
stopping criteria
metasearch
reproduce our experiments & test new ideas!
http://tec.citius.usc.es/ir/code/pooling_bandits.html
(our R code, instructions, etc)
Acknowledgements:
MsSaraKelly. picture pg 1 (modified).CC BY 2.0.
Sanofi Pasteur. picture pg 2 (modified).CC BY-NC-ND 2.0.
pedrik. picture pgs 3-5.CC BY 2.0.
Christa Lohman. picture pg 3 (left).CC BY-NC-ND 2.0.
Chris. picture pg 4 (tag cloud).CC BY 2.0.
Daniel Horacio Agostini. picture pg 5 (right).CC BY-NC-ND 2.0.
ScaarAT. picture pg 14.CC BY-NC-ND 2.0.
Sebastien Wiertz. picture pg 15 (modified).CC BY 2.0.
Willard. picture pg 16 (modified).CC BY-NC-ND 2.0.
Jose Luis Cernadas Iglesias. picture pg 17 (modified).CC BY 2.0.
Michelle Bender. picture pg 25 (left).CC BY-NC-ND 2.0.
Robert Levy. picture pg 25 (right).CC BY-NC-ND 2.0.
Simply Swim UK. picture pg 37.CC BY-SA 2.0.
Sarah J. Poe. picture pg 55.CC BY-ND 2.0.
Kate Brady. picture pg 58.CC BY 2.0.
August Brill. picture pg 59.CC BY 2.0.
This work was supported by the “Ministerio de Economía y Competitividad” of the Government of Spain and FEDER Funds under research projects TIN2012-33867 and TIN2015-64282-R.

Feeling Lucky? Multi-armed Bandits for Ordering Judgements in Pooling-based Evaluation

  • 1. Feeling Lucky? Multi-armed bandits for Ordering Judgements in Pooling-based Evaluation David E. Losada Javier Parapar, Álvaro Barreiro ACM SAC, 2016
  • 2. Evaluation is crucial compare retrieval algorithms, design new search solutions, ...
  • 3. information retrieval evaluation: 3 main ingredients docs
  • 4. information retrieval evaluation: 3 main ingredients queries
  • 5. information retrieval evaluation: 3 main ingredients relevance judgements
  • 7. relevance assessments are incomplete ... search system 1 search system 2 search system 3 search system n
  • 8. relevance assessments are incomplete ... search system 1 search system 2 search system 3 search system n
  • 9. relevance assessments are incomplete 1. WSJ13 2. WSJ17 . . 100. AP567 101. AP555 . . . 1. FT941 2. WSJ13 . . 100. WSJ19 101. AP555 . . . 1. ZF207 2. AP881 . . 100. FT967 101. AP555 . . . 1. WSJ13 2. CR93E . . 100. AP111 101. AP555 . . . ... rankings of docs by estimated relevance (runs)
  • 10. relevance assessments are incomplete 1. WSJ13 2. WSJ17 . . 100. AP567 101. AP555 . . . 1. FT941 2. WSJ13 . . 100. WSJ19 101. AP555 . . . 1. ZF207 2. AP881 . . 100. FT967 101. AP555 . . . 1. WSJ13 2. CR93E . . 100. AP111 101. AP555 . . . ... pool depth rankings of docs by estimated relevance (runs)
  • 11. relevance assessments are incomplete 101. AP555 . . . 101. AP555 . . . 101. AP555 . . . 101. AP555 . . . ... pool depth rankings of docs by estimated relevance (runs)
  • 12. relevance assessments are incomplete 101. AP555 . . . 101. AP555 . . . 101. AP555 . . . 101. AP555 . . . ... pool depth rankings of docs by estimated relevance (runs) WSJ13 WSJ17 AP567 WSJ19AP111 CR93E ZF207AP881FT967 pool ...
  • 13. relevance assessments are incomplete 101. AP555 . . . 101. AP555 . . . 101. AP555 . . . 101. AP555 . . . ... pool depth rankings of docs by estimated relevance (runs) WSJ13 WSJ17 AP567 WSJ19AP111 CR93E ZF207AP881FT967 pool ... human assessments
  • 14. finding relevant docs is the key Most productive use of assessors' time is spent on judging relevant docs (Sanderson & Zobel, 2005)
  • 15. Effective adjudication methods Give priority to pooled docs that are potentially relevant Can signifcantly reduce the num. of judgements required to identify a given num. of relevant docs But most existing methods are adhoc...
  • 16. Our main idea... Cast doc adjudication as a reinforcement learning problem Doc judging is an iterative process where we learn as judgements come in
  • 17. Doc adjudication as a reinforcement learning problem Initially we know nothing about the quality of the runs ? ? ? ?... As judgements come in... And we can adapt and allocate more docs for judgement from the most promising runs
  • 19. Multi-armed bandits ... unknown probabilities of giving a prize play and observe the reward
  • 25. exploration vs exploitation. Exploitation: exploits current knowledge; spends no time sampling inferior actions; maximizes expected reward on the next action. Exploration: explores uncertain actions; gets more info about expected payoffs; may produce greater total reward in the long run. Allocation methods: choose the next action (play) based on past plays and obtained rewards; they implement different ways to trade between exploration and exploitation.
  • 26. Multi-armed bandits for ordering judgements. Machines = runs. Play a machine = select a run and get its next (unjudged) doc (e.g. from 1. WSJ13, 2. CR93E, ...). (Binary) reward = relevance/non-relevance of the selected doc.
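The mapping on this slide can be sketched as a short Python loop. This is only an illustration, not the authors' R code: `adjudicate`, `next_unjudged`, `choose_run` and `random_allocation` are hypothetical names, `qrels` stands in for the human assessor, and crediting every run that retrieved the judged doc is an assumption based on how the deck later defines its per-run counts (crediting only the played run would be another option).

```python
import random


def next_unjudged(ranking, judged):
    """Return the first doc of a run that has not been judged yet, or None."""
    for doc in ranking:
        if doc not in judged:
            return doc
    return None


def adjudicate(runs, qrels, choose_run, budget):
    """Order judgements by repeatedly playing a machine (= a run).

    runs: dict run id -> ranked list of doc ids (already cut at pool depth)
    qrels: dict doc id -> 0/1, a stand-in for the human assessor
    choose_run: callable(stats, eligible_run_ids) -> run id (allocation method)
    """
    judged = {}  # doc id -> binary reward, kept in judging order
    stats = {s: {"judged": 0, "rels": 0} for s in runs}
    while len(judged) < budget:
        eligible = [s for s in runs if next_unjudged(runs[s], judged) is not None]
        if not eligible:
            break  # the pool is exhausted
        played = choose_run(stats, eligible)
        doc = next_unjudged(runs[played], judged)
        reward = int(qrels.get(doc, 0))  # relevance / non-relevance
        judged[doc] = reward
        # Assumption: every run that retrieved the doc gets the evidence.
        for s, ranking in runs.items():
            if doc in ranking:
                stats[s]["judged"] += 1
                stats[s]["rels"] += reward
    return list(judged.items()), stats


def random_allocation(stats, eligible):
    """The simplest allocation method: pick any run that still has unjudged docs."""
    return random.choice(eligible)
```

Any other allocation method can be plugged in through `choose_run`, as long as it only looks at past plays and rewards.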
  • 27. Allocation methods tested. Random. ϵn-greedy: with prob 1-ϵ plays the machine with the highest avg reward; with prob ϵ plays a random machine; the prob of exploration (ϵ) decreases with the num. of plays. Upper Confidence Bound (UCB): computes upper confidence bounds for avg rewards; conf. intervals get narrower with the number of plays; selects the machine with the highest optimistic estimate.
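The slide does not give formulas, so the following Python sketch fills them in with assumptions: the exploration schedule min(1, c·K/(d²·n)) and the UCB1 index (average reward plus sqrt(2 ln n / n_j)) are the textbook forms from Auer et al.'s bandit analysis, not necessarily the variants used in the experiments. The function names and the defaults for `c` and `d` are hypothetical.

```python
import math
import random


def epsilon_n(n, k, c=1.0, d=1.0):
    """Exploration probability after n plays on k machines (assumed schedule)."""
    return 1.0 if n == 0 else min(1.0, c * k / (d * d * n))


def eps_greedy_choice(plays, rewards, n, rng=random, c=1.0, d=1.0):
    """With prob. eps_n play a random machine, otherwise the best average."""
    k = len(plays)
    if rng.random() < epsilon_n(n, k, c, d):
        return rng.randrange(k)  # explore
    avgs = [rewards[i] / plays[i] if plays[i] else 0.0 for i in range(k)]
    return max(range(k), key=avgs.__getitem__)  # exploit


def ucb1_choice(plays, rewards):
    """Play the machine with the highest optimistic estimate (UCB1 index)."""
    for i, p in enumerate(plays):
        if p == 0:
            return i  # every machine is played once before indices are compared
    n = sum(plays)

    def index(i):
        # The bonus term shrinks as machine i accumulates plays.
        return rewards[i] / plays[i] + math.sqrt(2.0 * math.log(n) / plays[i])

    return max(range(len(plays)), key=index)
```

In both methods the amount of exploration shrinks as plays accumulate, which is the property the slide highlights.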
  • 28. Allocation methods tested: Bayesian bandits. Prior probabilities of giving a relevant doc: Uniform(0,1) (or, equivalently, Beta(α,β) with α=β=1), so every machine starts at U(0,1). Evidence (O ∈ {0,1}) is Bernoulli (or, equivalently, Binomial(1,p)). Posterior probabilities of giving a relevant doc: Beta(α+O, β+1-O) (Beta is the conjugate prior for the Binomial).
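As a small worked example of the conjugate update on this slide, here is a Python sketch; `update` and `posterior_mean` are hypothetical helper names.

```python
def update(alpha, beta, outcome):
    """Beta(alpha, beta) prior + Bernoulli outcome O in {0, 1} -> Beta(alpha + O, beta + 1 - O)."""
    return alpha + outcome, beta + 1 - outcome


def posterior_mean(alpha, beta):
    """Expected probability that the machine gives a relevant doc."""
    return alpha / (alpha + beta)


# Start from the uniform prior Beta(1, 1) and observe rel, rel, non-rel.
alpha, beta = 1, 1
for outcome in (1, 1, 0):
    alpha, beta = update(alpha, beta, outcome)
# alpha, beta is now (3, 2)
```

Each relevant doc adds 1 to α and each non-relevant doc adds 1 to β, so the posterior is fully described by two counters per run.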
  • 29. Allocation methods tested: Bayesian bandits ... we iteratively update our estimations using Bayes:
  • 36. Allocation methods tested: Bayesian bandits. We iteratively update our estimations using Bayes. Two strategies to select the next machine: Bayesian Learning Automaton (BLA): draws a sample from each posterior distribution and selects the machine yielding the highest sample. MaxMean (MM): selects the machine with the highest expectation of the posterior distribution.
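The two selection strategies differ only in how they read the posteriors. A minimal Python sketch, assuming each machine's posterior is stored as an (α, β) pair; the function names are hypothetical and `random.betavariate` from the standard library does the sampling.

```python
import random


def max_mean_choice(posteriors):
    """MM: pick the machine whose Beta posterior has the highest expectation."""
    means = [a / (a + b) for a, b in posteriors]
    return max(range(len(posteriors)), key=means.__getitem__)


def bla_choice(posteriors, rng=random):
    """BLA: draw one sample per posterior and pick the machine with the highest sample."""
    samples = [rng.betavariate(a, b) for a, b in posteriors]
    return max(range(len(posteriors)), key=samples.__getitem__)
```

MM is purely greedy on the posterior mean, whereas BLA still explores: a machine with a wide posterior can occasionally produce the highest sample.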
  • 37. experiments: test different document adjudication strategies in terms of how quickly they find the relevant docs in the pool (# rel docs found at different numbers of judgements performed).
  • 39. experiments: baselines. DocId: sorts the pool (WSJ13, WSJ17, AP567, WSJ19, AP111, CR93E, ZF207, AP881, FT967, ...) by doc id: AP111, AP881, AP567, CR93E, FT967, WSJ13, ...
  • 40. experiments: baselines. Rank: rank #1 docs go 1st, then rank #2 docs, ... Runs: [1. WSJ13, 2. WSJ17, ..., 100. AP567]; [1. FT941, 2. WSJ13, ..., 100. WSJ19]; [1. WSJ13, 2. CR93E, ..., 100. AP111]; [1. ZF207, 2. AP881, ..., 100. FT967]; ... Resulting order: WSJ13, FT941, ZF207, WSJ17, CR93E, AP881, ...
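The Rank baseline is easy to state in code. In this Python sketch, `rank_order` is a hypothetical name, and breaking ties inside a rank by the order in which the runs are listed is an assumption, since the slide does not say how docs at the same rank are ordered.

```python
def rank_order(runs, depth):
    """Rank baseline: all rank #1 docs first, then rank #2 docs, and so on.

    runs: list of ranked doc-id lists; duplicates are judged only once.
    """
    order, seen = [], set()
    for rank in range(depth):
        for run in runs:
            if rank < len(run) and run[rank] not in seen:
                seen.add(run[rank])
                order.append(run[rank])
    return order


# Top-2 docs of the four example runs, listed in the order of the transcript.
runs = [
    ["WSJ13", "WSJ17"],
    ["FT941", "WSJ13"],
    ["WSJ13", "CR93E"],
    ["ZF207", "AP881"],
]
order = rank_order(runs, depth=2)
# order == ["WSJ13", "FT941", "ZF207", "WSJ17", "CR93E", "AP881"]
```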
  • 41. experiments: baselines. MoveToFront (MTF) (Cormack et al 98): starts with uniform priorities for all runs (e.g. max priority=100); selects a random run (from those with max priority). Runs: [1. WSJ13, 2. WSJ17, 3. AP567, ...]; [1. FT941, 2. WSJ13, 3. WSJ19, ...]; [1. WSJ13, 2. CR93E, 3. AP111, ...]; [1. ZF207, 2. AP881, 3. FT967, ...]; ... Run priorities: 100, 100, 100, 100.
  • 43. experiments: baselines. MoveToFront (MTF) (Cormack et al 98): extracts & judges docs from the selected run (1. WSJ13, 2. CR93E, 3. AP111, ...; priority 100); stays in the run until a non-rel doc is found.
  • 44. experiments: baselines. MTF, same run (priority 100). Judged so far: WSJ13.
  • 45. experiments: baselines. MTF, same run (priority 100). Judged so far: WSJ13, CR93E.
  • 46. experiments: baselines. MTF, same run (priority 100). Judged so far: WSJ13, CR93E, AP111.
  • 47. experiments: baselines. MTF: when a non-rel doc is found, the priority of the run is decreased (100 → 99). Judged so far: WSJ13, CR93E, AP111.
  • 48. experiments: baselines. MTF: and we jump again to another max priority run. Run priorities: 100, 100, 99, 100.
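The MTF walk-through above can be condensed into a Python sketch. It follows only what the slides state (uniform starting priorities, random choice among max-priority runs, stay until a non-rel doc, then decrease the priority by one); `move_to_front`, its parameters, and the handling of exhausted runs are assumptions made for illustration.

```python
import random


def move_to_front(runs, qrels, budget, start_priority=100, rng=random):
    """MoveToFront sketch: returns the judging order and the final priorities.

    runs: dict run id -> ranked list of doc ids; qrels: dict doc id -> 0/1.
    """
    priority = {r: start_priority for r in runs}
    judged = {}  # doc id -> relevance, kept in judging order

    def next_doc(r):
        return next((d for d in runs[r] if d not in judged), None)

    while len(judged) < budget:
        live = [r for r in runs if next_doc(r) is not None]
        if not live:
            break  # nothing left to judge
        top = max(priority[r] for r in live)
        r = rng.choice([x for x in live if priority[x] == top])
        # Stay in the selected run until a non-relevant doc is found.
        while len(judged) < budget:
            doc = next_doc(r)
            if doc is None:
                break  # run exhausted (assumption: just move on)
            rel = int(qrels.get(doc, 0))
            judged[doc] = rel
            if rel == 0:
                priority[r] -= 1  # e.g. 100 -> 99
                break
    return list(judged), priority
```

With a single run [WSJ13, CR93E, AP111, ...] whose first two docs are relevant, three judgements reproduce the slides: WSJ13, CR93E and AP111 are judged and the priority drops from 100 to 99.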
  • 49. experiments: baselines. Moffat et al.'s method (A) (Moffat et al 2007): based on rank-biased precision (RBP); sums a rank-dependent score for each doc (score by rank: 1 → 0.20, 2 → 0.16, 3 → 0.13, ...). Runs: [1. WSJ13, 2. WSJ17, 3. AP567, ...]; [1. FT941, 2. WSJ13, 3. WSJ19, ...]; [1. WSJ13, 2. CR93E, 3. AP111, ...]; [1. ZF207, 2. AP881, 3. FT967, ...]; ...
  • 50. experiments: baselines. Moffat et al.'s method (A), continued: all docs are ranked by decreasing accumulated score, and the ranked list defines the order in which docs are judged. Example: WSJ13: 0.20+0.16+0.20+...
  • 51. experiments: baselines. Moffat et al.'s method (B) (Moffat et al 2007): evolution of method A; considers not only the rank-dependent contributions of the docs but also the runs' residuals; promotes the selection of docs from runs with many unjudged docs. Moffat et al.'s method (C) (Moffat et al 2007): evolution of method B; considers not only the rank-dependent contributions of the docs and the residuals; also promotes the selection of docs from effective runs.
  • 52. experiments: baselines. MTF: best performing baseline.
  • 53. experiments: MTF vs bandit-based models
  • 54. experiments: MTF vs bandit-based models. Random: weakest approach. BLA/UCB/ϵn-greedy are suboptimal (sophisticated exploration/exploitation trading is not needed). MTF and MM: best performing methods.
  • 55. improved bandit-based models. MTF forgets quickly about past rewards (a single non-relevant doc triggers a jump). Non-stationary bandit-based solutions: not all historical rewards count the same. MM-NS and BLA-NS: non-stationary variants of MM and BLA.
  • 56. stationary bandits: prior Beta(α, β), α = β = 1; rel docs add 1 to α; non-rel docs add 1 to β; after n iterations: Beta(α_n, β_n), α_n = 1 + jrel_s, β_n = 1 + jret_s - jrel_s, where jrel_s: # judged relevant docs (retrieved by s) and jret_s: # judged docs (retrieved by s); all judged docs count the same. non-stationary bandits: prior Beta(α, β), α = β = 1; at each judgement jrel_s = rate*jrel_s + rel_d and jret_s = rate*jret_s + 1; after n iterations: Beta(α_n, β_n), α_n = 1 + jrel_s, β_n = 1 + jret_s - jrel_s. rate>1: weights more early relevant docs; rate<1: weights more late relevant docs; rate=0: only the last judged doc counts (BLA-NS, MM-NS); rate=1: stationary version.
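The discounted counts on this slide can be traced with a few lines of Python. The helper names `ns_update` and `posterior` are hypothetical; the arithmetic follows the slide's update rules, with `rel` standing for the relevance (0/1) of the doc just judged.

```python
def ns_update(jrel, jret, rel, rate=1.0):
    """One judgement: jrel = rate*jrel + rel, jret = rate*jret + 1."""
    return rate * jrel + rel, rate * jret + 1.0


def posterior(jrel, jret):
    """Beta parameters after the updates: alpha = 1 + jrel, beta = 1 + jret - jrel."""
    return 1.0 + jrel, 1.0 + jret - jrel


def run_history(judgements, rate):
    """Apply a sequence of 0/1 judgements to a run that starts with no evidence."""
    jrel, jret = 0.0, 0.0
    for rel in judgements:
        jrel, jret = ns_update(jrel, jret, rel, rate)
    return posterior(jrel, jret)


history = [1, 1, 0]                             # rel, rel, non-rel
stationary = run_history(history, rate=1.0)     # (3.0, 2.0): all docs count the same
last_only = run_history(history, rate=0.0)      # (1.0, 2.0): only the last doc counts
discounted = run_history(history, rate=0.5)     # (1.75, 2.0)
```

With rate=0 the posterior mean after a single non-relevant doc is 1/3, below the 1/2 of a run with no evidence, so an MM-style selection would move to another run much as MTF does.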
  • 58. conclusions. Multi-armed bandits: a formal & effective framework for doc adjudication in a pooling-based evaluation. It's not good to increasingly reduce exploration (UCB, ϵn-greedy). It's good to react quickly to non-relevant docs (non-stationary variants).
  • 60. reproduce our experiments & test new ideas! http://tec.citius.usc.es/ir/code/pooling_bandits.html (our R code, instructions, etc)
  • 61. David E. Losada, Javier Parapar, Álvaro Barreiro. Feeling Lucky? Multi-armed bandits for Ordering Judgements in Pooling-based Evaluation. Acknowledgements: MsSaraKelly, picture pg 1 (modified), CC BY 2.0. Sanofi Pasteur, picture pg 2 (modified), CC BY-NC-ND 2.0. pedrik, picture pgs 3-5, CC BY 2.0. Christa Lohman, picture pg 3 (left), CC BY-NC-ND 2.0. Chris, picture pg 4 (tag cloud), CC BY 2.0. Daniel Horacio Agostini, picture pg 5 (right), CC BY-NC-ND 2.0. ScaarAT, picture pg 14, CC BY-NC-ND 2.0. Sebastien Wiertz, picture pg 15 (modified), CC BY 2.0. Willard, picture pg 16 (modified), CC BY-NC-ND 2.0. Jose Luis Cernadas Iglesias, picture pg 17 (modified), CC BY 2.0. Michelle Bender, picture pg 25 (left), CC BY-NC-ND 2.0. Robert Levy, picture pg 25 (right), CC BY-NC-ND 2.0. Simply Swim UK, picture pg 37, CC BY-SA 2.0. Sarah J. Poe, picture pg 55, CC BY-ND 2.0. Kate Brady, picture pg 58, CC BY 2.0. August Brill, picture pg 59, CC BY 2.0. This work was supported by the “Ministerio de Economía y Competitividad” of the Government of Spain and FEDER Funds under research projects TIN2012-33867 and TIN2015-64282-R.