Win-Win Search: Dual-Agent Stochastic Game in Session Search (SIGIR 2014)

WIN-WIN SEARCH: DUAL-AGENT STOCHASTIC GAME IN SESSION SEARCH
Jiyun Luo, Sicong Zhang, Grace Hui Yang
Department of Computer Science
Georgetown University
{jl1749, sz303}@georgetown.edu
huiyang@cs.georgetown.edu

A NEW PERSPECTIVE TO LOOK AT SEARCH
[Figure: the user, their information need, the documents to explore, and the observed documents]
Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren't, and satisfy their information need.

WHY DO USERS MAKE CERTAIN MOVES?
Markov Chain of Decision Making States

RELATED WORK
• queries suitable for personalization [Teevan et al. SIGIR’08]
• task types [Kanoulas et al. TREC’12]
• roles of task stage and task type [Liu et al. SIGIR’10]
• session query changes [Guan et al. SIGIR’13]
• user intentions and attention [Carterette et al. CIKM’11]
• user click model [Craswell et al. SIGIR’07]
• page re-ranking [Jin et al. WWW’13]
• search topics [Jones et al. CIKM’08]
• ads selection using POMDP [Yuan et al. CIKM’12]
• Our work is a retrieval model, not a user study

OUR SOLUTION
Try to find an optimal solution through a sequence of dynamic interactions.
Trial and error: learn from repeated, varied attempts, which are continued until success.

TRIAL AND ERROR
• q1 – "dulles hotels"
• q2 – "dulles airport"
• q3 – "dulles airport location"
• q4 – "dulles metrostop"

RECAP – CHARACTERISTICS OF DYNAMIC IR
• Rich interactions: query formulation, document clicks, document examination, eye movement, mouse movements, etc.
• Temporal dependency
• Overall goal

WHAT IS A DESIRABLE MODEL FOR DYNAMIC IR?
• Model interactions, which means it needs placeholders for actions;
• Model the information need hidden behind user queries and other interactions;
• Set up a reward mechanism to guide the entire search algorithm to adjust its retrieval strategies;
• Represent Markov properties to handle the temporal dependency.
A model in a trial-and-error setting will do!
A Markov model will do!

WIN-WIN SEARCH
• Two agents work together to fulfill the information need
• Dual-agent stochastic game
• Partially Observable Markov Decision Process
• Joint optimization
• To achieve win-win

WIN-WIN SEARCH
• A tuple (S, T, A, R, γ, O, Θ, B)
• S: state space
• T: transition matrix
• A: action space (Au, Ase, Σu, Σse)
• R: reward function (Ru, Rse)
• γ: discount factor, 0 < γ ≤ 1
• O: observation set (Ωu, Ωse). An observation is a symbol emitted according to a hidden state.
• Θ: observation function. Θ(s, a, o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o | s, a).
• B: belief space. A belief is a probability distribution over hidden states.
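The tuple above is enough to write down the textbook POMDP belief update, b'(s') ∝ Θ(s', a, o) · Σ_s T(s, a, s') · b(s). The sketch below is a minimal illustration of that standard update, not the authors' code; the function name `update_belief` and all the numbers are made up for the example.

```python
def update_belief(belief, transition, obs_prob):
    """Standard POMDP belief update for one (action, observation) pair.

    belief      -- list, belief[s] = current probability of hidden state s
    transition  -- nested list, transition[s][s2] = P(s2 | s, a) for the action taken
    obs_prob    -- list, obs_prob[s2] = P(o | s2, a) for the symbol observed
    """
    n = len(belief)
    # Prediction step: push the current belief through the transition model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)]
    # Correction step: weight each state by how well it explains the observation.
    unnormalized = [obs_prob[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Toy values (hypothetical), with four states as on the following slides.
b0 = [0.25, 0.25, 0.25, 0.25]
uniform_transition = [[0.25] * 4 for _ in range(4)]
obs_likelihood = [0.1, 0.1, 0.3, 0.5]
b1 = update_belief(b0, uniform_transition, obs_likelihood)
```

With a uniform prior and uniform transitions, the new belief is simply the normalized observation likelihood, which makes the toy case easy to check by hand.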

Name | Symbol | Meaning
state | S | the four hidden decision states
user action | Au | add/remove/keep query terms
search engine action | Ase | increase/decrease/keep term weights, adjust search techniques, etc.
message from user to search engine | Σu | clicked and SAT clicked documents
message from search engine to user | Σse | top k returned documents
user's observation | Ωu | observations that the user makes from the world
search engine's observation | Ωse | observations that the search engine makes from the world and from the user
user reward | Ru | relevant information the user gains from reading the documents
search engine reward | Rse | nDCG that the search engine gains by returning documents
belief state | B | belief states generated from the belief updater and shared by both agents
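Since the search engine reward is nDCG, a small sketch of that metric may help. This is one common formulation (linear gain, log2 rank discount), and as a simplifying assumption the ideal ranking is built from a separately supplied list of judged gains; the slides do not say which nDCG variant or cutoff the authors used.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_gains, judged_gains=None, k=10):
    """nDCG@k: DCG of the returned ranking divided by the DCG of the ideal ranking.

    ranked_gains -- relevance grades of the returned documents, in rank order
    judged_gains -- grades of all judged documents (defaults to ranked_gains,
                    which is an assumption made only to keep the sketch small)
    """
    pool = ranked_gains if judged_gains is None else judged_gains
    ideal = dcg(sorted(pool, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0


perfect = ndcg([3, 2, 1])
reversed_order = ndcg([1, 2, 3])
```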

STATES (S)
• S_RT: Relevant & Exploitation
• S_RR: Relevant & Exploration
• S_NRT: Non-Relevant & Exploitation
• S_NRR: Non-Relevant & Exploration
Example query changes:
• scooter price ⟶ scooter stores
• collecting old US coins ⟶ selling old US coins
• Philadelphia NYC travel ⟶ Philadelphia NYC train
• Boston tourism ⟶ NYC tourism

ACTIONS (Au, Ase, Σu, Σse)
• User action (Au)
  - add query terms (+Δq)
  - remove query terms (-Δq)
  - keep query terms (qtheme)
• Search engine action (Ase)
  - increase term weights
  - decrease term weights
  - keep term weights
  - adjust search techniques, etc.
• Message from the user (Σu)
  - clicked documents
  - SAT clicked documents
• Message from the search engine (Σse)
  - top k returned documents
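The three user actions can be read off two consecutive queries. The sketch below is a minimal illustration that assumes lowercase whitespace tokenization, which the slides do not specify; `query_change` is a hypothetical helper name.

```python
def query_change(prev_query, curr_query):
    """Split the change between two consecutive queries into added, removed and kept terms."""
    prev_terms = prev_query.lower().split()
    curr_terms = curr_query.lower().split()
    prev_set, curr_set = set(prev_terms), set(curr_terms)
    return {
        "added": [t for t in curr_terms if t not in prev_set],    # +Δq
        "removed": [t for t in prev_terms if t not in curr_set],  # -Δq
        "theme": [t for t in curr_terms if t in prev_set],        # qtheme
    }


# q1 and q2 from the "dulles" session shown earlier in the deck.
change = query_change("dulles hotels", "dulles airport")
```

For this pair the added term is "airport", the removed term is "hotels", and the theme term is "dulles".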

DUAL-AGENT STOCHASTIC GAME
1. At iteration t, the user agent takes action $a_u^t$ (query change).
2. The search engine picks the best action $a_{se}^t$ to search.
3. The search engine returns document set $D_t$ as message $\Sigma_{se}^t$.
4. The user agent examines $D_t$ and sends clicks as feedback messages $\Sigma_u^t$.
5. The user agent again takes action $a_u^{t+1}$ (query change).
6. The world moves into iteration t + 1.
7. The loop continues.
Messages are essentially documents that an agent thinks are relevant.

OBSERVATION FUNCTION (O)
Probability of making observation ω after taking action a and landing in state s.
E.g., the probability of making observation ω after taking action a and landing in state S_RT:
\[ O(S_{RT}, a, \omega) = O(S_{Rel}, a, \omega) \, O(S_{Exploitation}, a, \omega) \]
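The product form for S_RT suggests treating relevance and exploration/exploitation as two separate dimensions. The sketch below extends that product to all four states under the assumption that each dimension is binary with complementary probabilities; that extension, and the name `joint_observation`, are illustrative rather than taken from the slides.

```python
def joint_observation(p_rel, p_exploit):
    """Combine the two per-dimension probabilities into one value per hidden state.

    p_rel     -- probability assigned to the Relevant side of the relevance dimension
    p_exploit -- probability assigned to the Exploitation side of the other dimension
    """
    return {
        "S_RT": p_rel * p_exploit,
        "S_RR": p_rel * (1.0 - p_exploit),
        "S_NRT": (1.0 - p_rel) * p_exploit,
        "S_NRR": (1.0 - p_rel) * (1.0 - p_exploit),
    }


probs = joint_observation(p_rel=0.3, p_exploit=0.2)
```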

• Intuition: Relevant or Non-relevant?
  s_t is likely to be
  - Relevant if ∃ d ∈ D_{t-1} and d is SAT clicked
  - Non-Relevant otherwise
• Observation function
\[ O(s_t = \mathrm{Rel}, \Sigma_u, \omega_t = \mathrm{Rel}) \propto P(s_t = \mathrm{Rel} \mid \omega_t = \mathrm{Rel}) \, P(\omega_t = \mathrm{Rel} \mid \Sigma_u) \]
• $P(s_t = \mathrm{Rel} \mid \omega_t = \mathrm{Rel})$ and $P(\omega_t = \mathrm{Rel} \mid \Sigma_u)$ are estimated from
  - log data
  - TREC ground truth.
\[ P(\omega_t = \mathrm{Rel} \mid \Sigma_u) = \frac{\#\text{ of observed relevance}}{\#\text{ of observations}} \]
\[ P(s_t = \mathrm{Rel} \mid \omega_t = \mathrm{Rel}) = \frac{\#\text{ of observed true relevance}}{\#\text{ of observed relevance}} \]
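A rough sketch of the relevance observation on this slide: the observation is "Rel" when some previously returned document was SAT clicked, and the two probabilities are estimated as count ratios over logged sessions. The 30-second dwell threshold for a SAT click, the record layout, and the function names are assumptions for illustration, not details given in the slides.

```python
def observe_relevance(prev_docs, dwell_by_doc, sat_dwell_seconds=30.0):
    """Return "Rel" if any document in D_{t-1} has a SAT click, else "NonRel".

    dwell_by_doc maps a clicked document id to its dwell time in seconds.
    The 30 s threshold is an assumed definition of a SAT click.
    """
    for doc_id in prev_docs:
        if dwell_by_doc.get(doc_id, 0.0) >= sat_dwell_seconds:
            return "Rel"
    return "NonRel"


def estimate_relevance_probs(records):
    """Estimate the two probabilities from (observed_label, true_label) pairs."""
    n_obs = len(records)
    n_obs_rel = sum(1 for observed, _ in records if observed == "Rel")
    n_true_rel = sum(1 for observed, truth in records if observed == "Rel" and truth == "Rel")
    p_omega_rel = n_obs_rel / n_obs if n_obs else 0.0                  # P(ω = Rel | Σu)
    p_state_given_omega = n_true_rel / n_obs_rel if n_obs_rel else 0.0  # P(s = Rel | ω = Rel)
    return p_omega_rel, p_state_given_omega


label = observe_relevance(["d1", "d2"], {"d2": 45.0})
toy_log = [("Rel", "Rel"), ("Rel", "NonRel"), ("NonRel", "NonRel"), ("NonRel", "Rel")]
estimates = estimate_relevance_probs(toy_log)
```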

• Intuition: Exploration or Exploitation?
  s_t is likely to be
  - Exploration if (+Δq_t ≠ ∅ and +Δq_t ∉ D_{t-1}) or (+Δq_t = ∅ and -Δq_t ≠ ∅)
  - Exploitation if (+Δq_t ≠ ∅ and +Δq_t ∈ D_{t-1}) or (+Δq_t = ∅ and -Δq_t = ∅)
• Observation function
\[ O(s_t = \mathrm{Exploration}, a_u = +\Delta q_t, \Sigma_{se} = D_{t-1}, \omega_t = \mathrm{Exploration}) \propto P(s_t = \mathrm{Exploration} \mid \omega_t = \mathrm{Exploration}) \, P(\omega_t = \mathrm{Exploration} \mid +\Delta q_t, D_{t-1}) \]
• $P(s_t = \mathrm{Exploration} \mid \omega_t = \mathrm{Exploration})$ and $P(\omega_t = \mathrm{Exploration} \mid +\Delta q_t, D_{t-1})$ are estimated from
  - log data
  - human judgment.
\[ P(\omega_t = \mathrm{Exploration} \mid +\Delta q_t, D_{t-1}) = \frac{\#\text{ of observed exploration due to added terms}}{\#\text{ of observations due to added terms}} \]
\[ P(s_t = \mathrm{Exploration} \mid \omega_t = \mathrm{Exploration}) = \frac{\#\text{ of observed true exploration}}{\#\text{ of observed exploration}} \]
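The exploration/exploitation rule on this slide can be sketched as a small decision function. One detail is an assumption: "+Δq_t ∈ D_{t-1}" is read here as "every added term occurs somewhere in the previously returned documents", with whitespace tokenization; the slides do not spell out either choice.

```python
def observe_exploration(added_terms, removed_terms, prev_doc_texts):
    """Classify a query change as "Exploration" or "Exploitation".

    added_terms    -- +Δq_t, terms added to the query
    removed_terms  -- -Δq_t, terms removed from the query
    prev_doc_texts -- texts of the documents in D_{t-1}
    """
    if added_terms:
        prev_vocab = set()
        for text in prev_doc_texts:
            prev_vocab.update(text.lower().split())
        # Assumed reading: added terms already seen in D_{t-1} signal exploitation.
        seen_before = all(term.lower() in prev_vocab for term in added_terms)
        return "Exploitation" if seen_before else "Exploration"
    # No added terms: removing terms counts as exploring, no change as exploiting.
    return "Exploration" if removed_terms else "Exploitation"


docs = ["philadelphia nyc train schedule"]
seen = observe_exploration(["train"], ["travel"], docs)
unseen = observe_exploration(["bus"], [], docs)
```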

BELIEF UPDATES (B)
TREC’13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?
• q1 = “best US destinations”, observation = NRR
Belief over the four states:
• S_RT (Relevant & Exploitation): 0.1784
• S_RR (Relevant & Exploration): 0.1135
• S_NRT (Non-Relevant & Exploitation): 0.2838
• S_NRR (Non-Relevant & Exploration): 0.4243

BELIEF UPDATES (B)
• q1: observation = NRR
• q2 = “distance New York Boston”, observation = RT
Belief over the four states:
• S_RT (Relevant & Exploitation): 0.0005
• S_RR (Relevant & Exploration): 0.0068
• S_NRT (Non-Relevant & Exploitation): 0.0715
• S_NRR (Non-Relevant & Exploration): 0.9212

BELIEF UPDATES (B)
• q1: observation = NRR
• q2: observation = RT
• q3 = “maps.bing.com”, observation = NRT
Belief over the four states:
• S_RT (Relevant & Exploitation): 0.0151
• S_RR (Relevant & Exploration): 0.4347
• S_NRT (Non-Relevant & Exploitation): 0.0276
• S_NRR (Non-Relevant & Exploration): 0.5226

BELIEF UPDATES (B)
• q1: observation = NRR
• q2: observation = RT
• q3: observation = NRT
• ……
• q20 = “Philadelphia NYC train”, observation = NRT
Belief over the four states:
• S_RT (Relevant & Exploitation): 0.0291
• S_RR (Relevant & Exploration): 0.7837
• S_NRT (Non-Relevant & Exploitation): 0.0081
• S_NRR (Non-Relevant & Exploration): 0.1790

BELIEF UPDATES (B)
• q1: observation = NRR
• q2: observation = RT
• q3: observation = NRT
• ……
• “… train”, observation = NRT
• “… bus”, observation = NRT
Belief over the four states:
• S_RT (Relevant & Exploitation): 0.0304
• S_RR (Relevant & Exploration): 0.8126
• S_NRT (Non-Relevant & Exploitation): 0.0066
• S_NRR (Non-Relevant & Exploration): 0.1505

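JOINT OPTIMIZATION: WIN-WIN
• The long-term reward function for the search engine agent:
\[ Q_{se}(b, a) = \sum_{s \in S} b(s) R(s, a) + \gamma \sum_{\omega \in \Omega} P(\omega \mid b, a_u, \Sigma_{se}) \, P(\omega \mid b, \Sigma_u) \max_{a} Q_{se}(b', a) \]
• The long-term reward function for the user agent:
\[ Q_u(b, a_u) = R(s, a_u) + \gamma \sum_{a_u} T(s_t \mid s_{t-1}, D_{t-1}) \max_{s_{t-1}} Q_u(s_{t-1}, a_u) \]
\[ = P(q_t \mid d) + \gamma \sum_{a} P(q_t \mid q_{t-1}, D_{t-1}, a) \max_{D_{t-1}} P(q_{t-1} \mid D_{t-1}) \]
• Joint optimization:
\[ a_{se} = \arg\max_{a} \left( Q_{se}(b, a) + Q_u(b, a_u) \right) \]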
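The final selection step can be sketched as an argmax over candidate search engine actions of the sum of the two agents' long-term rewards. The sketch assumes both values have already been computed per candidate action (for the user agent, evaluated on the documents that action would return); how the authors compute them is not shown here, and the names and numbers are hypothetical.

```python
def expected_reward(belief, reward, action):
    """Immediate expected reward of an action under a belief: sum over s of b(s) * R(s, a)."""
    return sum(prob * reward[state][action] for state, prob in belief.items())


def pick_search_engine_action(q_se, q_u):
    """Return the action maximizing Q_se + Q_u.

    q_se -- dict: action -> long-term reward of the search engine agent
    q_u  -- dict: action -> long-term reward of the user agent (assumed precomputed)
    """
    return max(q_se, key=lambda action: q_se[action] + q_u.get(action, 0.0))


# Toy values, made up for illustration.
q_se = {"increase": 0.4, "decrease": 0.5, "keep": 0.3}
q_u = {"increase": 0.3, "decrease": 0.1, "keep": 0.2}
best = pick_search_engine_action(q_se, q_u)
rho = expected_reward({"S_RT": 0.5, "S_NRR": 0.5},
                      {"S_RT": {"keep": 1.0}, "S_NRR": {"keep": 0.0}}, "keep")
```

In the toy numbers, "decrease" is best for the search engine alone, but "increase" wins once the user's reward is added, which is the point of optimizing jointly.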

EXPERIMENTS
• Evaluate on TREC 2012 and 2013 Session Tracks
• The session logs contain
  - session topic
  - user queries
  - previously retrieved URLs, snippets
  - user clicks, dwell time, etc.
• Task: retrieve 2,000 documents for the last query in each session
• The evaluation is based on the whole session.
  - A document related to any query in the session is a good document
• Datasets
  - ClueWeb09 CatB
  - ClueWeb12 CatB
  - spam documents are removed
  - duplicated documents are removed

ACTIONS
• increasing weights of the added terms by a factor of x ∈ {1.05, 1.10, 1.15, 1.20, 1.25, 1.5, 1.75, 2};
• decreasing weights of the added terms by a factor of y ∈ {0.5, 0.57, 0.67, 0.8, 0.83, 0.87, 0.9, 0.95};
• QCM, proposed in Guan et al. SIGIR’13;
• Pseudo Relevance Feedback, which assumes the top 20 retrieved documents are relevant;
• directly using the query in the current iteration to perform retrieval;
• combining all queries in a session and weighting them equally.
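The two term-weighting actions above amount to a grid of multiplicative factors. The sketch below enumerates that grid for a weighted query represented as a plain dict; the representation and the function names are assumptions for illustration, and the other four search techniques are not modeled.

```python
INCREASE_FACTORS = [1.05, 1.10, 1.15, 1.20, 1.25, 1.5, 1.75, 2.0]
DECREASE_FACTORS = [0.5, 0.57, 0.67, 0.8, 0.83, 0.87, 0.9, 0.95]


def reweight(term_weights, added_terms, factor):
    """Scale the weights of the added terms by factor and leave the others unchanged."""
    return {
        term: (weight * factor if term in added_terms else weight)
        for term, weight in term_weights.items()
    }


def candidate_weightings(term_weights, added_terms):
    """Enumerate one reweighted query per factor listed on the slide."""
    factors = INCREASE_FACTORS + DECREASE_FACTORS
    return [(f, reweight(term_weights, added_terms, f)) for f in factors]


weights = {"dulles": 1.0, "airport": 1.0}
candidates = candidate_weightings(weights, {"airport"})
doubled = reweight(weights, {"airport"}, 2.0)
```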

SEARCH ACCURACY
• Search accuracy on TREC 2012 Session Track
[Results: TREC 2012 Session Track]
• Win-win outperforms most retrieval algorithms on TREC 2012.

SEARCH ACCURACY
• Search accuracy on TREC 2013 Session Track
[Results: TREC 2013 Session Track]
• Systems in TREC 2012 perform better than in TREC 2013.
  - Many relevant documents are not included in the ClueWeb12 CatB collection.
• Win-win outperforms all retrieval algorithms on TREC 2013.
  - It is highly effective in session search.

IMMEDIATE SEARCH ACCURACY
[Results: TREC 2012 Session Track and TREC 2013 Session Track]
• Original run: top returned documents provided by TREC log data
• Win-win's immediate search accuracy is better than the Original at every iteration
• Win-win's immediate search accuracy increases as the number of search iterations increases

Conclusions
• A novel session search framework
• Model the interactions between user and search engine as a dual-agent stochastic game
• Able to perform efficient optimization
  - a finite discrete set of states and actions
• Jointly search for the goal in a trial-and-error manner

THANK YOU
huiyang@cs.georgetown.edu
