Slides for our CIKM2015 paper that received the best student paper award. This paper is the result of my internship at Microsoft Research in Redmond. In this joint work with Ryen White, Ahmed Hassan Awadallah and Susan Dumais, we investigate why some web searchers succeed where others struggle.
PDF: http://hdl.handle.net/11245/2.163418
Web searchers sometimes struggle to find relevant information. Struggling leads to frustrating and dissatisfying search experiences, even if searchers ultimately meet their search objectives. Understanding the search tasks in which people struggle is important for improving search systems. We address this issue in a mixed methods study that combines large-scale log analysis, crowd-sourced labeling, and predictive modeling. We analyze anonymized search logs from the Microsoft Bing Web search engine to characterize aspects of struggling searches and better explain the relationship between struggling and search success. To broaden our understanding of the struggling process beyond the behavioral signals in log data, we develop and utilize a crowd-sourced labeling methodology. We collect third-party judgments about why searchers appear to struggle and, if appropriate, where in the search task it became clear to the judges that the search would succeed (i.e., the pivotal query). We use our findings to propose ways in which systems can help searchers reduce struggling. Key components of such support are algorithms that accurately predict the nature of future actions and their anticipated impact on search outcomes. Our findings have implications for the design of search systems that help searchers struggle less and succeed more.
Daan Odijk, Ryen W. White, Ahmed Hassan Awadallah and Susan T. Dumais. Struggling and Success in Web Search. In CIKM ’15, 2015.
BibTeX: @inproceedings{odijk2015struggling,
Author = {Odijk, Daan and White, Ryen W. and Hassan Awadallah, Ahmed and Dumais, Susan T.},
Booktitle = {CIKM 2015: 24th ACM International Conference on Information and Knowledge Management},
Month = {October},
Publisher = {ACM},
Title = {Struggling and Success in Web Search},
Year = {2015}}
1. Struggling and Success in Web Search
Daan Odijk, University of Amsterdam, The Netherlands
Ryen W. White, Ahmed Hassan Awadallah & Susan T. Dumais
Microsoft Research, Redmond, WA, USA
2. Motivation
• Long web search sessions are common
• Half of web search sessions contain multiple queries [Shokouhi et al., SIGIR’13]
• 40% of sessions take 3+ minutes [Dumais, NSF’13]
• Account for most of search time
• What are searchers doing? Struggling or exploring. [Hassan et al., WSDM’14]
• Struggling in 60% of long sessions, often successful
From the paper (labeling methodology and results):

3. Filter short sessions: After removing navigational queries and segmenting the logs into topically coherent sessions, we exclude sessions with fewer than three unique queries, since these are unlikely to be exploring or struggling sessions.

We applied these criteria to identify long, topically-coherent sessions because these are sessions in which users are likely to show exploring or struggling behavior. There were many thousands of such sessions in our data. We sampled 3000 of them and instructed external human judges to examine each session, try to understand the user’s experience, and identify the reason for the observed behavior.

Judges labeled 40% of sessions as exploring, 23% as exploring with struggle, and the remaining 36% as struggling sessions (1% could not be judged; see Figure 2). In total, the data set consisted of 3000 sessions with 17,117 queries, 13,168 distinct queries, and 13,780 result clicks.

We also asked the judges to assess the success of each session with the following labels:

Successful: Sessions where searchers were able to locate the required information.
Partially Successful: Sessions where searchers failed to locate some of the required information.
Unsuccessful: Sessions where searchers failed to locate the required information.

Note that struggling is a characterization of the search process, while success is a characterization of its outcome. Hence it is possible for a user who has difficulty locating the required information (struggling) to end up locating it (success). It is also possible for a user to fail in locating the required information without struggling (e.g., submit a single unsuccessful query and then give up).

The distribution of success labels across session types is shown in Figure 3. Most of the exploring sessions are successful (more than 75%) or partially successful (more than 20%), which agrees with our definition of exploring sessions as open-ended and multi-faceted in nature.

Figure 2. Session type distribution as labeled by judges: Exploring 40%, Exploring with Struggle 23%, Struggling 36%, Cannot Judge 1%.
Figure 3. Session success distribution (Successful / Partially Successful / Unsuccessful) as labeled by judges, per session type.
4. Example of Struggling (logged session from June 2014)
9:13:11 AM  Query: us open
9:13:24 AM  Query: us open golf
9:13:36 AM  Query: us open golf 2013 live
9:13:59 AM  Query: watch us open live streaming
9:14:02 AM  Click: http://inquisitr.com/1300340/watch-2014-u-s-open-live-online-final-round-free-streaming-video
9:31:55 AM  END
5. Example of Struggling (logged session from June 2014): the pivotal query
9:13:11 AM  Query: us open
9:13:24 AM  Query: us open golf
9:13:36 AM  Query: us open golf 2013 live
9:13:59 AM  Query: watch us open live streaming
9:14:02 AM  Click: http://inquisitr.com/1300340/watch-2014-u-s-open-live-online-final-round-free-streaming-video
9:31:55 AM  END
6. Example of Struggling (logged session from June 2014): the pivotal query
9:13:11 AM  Query: us open
9:13:24 AM  Query: us open golf
9:13:36 AM  Query: us open golf 2013 live
9:13:59 AM  Query: watch us open live streaming
9:14:02 AM  Click: http://inquisitr.com/1300340/watch-2014-u-s-open-live-online-final-round-free-streaming-video
9:31:55 AM  END
How and why do searchers struggle?
How do searchers go from struggle to success?
Can we help them struggle less?
7. Outline
Characterizing Struggling
Annotating Query Transitions
Predicting Future Actions
Struggling searchers behave differently given different outcomes, on many search aspects including: queries, reformulations, clicks, dwell time & topic.
8. Identifying Struggling Sessions
Logs from June 1–7, 2014
1. Filter sessions: US & English, start with a typed query
2. Segment into tasks: topically coherent sub-sessions with at least 3 queries
3. Filter struggling tasks [Hassan et al., WSDM’14]
4. Partition based on clicks on the final query, a commonly used proxy for success (see the sketch below):
[Task diagram: first query, second query, …, last query]
2,937,450 successful tasks (dwell time > 30s, or end of task)
4,508,821 unsuccessful tasks (no click, or dwell time < 10s)
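To make the partitioning criterion concrete, here is a minimal sketch in Python of step 4, under my reading of the slide: a task counts as successful if a click on its final query dwells for more than 30 seconds or ends the task, and unsuccessful if there is no final-query click or only clicks with dwell under 10 seconds. The `Task` and `Click` containers are hypothetical, not from the paper; treating a task-ending click as satisfied is a common convention in log studies.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Click:
    url: str
    dwell_seconds: Optional[float]  # None if the task ended on this click

@dataclass
class Task:
    queries: List[str]
    final_query_clicks: List[Click] = field(default_factory=list)

def classify(task: Task) -> Optional[str]:
    """Label a struggling task by the clicks on its final query."""
    dwells = [c.dwell_seconds for c in task.final_query_clicks]
    # Successful: a final-query click dwelled > 30s or ended the task.
    if any(d is None or d > 30 for d in dwells):
        return "successful"
    # Unsuccessful: no final-query click, or only very short dwells.
    if not dwells or all(d < 10 for d in dwells):
        return "unsuccessful"
    return None  # ambiguous dwell (10-30s); left unlabeled in this sketch
```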
10. Queries become longer
[Histograms: query length in terms (1 to 8+) for the first query, other queries, and the last query, plus query-percentile plots for the first and intermediate queries; successful vs. unsuccessful tasks]
First query is short: 3.35 vs. 3.23 terms (successful vs. unsuccessful)
Other queries are longer: 4.29 vs. 4.04 terms
Largest difference on the last query: 4.29 vs. 3.93 terms (shorter in unsuccessful tasks)
11. Query Reformulation
[Charts: share of each reformulation type (specialize, generalize, substitute, spelling, back, new) from the first query, for intermediate queries, and to the final query, for successful vs. unsuccessful tasks, compared to the average (above/below)]
New: shares no terms with previous queries.
Back: exact repeat of a previous query in the task.
Spelling: changed the spelling, e.g. fixing a typo.
Substitution: replaced a single term with another term.
Generalization: removed a term from the query.
Specialization: added a term to the query. [Hassan, EMNLP’13]
A toy implementation of these transition types is sketched below.
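As an illustration of how these six transition types can be operationalized, here is a minimal rule-based sketch in Python based only on term overlap. It is my reading of the definitions above, not the paper's actual labeling code; the edit-distance threshold separating spelling fixes from substitutions is an assumption.

```python
from difflib import SequenceMatcher

def reformulation_type(query: str, previous_queries: list) -> str:
    """Classify a query transition using the definitions from the slide."""
    terms = query.lower().split()
    history = [q.lower() for q in previous_queries]
    if query.lower() in history:
        return "back"  # exact repeat of a previous query in the task
    if all(not set(terms) & set(q.split()) for q in history):
        return "new"   # shares no terms with any previous query
    prev = history[-1].split()
    added = [t for t in terms if t not in prev]
    removed = [t for t in prev if t not in terms]
    if len(added) == 1 and not removed:
        return "specialization"  # added a term to the query
    if len(removed) == 1 and not added:
        return "generalization"  # removed a term from the query
    if len(added) == 1 and len(removed) == 1:
        # near-identical terms suggest a spelling fix rather than a
        # genuine substitution (0.75 threshold is an assumption)
        sim = SequenceMatcher(None, added[0], removed[0]).ratio()
        return "spelling" if sim > 0.75 else "substitution"
    return "other"
```

For the example session above, `reformulation_type("us open golf", ["us open"])` yields "specialization".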
13. Outline
Characterizing Struggling
Annotating Query Transitions
Predicting Future Actions
Struggling searchers in unsuccessful tasks issue fewer and shorter queries than those in successful tasks; their query reformulations indicate more trouble choosing the correct vocabulary.
14. Outline
Characterizing Struggling
Annotating Query Transitions
Predicting Future Actions
Struggling searchers in unsuccessful tasks issue fewer and shorter queries than those in successful tasks; their query reformulations indicate more trouble choosing the correct vocabulary.
Next: crowd-sourcing to better understand the connection between struggling and success, and query transitions. First an exploratory pilot, then the main annotations.
15. Annotating pairs of tasks
Exploratory pilot: 175 task pairs with identical initial queries
• Validate the partitioning on success based on final clicks
• Understand struggling based on open-ended questions:
• Could you describe how the user in the more successful task eventually succeeded in becoming successful?
• Why was the user in the less successful task struggling to find the information they were looking for?
• What might you do differently if you were the user in the less successful task?
16. Exploratory Pilot: 175 task pairs with identical initial queries
• For 35 initial queries with 40%-60% successful vs. not:
• Five successful/unsuccessful task pairs for each query
• 175 task pairs annotated
[Diagram: first, second, …, last query of the 2,937,450 successful and 4,508,821 unsuccessful tasks]
17.
18. Exploratory Pilot
Which task was more successful?
[Bar charts, 0-100%: agreement between three judges: 142 vs. 33; picked the successful task: 97 / 9 / 36]
The last-query click is a reasonable proxy for success.
19. Intent-based query reformulation taxonomy
Based on the answers to the open-ended questions:
Added, removed or substituted ☐ an action (e.g., download, contact) ☐ an attribute (e.g., printable, free)
Specified ☐ a particular instance (e.g., added a brand name or version number)
Rephrased ☐ corrected a spelling error or typo ☐ used a synonym or related term
Switched ☐ to a related task (changed main focus) ☐ to a new task
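For concreteness, the taxonomy can also be written down as a flat label set, which is the form it takes as a prediction target later in the talk. A minimal Python sketch; the enum name and value strings are my own, though the eleven labels match the transition counts on slide 31.

```python
from enum import Enum

class IntentReformulation(Enum):
    """Intent-based reformulation taxonomy from the annotation form."""
    ADDED_ACTION = "added action"              # e.g., download, contact
    REMOVED_ACTION = "removed action"
    SUBSTITUTED_ACTION = "substituted action"
    ADDED_ATTRIBUTE = "added attribute"        # e.g., printable, free
    REMOVED_ATTRIBUTE = "removed attribute"
    SUBSTITUTED_ATTRIBUTE = "substituted attribute"
    SPECIFIED_INSTANCE = "specified instance"  # e.g., brand name, version
    CORRECTED_TYPO = "corrected typo"
    REPHRASED_SYNONYM = "rephrased with synonym"
    SWITCHED_RELATED = "switched to related task"
    SWITCHED_NEW = "switched to new task"
```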
20. Main Annotations: transitions in 659 successful tasks
• Annotated 659 new successful tasks
• Single tasks, no pairs
• Annotated the transition between each query pair
[Diagram: first, second, …, last query of the 2,937,450 successful tasks]
21.
22. Success & Pivotal Query
[Bar chart: location of the pivotal query (Q1-Q7) by number of tasks (0-200), split by whether the task continued or the pivotal query was the final query]
Task success (not at all / somewhat / mostly / completely): 19% / 40% / 39%
Made at least some progress: 62%
30. Outline
Characterizing Struggling
Annotating Query Transitions
Predicting Future Actions
Struggling searchers, successful or not, behave differently on many search aspects, including: queries, reformulations, clicks, dwell time & topic.
Substantial differences in how searchers refine queries at different stages of a struggling task; strong connections with task outcomes and with particular pivotal queries.
31. Predicting Reformulation
What query reformulation strategy will a user employ?
Query transition counts: Added Attribute 465, Specified Instance 374, Substituted Attribute 203, Switched to Related 179, Removed Attribute 99, Rephrased w/ Synonym 91, Added Action 80, Switched to New 46, Substituted Action 41, Corrected Typo 34, Removed Action 18
[Chart panels: from first query / others / to final query]
Applications: tailor query suggestions, tailor auto-completions, re-rank results or suggestions
32. Feature Sets
Query Features
Interaction Features
Transition Features
+ History Features
Applications: tailor query suggestions, tailor auto-completions, re-rank results or suggestions
A sketch of a predictor over these feature sets follows below.
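As an illustration of how such a predictor could be assembled, here is a minimal scikit-learn sketch that maps per-transition feature dictionaries (query, transition, and interaction features; the individual feature names are hypothetical and the paper's feature sets are richer) to the taxonomy labels. The model choice (random forest) is mine, not necessarily the paper's.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline

def transition_features(prev_query: str, query: str,
                        clicked: bool, dwell_seconds: float) -> dict:
    """One feature dict per query transition (names are illustrative)."""
    prev, cur = set(prev_query.split()), set(query.split())
    return {
        "query_length": len(query.split()),   # query features
        "terms_added": len(cur - prev),       # transition features
        "terms_removed": len(prev - cur),
        "clicked_before": clicked,            # interaction features
        "dwell_seconds": dwell_seconds,
    }

# Vectorize the dicts and train; y holds taxonomy labels such as
# "specified instance", collected from the main annotations.
model = make_pipeline(DictVectorizer(), RandomForestClassifier(n_estimators=100))
# model.fit(X_dicts, y_labels)
# model.predict([transition_features("us open", "us open golf", False, 0.0)])
```

A predicted strategy could then drive the applications on the slide, e.g. tailoring query suggestions or auto-completions to the reformulation the user is about to attempt.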
34. Outline
Characterizing Struggling
Annotating Query Transitions
Predicting Future Actions
Substantial differences in how searchers behave in a struggling task, depending on task outcomes.
Accurately predicting the reformulation strategy makes it possible to provide situation-specific support at a higher level.
35. Limitations
• Very specific and apparent type of struggling
• Determination of search success:
• Final click as a proxy
• Judgements by third parties, not by searchers themselves
36. Future Work
• Directly applying reformulation strategies
• Query suggestions & auto-completions
• Mining query ➣ pivotal query pairs
• Can be identified automatically: F1 59% (final-query baseline: 51%)
• Hints and tips on reformulation strategies