Slides for our CIKM2015 paper that received the best student paper award. This paper is the result of my internship at Microsoft Research in Redmond. In this joint work with Ryen White, Ahmed Hassan Awadallah and Susan Dumais, we investigate why some web searchers succeed where others struggle.
PDF: http://hdl.handle.net/11245/2.163418
Web searchers sometimes struggle to find relevant information. Struggling leads to frustrating and dissatisfying search experiences, even if searchers ultimately meet their search objectives. Understanding the search tasks in which people struggle is important for improving search systems. We address this issue in a mixed methods study that combines large-scale log analysis, crowd-sourced labeling, and predictive modeling. We analyze anonymized search logs from the Microsoft Bing Web search engine to characterize aspects of struggling searches and better explain the relationship between struggling and search success. To broaden our understanding of the struggling process beyond the behavioral signals in log data, we develop and utilize a crowd-sourced labeling methodology. We collect third-party judgments about why searchers appear to struggle and, if appropriate, where in the search task it became clear to the judges that the search would succeed (i.e., the pivotal query). We use our findings to propose ways in which systems can help searchers reduce struggling. Key components of such support are algorithms that accurately predict the nature of future actions and their anticipated impact on search outcomes. Our findings have implications for the design of search systems that help searchers struggle less and succeed more.
Daan Odijk, Ryen W. White, Ahmed Hassan Awadallah and Susan T. Dumais. Struggling and Success in Web Search. In CIKM ’15, 2015.
BibTeX: @inproceedings{odijk2015struggling,
Author = {Odijk, Daan and White, Ryen W. and Hassan Awadallah, Ahmed and Dumais, Susan T.},
Booktitle = {CIKM 2015: 24th ACM International Conference on Information and Knowledge Management},
Month = {October},
Publisher = {ACM},
Title = {Struggling and Success in Web Search},
Year = {2015}}
1. Struggling and Success in Web Search
Daan Odijk, University of Amsterdam, The Netherlands
Ryen W. White, Ahmed Hassan Awadallah & Susan T. Dumais
Microsoft Research, Redmond, WA, USA
2. Motivation
• Long web search sessions are common
• Half of web search sessions contain multiple queries [Shokouhi et al., SIGIR’13]
• 40% of sessions take 3+ minutes [Dumais, NSF’13]
• Account for most of search time
• What are searchers doing? Struggling or exploring. [Hassan et al., WSDM’14]
• Struggling in 60% of long sessions, often successful
From the paper (labeling methodology and results):

3. Filter short sessions: After removing navigational queries and segmenting the logs into topically coherent sessions, we exclude sessions with fewer than three unique queries, since these are unlikely to be exploring or struggling sessions.

We applied these criteria to identify long, topically-coherent sessions because these are sessions in which users are likely to show exploring or struggling behavior. There were many thousands of such sessions in our data. We sampled 3000 of them and instructed external human judges to examine each session, try to understand the user’s experience, and identify the reason for the observed behavior.

Judges labeled 40% of sessions as exploring, 23% as exploring with struggle, and the remaining 36% as struggling sessions (1% could not be judged; see Figure 2). In total, the data set consisted of 3000 sessions with 17,117 queries, 13,168 distinct queries, and 13,780 result clicks.

We also asked the judges to assess the success of each session with the following labels:

Successful: Sessions where searchers were able to locate the required information.
Partially Successful: Sessions where searchers failed to locate some of the required information.
Unsuccessful: Sessions where searchers failed to locate the required information.

Note that struggling is a characterization of the search process, while success is a characterization of its outcome. Hence it is possible for a user who has difficulty locating the required information (struggling) to end up locating it (success). It is also possible for a user to fail in locating the required information without struggling (e.g., submit a single unsuccessful query and then give up).

The distribution of success labels across session types is shown in Figure 3. Most of the exploring sessions are successful (more than 75%) or partially successful (more than 20%), which agrees with our definition of exploring sessions as open-ended and multi-faceted in nature.

Figure 2. Session type distribution as labeled by judges: Exploring 40%, Exploring with Struggle 23%, Struggling 36%, Cannot Judge 1%.
Figure 3. Session success distribution (Successful / Partially Successful / Unsuccessful) as labeled by judges, per session type.
4. Example of Struggling (logged session from June 2014)
9:13:11 AM  Query: us open
9:13:24 AM  Query: us open golf
9:13:36 AM  Query: us open golf 2013 live
9:13:59 AM  Query: watch us open live streaming
9:14:02 AM  Click: http://inquisitr.com/1300340/watch-2014-u-s-open-live-online-final-round-free-streaming-video
9:31:55 AM  END
5. Example of Struggling (logged session from June 2014): the pivotal query
9:13:11 AM  Query: us open
9:13:24 AM  Query: us open golf
9:13:36 AM  Query: us open golf 2013 live
9:13:59 AM  Query: watch us open live streaming
9:14:02 AM  Click: http://inquisitr.com/1300340/watch-2014-u-s-open-live-online-final-round-free-streaming-video
9:31:55 AM  END
6. Example of Struggling (logged session from June 2014): the pivotal query
9:13:11 AM  Query: us open
9:13:24 AM  Query: us open golf
9:13:36 AM  Query: us open golf 2013 live
9:13:59 AM  Query: watch us open live streaming
9:14:02 AM  Click: http://inquisitr.com/1300340/watch-2014-u-s-open-live-online-final-round-free-streaming-video
9:31:55 AM  END
How and why do searchers struggle?
How do searchers go from struggle to success?
Can we help them struggle less?
7. Outline
Characterizing Struggling
Annotating Query Transitions
Predicting Future Actions
Struggling searchers behave differently given different outcomes, on many search aspects including: queries, reformulations, clicks, dwell time & topic.
8. Identifying Struggling Sessions
Logs from June 1–7, 2014
1. Filter sessions: US & English, start with a typed query
2. Segment into tasks: topically coherent sub-sessions with at least 3 queries
3. Filter struggling tasks [Hassan et al., WSDM’14]
4. Partition based on clicks on the final query, a commonly used proxy for success (see the sketch below):
[Task diagram: first query, second query, …, last query]
2,937,450 successful tasks (dwell time > 30s, or end of task)
4,508,821 unsuccessful tasks (no click, or dwell time < 10s)
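To make the partitioning criterion concrete, here is a minimal sketch in Python of step 4, under my reading of the slide: a task counts as successful if a click on its final query dwells for more than 30 seconds or ends the task, and unsuccessful if there is no final-query click or only clicks with dwell under 10 seconds. The `Task` and `Click` containers are hypothetical, not from the paper; treating a task-ending click as satisfied is a common convention in log studies.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Click:
    url: str
    dwell_seconds: Optional[float]  # None if the task ended on this click

@dataclass
class Task:
    queries: List[str]
    final_query_clicks: List[Click] = field(default_factory=list)

def classify(task: Task) -> Optional[str]:
    """Label a struggling task by the clicks on its final query."""
    dwells = [c.dwell_seconds for c in task.final_query_clicks]
    # Successful: a final-query click dwelled > 30s or ended the task.
    if any(d is None or d > 30 for d in dwells):
        return "successful"
    # Unsuccessful: no final-query click, or only very short dwells.
    if not dwells or all(d < 10 for d in dwells):
        return "unsuccessful"
    return None  # ambiguous dwell (10-30s); left unlabeled in this sketch
```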
10. Queries become longer
[Histograms: query length in terms (1 to 8+) for the first query, other queries, and the last query, plus query-percentile plots for the first and intermediate queries; successful vs. unsuccessful tasks]
First query is short: 3.35 vs. 3.23 terms (successful vs. unsuccessful)
Other queries are longer: 4.29 vs. 4.04 terms
Largest difference on the last query: 4.29 vs. 3.93 terms (shorter in unsuccessful tasks)
11. Query Reformulation
[Charts: share of each reformulation type (specialize, generalize, substitute, spelling, back, new) from the first query, for intermediate queries, and to the final query, for successful vs. unsuccessful tasks, compared to the average (above/below)]
New: shares no terms with previous queries.
Back: exact repeat of a previous query in the task.
Spelling: changed the spelling, e.g. fixing a typo.
Substitution: replaced a single term with another term.
Generalization: removed a term from the query.
Specialization: added a term to the query. [Hassan, EMNLP’13]
A toy implementation of these transition types is sketched below.
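As an illustration of how these six transition types can be operationalized, here is a minimal rule-based sketch in Python based only on term overlap. It is my reading of the definitions above, not the paper's actual labeling code; the edit-distance threshold separating spelling fixes from substitutions is an assumption.

```python
from difflib import SequenceMatcher

def reformulation_type(query: str, previous_queries: list) -> str:
    """Classify a query transition using the definitions from the slide."""
    terms = query.lower().split()
    history = [q.lower() for q in previous_queries]
    if query.lower() in history:
        return "back"  # exact repeat of a previous query in the task
    if all(not set(terms) & set(q.split()) for q in history):
        return "new"   # shares no terms with any previous query
    prev = history[-1].split()
    added = [t for t in terms if t not in prev]
    removed = [t for t in prev if t not in terms]
    if len(added) == 1 and not removed:
        return "specialization"  # added a term to the query
    if len(removed) == 1 and not added:
        return "generalization"  # removed a term from the query
    if len(added) == 1 and len(removed) == 1:
        # near-identical terms suggest a spelling fix rather than a
        # genuine substitution (0.75 threshold is an assumption)
        sim = SequenceMatcher(None, added[0], removed[0]).ratio()
        return "spelling" if sim > 0.75 else "substitution"
    return "other"
```

For the example session above, `reformulation_type("us open golf", ["us open"])` yields "specialization".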
13. Outline
Characterizing Struggling
Annotating Query Transitions
Predicting Future Actions
Struggling searchers in unsuccessful tasks issue fewer and shorter queries than those in successful tasks; their query reformulations indicate more trouble choosing the correct vocabulary.
14. Outline
Characterizing Struggling
Annotating Query Transitions
Predicting Future Actions
Struggling searchers in unsuccessful tasks issue fewer and shorter queries than those in successful tasks; their query reformulations indicate more trouble choosing the correct vocabulary.
Next: crowd-sourcing to better understand the connection between struggling and success, and query transitions. First an exploratory pilot, then the main annotations.
15. Annotating pairs of tasks
Exploratory pilot: 175 task pairs with identical initial queries
• Validate the partitioning on success based on final clicks
• Understand struggling based on open-ended questions:
• Could you describe how the user in the more successful task eventually succeeded in becoming successful?
• Why was the user in the less successful task struggling to find the information they were looking for?
• What might you do differently if you were the user in the less successful task?
16. Exploratory Pilot: 175 task pairs with identical initial queries
• For 35 initial queries with 40%-60% successful vs. not:
• Five successful/unsuccessful task pairs for each query
• 175 task pairs annotated
[Diagram: first, second, …, last query of the 2,937,450 successful and 4,508,821 unsuccessful tasks]
17.
18. Exploratory Pilot
Which task was more successful?
[Bar charts, 0-100%: agreement between three judges: 142 vs. 33; picked the successful task: 97 / 9 / 36]
The last-query click is a reasonable proxy for success.
19. Intent-based query reformulation taxonomy
Based on the answers to the open-ended questions:
Added, removed or substituted ☐ an action (e.g., download, contact) ☐ an attribute (e.g., printable, free)
Specified ☐ a particular instance (e.g., added a brand name or version number)
Rephrased ☐ corrected a spelling error or typo ☐ used a synonym or related term
Switched ☐ to a related task (changed main focus) ☐ to a new task
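For concreteness, the taxonomy can also be written down as a flat label set, which is the form it takes as a prediction target later in the talk. A minimal Python sketch; the enum name and value strings are my own, though the eleven labels match the transition counts on slide 31.

```python
from enum import Enum

class IntentReformulation(Enum):
    """Intent-based reformulation taxonomy from the annotation form."""
    ADDED_ACTION = "added action"              # e.g., download, contact
    REMOVED_ACTION = "removed action"
    SUBSTITUTED_ACTION = "substituted action"
    ADDED_ATTRIBUTE = "added attribute"        # e.g., printable, free
    REMOVED_ATTRIBUTE = "removed attribute"
    SUBSTITUTED_ATTRIBUTE = "substituted attribute"
    SPECIFIED_INSTANCE = "specified instance"  # e.g., brand name, version
    CORRECTED_TYPO = "corrected typo"
    REPHRASED_SYNONYM = "rephrased with synonym"
    SWITCHED_RELATED = "switched to related task"
    SWITCHED_NEW = "switched to new task"
```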
20. Main Annotations: transitions in 659 successful tasks
• Annotated 659 new successful tasks
• Single tasks, no pairs
• Annotated the transition between each query pair
[Diagram: first, second, …, last query of the 2,937,450 successful tasks]
21.
22. Success & Pivotal Query
[Bar chart: location of the pivotal query (Q1-Q7) by number of tasks (0-200), split by whether the task continued or the pivotal query was the final query]
Task success (not at all / somewhat / mostly / completely): 19% / 40% / 39%
Made at least some progress: 62%
30. Outline
Characterizing Struggling
Annotating Query Transitions
Predicting Future Actions
Struggling searchers, successful or not, behave differently on many search aspects, including: queries, reformulations, clicks, dwell time & topic.
Substantial differences in how searchers refine queries at different stages of a struggling task; strong connections with task outcomes and with particular pivotal queries.
31. Predicting Reformulation
What query reformulation strategy will a user employ?
Query transition counts: Added Attribute 465, Specified Instance 374, Substituted Attribute 203, Switched to Related 179, Removed Attribute 99, Rephrased w/ Synonym 91, Added Action 80, Switched to New 46, Substituted Action 41, Corrected Typo 34, Removed Action 18
[Chart panels: from first query / others / to final query]
Applications: tailor query suggestions, tailor auto-completions, re-rank results or suggestions
32. Feature Sets
Query Features
Interaction Features
Transition Features
+ History Features
Applications: tailor query suggestions, tailor auto-completions, re-rank results or suggestions
A sketch of a predictor over these feature sets follows below.
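As an illustration of how such a predictor could be assembled, here is a minimal scikit-learn sketch that maps per-transition feature dictionaries (query, transition, and interaction features; the individual feature names are hypothetical and the paper's feature sets are richer) to the taxonomy labels. The model choice (random forest) is mine, not necessarily the paper's.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline

def transition_features(prev_query: str, query: str,
                        clicked: bool, dwell_seconds: float) -> dict:
    """One feature dict per query transition (names are illustrative)."""
    prev, cur = set(prev_query.split()), set(query.split())
    return {
        "query_length": len(query.split()),   # query features
        "terms_added": len(cur - prev),       # transition features
        "terms_removed": len(prev - cur),
        "clicked_before": clicked,            # interaction features
        "dwell_seconds": dwell_seconds,
    }

# Vectorize the dicts and train; y holds taxonomy labels such as
# "specified instance", collected from the main annotations.
model = make_pipeline(DictVectorizer(), RandomForestClassifier(n_estimators=100))
# model.fit(X_dicts, y_labels)
# model.predict([transition_features("us open", "us open golf", False, 0.0)])
```

A predicted strategy could then drive the applications on the slide, e.g. tailoring query suggestions or auto-completions to the reformulation the user is about to attempt.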
34. Outline
Characterizing Struggling
Annotating Query Transitions
Predicting Future Actions
Substantial differences in how searchers behave in a struggling task, depending on task outcomes.
Accurately predicting the reformulation strategy makes it possible to provide situation-specific support at a higher level.
35. Limitations
• Very specific and apparent type of struggling
• Determination of search success:
• Final click as a proxy
• Judgements by third parties, not by searchers themselves
36. Future Work
• Directly applying reformulation strategies
• Query suggestions & auto-completions
• Mining query ➣ pivotal query pairs
• Can be identified automatically: F1 59% (final-query baseline: 51%)
• Hints and tips on reformulation strategies