NILAVRA BHATTACHARYA, JACEK GWIZDKA
School of Information, The University of Texas at Austin
ACM SIGIR CHIIR 2019 • GLASGOW, SCOTLAND, UK? EU?
MEASURING LEARNING DURING SEARCH
Differences in Interactions, Eye-Gaze, and Semantic Similarity to Expert Knowledge
Why is the sky blue?
The sky is blue because …
“learning” or change in knowledge
Big Idea: to measure this knowledge-change, and (eventually) infer when it is happening
Benefits: can be extended to a wide variety of fields, independent of topic and content
e.g. online learning environments, which will become more popular
Overview
1. Introduction & Background
2. Method
3. Measures
4. Results
5. Summary
1.1 What is Learning?
change in verbal knowledge, from before to after a search session
Image: http://thepeakperformancecenter.com/educational-learning/thinking/blooms-taxonomy/blooms-taxonomy-revised
Revised Bloom’s Taxonomy
1.2 Measuring Learning
Existing Methods of Assessing Knowledge-Change:
• asking explicit fact-checking questions
– can be disruptive for web-searching
• SVT: Sentence Verification Technique
– requires creating specific questions for each document consumed (Freund et al., 2016)
• (Automated) Essay Scoring
– requires training set of carefully hand-scored essays (Yang et al., 2002)
• concept-maps and mind-mapping
– difficult to score for non-experts
• common drawbacks in the context of online information search:
– topic specific
– time consuming to measure
– difficult to scale-up
1.3 Prior Work
• Goal: implicit measurement of learning or knowledge-gain
Implicit Measures:
• Cole et al. (2013): eye gaze patterns can assess differences in users’ domain
knowledge level (for text search).
– behavioural features are topic-independent predictive cues of domain knowledge
• Collins-Thompson et al. (2016): diversity in search queries is an indicator of
increased knowledge gain.
• Vakkari (2016): suggested a set of predictors for knowledge-change during search.
1.3.1 Prior Work
Image: Gadiraju, U., Yu, R., Dietze, S., & Holtz, P. (2018). Analyzing knowledge gain of users in informational search sessions on the web. CHIIR ’18
Gadiraju et al. (CHIIR 2018):
• topic-specific pre- and post-tests involving True/False questionnaires
– may not be generalizable for all topics
– exposes users to search-topic and possible answers
– correct answer for multiple-choice questions can be selected by guesswork
1.3.2 Prior Work
Image: Ghosh, S., Rath, M., & Shah, C. (2018). Searching as Learning: Exploring Search Behavior and Learning Outcomes in Learning-related Tasks. CHIIR ’18
Ghosh et al. (CHIIR 2018):
• users were asked to self-rate their perceived change in knowledge
– subjective
– may not reflect true change in knowledge
1.4 Research Aims
• explore knowledge-change measures that
– do not require domain-specific comprehension tests
– do not expose users to the search-topic before the actual search begins
– attempt to measure a searcher’s knowledge-change, minimizing guessing and subjective differences
• investigate differences in search behaviour and gaze-patterns of users showing low versus high knowledge-change
2.1 Experimental Design
• Eye-tracking user study (n=30; 16 females)
• Within-subjects design
• Participants searched for health-related information on the web
• Participants were pre-screened for
– non-expert topic familiarity
– uncorrected eye-sight
– proficiency in online searching
2.2 Task Description
• Two search tasks on health-related topics, following the simulated work-task approach (Borlund, 2003)
– tried to trigger a realistic information need in participants (e.g., helping a cousin, and a friend)
• Topics:
– Vitamin A
– Hypotension
• Each task had 4 questions covering multiple facets
– e.g. for Vitamin A, participants had to find:
• recommended dosage
• health benefits
• consequences of excess and deficiency
• food sources
Section 3.2 of our paper contains the full texts of the task prompts.
Borlund, P. (2003). The IIR evaluation model: a framework for evaluation of interactive information retrieval systems. Information Research. 8(3).
2.3 Procedure
1. Memory-span test via Working Memory Capacity (WMC)
2. Online health literacy test via eHealth Literacy Scale (eHEALS)
3. Training task to familiarize with interface
4. Search task (in counter-balanced order)
2.3 Procedure
Figure labels (panels a to e): Pre-task Knowledge (free-text); Customized Google SERP; CONTENT pages; Bookmarking; Note-taking; Post-task Knowledge (free-text)
2.3 Custom Google SERP
• Custom Google SERP:
– results retrieved in the background from Google
– 7 results per page
• increases font-size and visual angle for proper eye-tracking
– no ads
2.3 Bookmarking & Note-taking
Figure labels: Bookmarking; Note-taking
2.4 Procedure
1. Memory-span test via Working Memory Capacity (WMC)
2. Online health literacy test via eHealth Literacy Scale (eHEALS)
3. Training task to familiarize with interface
4. Search task (in counter-balanced order)
5. Perceived workload test after each task via NASA-TLX
3 Measures
• Pre- and post-tasks
– Pre-task prompt: “Think of what you already know on the topic of this search and list as many phrases or words as you can that come to your mind.”
– Post-task prompt: “Now that you have completed this search task, think of the information that you found and list as many words or phrases as you can on the topic of the search task.”
– pre-task to post-task: change in knowledge
Aim: to measure this change, using implicit-feedback measures
Challenge: user input is open-ended text, via free-recall from memory (no time-limit)
(This is the key difference between our study and prior work: Gadiraju et al., 2018; Yu et al., 2018; Ghosh et al., 2018)
3 Measures
• Knowledge Change (KC)
– simple
– sophisticated (using semantic similarity)
• Eye-tracking (ET)
• Search Interactions (SI)
• Unit of analysis: <user, task> pair
3.1.1 KC Measures - Simple
$$\mathit{KC\_Simple} = \frac{\mathit{items}_{post} - \mathit{items}_{pre}}{\mathit{items}_{post}}$$
items = words and phrases entered by users before and after each task, separated by ENTER key presses (“\n”)
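A minimal Python sketch of KC_Simple, assuming that items_pre and items_post are the counts of non-empty entries in the pre- and post-task responses and that entries are newline-separated. The helper names (parse_items, kc_simple) and the sample responses are hypothetical, not taken from the study.

```python
def parse_items(raw_text: str) -> list[str]:
    """Split a free-recall response on newlines and drop blank entries."""
    return [line.strip() for line in raw_text.split("\n") if line.strip()]


def kc_simple(pre_text: str, post_text: str) -> float:
    """Relative change in the number of listed items, normalised by the post-task count."""
    n_pre = len(parse_items(pre_text))
    n_post = len(parse_items(post_text))
    if n_post == 0:
        # The ratio is undefined when nothing was entered after the task.
        raise ValueError("KC_Simple is undefined for an empty post-task response")
    return (n_post - n_pre) / n_post


# Hypothetical responses: 2 items before the task, 5 items after (one blank line ignored).
pre = "blood pressure\ndizziness"
post = "low blood pressure\ndizziness\nfainting\nsalt intake\n\nhydration"
score = kc_simple(pre, post)  # (5 - 2) / 5 = 0.6
```

Under this reading, a negative score means that fewer items were listed after the task than before it.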
3.1.2 KC Measures - Sophisticated
Figure labels: Expert Knowledge (or “Correct” Answers); User’s Pre-task answers; User’s Post-task answers; knowledge change
3.1.2 KC Measures - Sophisticated
Step 1: Curating expert knowledge vocabulary:
– crowdsourced answers to each question from the search task (MTurk)
. . .
– answers were cleaned and verified by a medical doctor (expert)
– final vocabulary size:
• 115 phrases / words for Task 1
• 105 phrases / words for Task 2
3.1.2 KC Measures - Sophisticated
Step 2: Measuring semantic similarity between texts
Step 2(a): Turn natural text into numbers (Sentence Embedding):
"user's pre-task answers" → [0.3, 5.6, 0.7, …]
"user's post-task answers" → [0.7, 1.2, 0.1, …]
"answers from expert" → [0.9, 3.6, 0.5, …]
Image: https://tfhub.dev/google/universal-sentence-encoder/2
• encoder of greater-than-word length text
– phrases, sentences, short paragraphs
• trained on a variety of large text corpora
– Google News, entire English Wikipedia, etc.
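3.1.2 KC Measures - Sophisticated
Step 2: Measuring semantic similarity between texts
Step 2(b): Measure distance between vectors:
$$\mathrm{sim}(\vec{u}, \vec{v}) = 1 - \frac{\arccos\left(\dfrac{\vec{u} \cdot \vec{v}}{\|\vec{u}\|\,\|\vec{v}\|}\right)}{\pi}$$
where $\vec{v}$ = [0.9, 3.6, 0.5, …] is the expert’s knowledge vector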
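A small Python sketch of the angular similarity in Step 2(b), using only the standard library. The function name angular_similarity and the toy vectors are illustrative assumptions. Clamping the cosine to [-1, 1] is added here to guard against floating-point rounding and is not part of the formula on the slide.

```python
import math


def angular_similarity(u, v):
    """Return 1 - arccos(cosine(u, v)) / pi, a similarity in [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("similarity is undefined for a zero vector")
    # Clamp so that rounding error cannot push the cosine outside acos's domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return 1.0 - math.acos(cosine) / math.pi


# Toy 2-d vectors (real sentence embeddings have many more dimensions).
same_direction = angular_similarity([1.0, 2.0], [2.0, 4.0])   # angle 0
orthogonal = angular_similarity([1.0, 0.0], [0.0, 1.0])       # angle pi/2
opposite = angular_similarity([1.0, 0.0], [-1.0, 0.0])        # angle pi
```

Vectors pointing the same way score 1, orthogonal vectors score 0.5, and opposite vectors score 0.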
3.1.2 KC Measures - Sophisticated
Figure labels: Expert Knowledge (or “Correct” Answers); User’s Pre-task answers (initial knowledge state); User’s Post-task answers (final knowledge state); knowledge change
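3.1.2 KC Measures - Sophisticated
• How to measure change between two numbers?
$$\mathit{KC\_Sem\_Diff} = \mathrm{sim}(\mathit{post\_task}, \mathit{expert}) - \mathrm{sim}(\mathit{pre\_task}, \mathit{expert})$$
$$\mathit{KC\_Sem\_Ratio} = \frac{\mathrm{sim}(\mathit{post\_task}, \mathit{expert})}{\mathrm{sim}(\mathit{pre\_task}, \mathit{expert})}$$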
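The two semantic knowledge-change measures, KC_Sem_Diff and KC_Sem_Ratio, can be sketched in Python as follows. The function names and the two similarity values are hypothetical. The sketch assumes the similarities to expert knowledge have already been computed.

```python
def kc_sem_diff(sim_post: float, sim_pre: float) -> float:
    """Difference form: sim(post_task, expert) - sim(pre_task, expert)."""
    return sim_post - sim_pre


def kc_sem_ratio(sim_post: float, sim_pre: float) -> float:
    """Ratio form: sim(post_task, expert) / sim(pre_task, expert)."""
    if sim_pre == 0.0:
        # The ratio is undefined when the pre-task similarity is zero.
        raise ValueError("KC_Sem_Ratio is undefined when the pre-task similarity is 0")
    return sim_post / sim_pre


# Hypothetical similarities of one <user, task> pair to the expert vocabulary.
sim_pre, sim_post = 0.60, 0.75
diff = kc_sem_diff(sim_post, sim_pre)    # about 0.15
ratio = kc_sem_ratio(sim_post, sim_pre)  # about 1.25
```

A positive difference, or a ratio above 1, indicates that the post-task answers are closer to expert knowledge than the pre-task answers.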
3.2 Eye-tracking (ET)
• we used “reading” eye-fixations only
– fixation count (fix_n) and duration (fix_dur_sum, fix_dur_avg)
– length of reading sequences (rseq_len)
– regression count (regr_n) and length (regr_len)
Figure labels (Gwizdka, 2014): Reading (on relevant content); Scanning (on irrelevant content)
Gwizdka, J. (2014). Characterizing Relevance with Eye-tracking Measures. IIiX ’14
3.3 Search Interactions (SI)
• Webpage based:
– count of pages visited (pg_n)
• Search-query based:
– count of queries (query_n)
– count of new queries in query-reformulations (qr_new_n)
– how “specialized” the words used in queries were (q_word_freq)
• "cure for low blood pressure" (less specialized)
• "mayoclinic hypotension treatment" (more specialized)
• Table 1 in our paper describes how to compute all the measures.
4.1 Data Analysis
Figure labels: KC_Simple, KC_Sem_Ratio, KC_Sem_Diff; LO group1, LO group2, LO group3; HI group1, HI group2, HI group3; ET and SI measures for each group
• Do LO and HI groups differ significantly in terms of their Eye-tracking (ET) and Search Interaction (SI) measures?
• Quasi-independent Vars:
– Knowledge Change (KC) groups (LO and HI)
• Dependent Vars:
– Eye-tracking (ET)
– Search Interactions (SI)
• Statistical Test:
– Mann-Whitney U
• Group-membership was fairly consistent:
– 2 / 49 mismatches between _Ratio and _Diff
– 9 / 49 mismatches between _Simple and _Sem
4.2.1 ET Measures - Fixations
• LO group had higher (!) eye-tracking fixation-measures than HI group:
– fixated more on CONTENT pages (fix_n_content_avg, .05 ≤ p ≤ .1)
– fixated longer in total (fix_dur_content_sum, p < .01) and on average (fix_n_content_avg)
• Yu et al. (SIGIR 2018) similarly found:
– total, average, and max time spent on webpages have highest predictive power for
knowledge-gain prediction
4.2.1 ET Measures - Movement
• Again, LO group differed significantly by having:
– longer reading sequences (rseq_n); higher probability of reading (pRR_serp)
– regressed backwards longer (regr_len), and more often (regr_n)
• Eye-tracking measures show LO group put more effort into reading, yet our Knowledge-Change measures reflect they learnt less
4.2.2 SI Measures
• LO and HI users entered a similar number of search queries
– LO group entered fewer new queries in reformulations (qr_new_n)
– LO group used more common (or less specialized) words in queries (q_words_freq)
• Yu et al. (SIGIR 2018) similarly observed:
– count of unique terms in queries was the only query-related feature that showed
predictive power
4.3 Other Measures
• HI group reported higher mental workload (NASA_TLX)
• LO and HI groups did not have any significant differences in
– eHealth literacy knowledge, comfort, and skills at finding, evaluating, and applying electronic health information
– working-memory capacity
– number of webpages visited
• Yu et al. (SIGIR 2018) similarly illustrate:
– counts of webpages visited are very weak predictors of knowledge-gain (Fig 1 of Yu et
al. (2018): feature importance of random forest model).
4.4 Final Knowledge State (FKS)
• final knowledge state (FKS): sim(post_task, expert), i.e. post_exp_sim, the similarity of post-task answers to expert knowledge
• LO-FKS and HI-FKS groups were compared on ET and SI measures
• LO-FKS group:
– spent more time reading SERPs (pRR_serp)
– opened fewer CONTENT pages (pg_content_n); thus found fewer relevant CONTENT pages (pg_content_rel_n)
• similar phenomenon observed by Gwizdka (CHIIR 2017) and Collins-Thompson et al. (2016)
– reported lower mental workload after task (NASA_TLX)
5.1 Takeaways
GROUPS:
LO: Low Knowledge-Change (KC)
LO-FKS: Low Final-Knowledge-State (FKS)
• LO group read more, yet they learnt less
– possibly due to difficulty in acquiring information
• LO-FKS group spent more time in reading SERPs
– yet they opened fewer relevant search results
• LO group used less specialized words in their queries
• LO group reported lower mental workload after each task
• No significant differences in
– total number of pages visited
– eHealth Literacy Score
– Working Memory Capacity
5.2 In terms of Research Aims
• explore knowledge-change measures that
– do not require domain-specific comprehension tests
– do not expose users to the search-topic before the actual search begins
• we introduce a topic-independent, free-recall based method of knowledge assessment
– expert vocabulary can be curated from online knowledge bases (e.g. Wikipedia)
– attempts to measure a searcher’s knowledge-change, while minimizing guessing and subjective differences
• we used semantic similarity of user-responses to expert-knowledge to measure knowledge-change
– advances in measuring semantic-similarity will help in this direction
• investigate differences in search behaviour and gaze-patterns of users showing low versus high knowledge-change
– results show Eye-tracking (ET) and Search-Interaction (SI) measures differ significantly with varying levels of knowledge-change => ET & SI: good candidate measures of verbal-learning
5.3 Limitations & Future Work
• Limitations:
– only 2 search-tasks, of similar nature (health information search)
– data-analysis at task-level (not participant level)
– relatively uniform group of participants (young-adult college students)
– short time-frame
• Future Directions:
– wider range of search tasks
– more diverse participants
– additional individual-difference tests
– multiple-session study (to assess knowledge-change over longer period of time)
5.4 Summary
Figure labels: Verbal Knowledge Change; Specialized words in queries; NASA TLX mental workload; Eye-tracking measures; Search interactions (webpage counts, durations); Working Memory Capacity; eHealth Literacy Score
THANK YOU
Questions?
Acknowledgements:
• Student Travel Grant
• Career Award
• expert-knowledge curation: Dr. Andrzej Kahl
• crowdsourcing and data collection: Yinglong Zhang
References
• Collins-Thompson, K., Rieh, S. Y., Haynes, C. C., & Syed, R. (2016). Assessing learning outcomes in web search: A comparison of tasks and query strategies. CHIIR ’16
• Gadiraju, U., Yu, R., Dietze, S., & Holtz, P. (2018). Analyzing knowledge gain of users in informational search sessions on the web. CHIIR ’18
• Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., & Dietze, S. (2018). Predicting user knowledge gain in informational search sessions. SIGIR ’18
• Ghosh, S., Rath, M., & Shah, C. (2018). Searching as Learning: Exploring Search Behavior and Learning Outcomes in Learning-related Tasks. CHIIR ’18
• Gwizdka, J. (2014). Characterizing Relevance with Eye-tracking Measures. IIiX ’14
• Cole, M. J., Gwizdka, J., Liu, C., Belkin, N. J., & Zhang, X. (2013). Inferring user knowledge level from eye movement patterns. Information Processing & Management, 49(5), 1075-1091.
• Gwizdka, J. (2017, March). I can and so I search more: effects of memory span on search behavior. CHIIR ’17
• Borlund, P. (2003). The IIR evaluation model: a framework for evaluation of interactive information retrieval systems. Information Research. 8(3).
• Wildemuth, B. M. (2004). The effects of domain knowledge on search tactic formulation. Journal of the American Society for Information Science and Technology, 55(3), 246-258.
• Vakkari, P. (2016). Searching as learning: A systematization based on literature. Journal of Information Science, 42(1), 7-18.
• Cer, D., Yang, Y., Kong, S. Y., Hua, N., Limtiaco, N., John, R. S., ... & Sung, Y. H. (2018). Universal sentence encoder. arXiv preprint arXiv:1803.11175.
• Franz, A., & Brants, T. (2006). All our n-gram are belong to you. Google Machine Translation Team, 20.
• Freund, L., Kopak, R., & O’Brien, H. (2016). The effects of textual environment on reading comprehension: Implications for searching as learning. Journal of Information Science, 42(1), 79-93.
• Yang, Y., Buckendahl, C. W., Juszkiewicz, P. J., & Bhola, D. S. (2002). A review of strategies for validating computer-automated scoring. Applied Measurement in Education, 15(4), 391-412.
• Francis, G., MacKewn, A., & Goldthwaite, D. (2004). CogLab on a CD. Wadsworth Publishing Company.

 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Último (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Measuring Learning During Search - ACM SIGIR CHIIR 2019

  • 1. NILAVRA BHATTACHARYA, JACEK GWIZDKA School of Information, The University of Texas at Austin ACM SIGIR CHIIR 2019 • GLASGOW, SCOTLAND, UK? EU? MEASURING LEARNING DURING SEARCH Differences in Interactions, Eye-Gaze, and Semantic Similarity to Expert Knowledge
  • 2. Why is the sky blue? The sky is blue because … Big Idea: to measure this knowledge-change, and (eventually) infer when it is happening Benefits: can be extended to a wide variety of fields, independent of topic and content e.g. online learning environments will become more popular “learning” or change in knowledge
  • 3. Overview: 1. Introduction & Background 2. Method 3. Measures 4. Results 5. Summary
  • 4. 1.1 What is Learning? Change in verbal knowledge, from before to after a search session. Image: Revised Bloom’s Taxonomy (http://thepeakperformancecenter.com/educational-learning/thinking/blooms-taxonomy/blooms-taxonomy-revised)
  • 5. 1.2 Measuring Learning. Existing methods of assessing knowledge-change: • asking explicit fact-checking questions – can be disruptive for web-searching • SVT: Sentence Verification Technique – requires creating specific questions for each document consumed (Freund et al., 2016) • (Automated) Essay Scoring – requires a training set of carefully hand-scored essays (Yang et al., 2002) • concept-maps and mind-mapping – difficult to score for non-experts • common drawbacks in the context of online information search: – topic specific – time consuming to measure – difficult to scale up
  • 6. 1.3 Prior Work • Goal: implicit measurement of learning or knowledge-gain. Implicit measures: • Cole et al. (2013): eye gaze patterns can assess differences in users’ domain knowledge level (for text search). – behavioural features are topic-independent predictive cues of domain knowledge • Collins-Thompson et al. (2016): diversity in search queries is an indicator of increased knowledge gain. • Vakkari (2016): suggested a set of predictors for knowledge-change during search.
  • 7. 1.3.1 Prior Work. Gadiraju et al. (CHIIR 2018): • topic-specific pre- and post-tests involving True/False questionnaires – may not be generalizable for all topics – exposes users to search-topic and possible answers – correct answer for multiple-choice questions can be selected by guesswork. Image: Gadiraju, U., Yu, R., Dietze, S., & Holtz, P. (2018). Analyzing knowledge gain of users in informational search sessions on the web. CHIIR ’18
  • 8. 1.3.2 Prior Work. Ghosh et al. (CHIIR 2018): • users were asked to self-rate their perceived change in knowledge – subjective – may not reflect true change in knowledge. Image: Ghosh, S., Rath, M., & Shah, C. (2018). Searching as Learning: Exploring Search Behavior and Learning Outcomes in Learning-related Tasks. CHIIR ’18
  • 9. 1.4 Research Aims • explore knowledge-change measures that – do not require domain-specific comprehension tests – do not expose users to the search-topic before the actual search begins – attempt to measure a searcher’s knowledge-change, minimizing guessing and subjective differences • investigate differences in search behaviour and gaze-patterns of users showing low versus high knowledge-change
  • 10. Overview: 1. Introduction & Background 2. Method 3. Measures 4. Results 5. Summary
  • 11. 2.1 Experimental Design • Eye-tracking user study (n=30; 16 females) • Within-subjects design • Searched for health-related information on the web • Participants were pre-screened for: – non-expert topic familiarity – uncorrected eye-sight – proficiency in online searching
  • 12. 2.2 Task Description • Two search tasks, on health-related topics, using the simulated work-task approach (Borlund, 2003) – tried to trigger a realistic information-need in participants (e.g., helping a cousin, and a friend) • Topics: – Vitamin A – Hypotension • Each task had 4 questions from multiple facets – e.g. for Vitamin A, participants had to find: • recommended dosage • health benefits • consequences of excess and deficiency • food sources. Section 3.2 of our paper contains the full texts of the task prompts.
  • 14. 2.3 Procedure (figure, panels a to e): Pre-task Knowledge (free-text); Customized Google SERP; CONTENT pages; Bookmarking; Note-taking; Post-task Knowledge (free-text)
  • 15. 2.3 Custom Google SERP • Custom Google SERP: – results retrieved in the background from Google – 7 results per page • increases font-size and visual angle for proper eye-tracking – no ads
  • 16. 2.3 Bookmarking & Note-taking (screenshots: Bookmarking; Note-taking)
  • 18. Overview: 1. Introduction & Background 2. Method 3. Measures 4. Results 5. Summary
  • 19. 3 Measures • Pre- and post-task prompts: – Pre-task: “Think of what you already know on the topic of this search and list as many phrases or words as you can that come to your mind.” – Post-task: “Now that you have completed this search task, think of the information that you found and list as many words or phrases as you can on the topic of the search task.” • The difference between the two responses is the change in knowledge • Aim: to measure this change, using implicit-feedback measures • Challenge: user input is open-ended text, via free-recall from memory (no time-limit); this is the key difference of our study from prior works (Gadiraju et al., 2018; Yu et al., 2018; Ghosh et al., 2018)
  • 20. 3 Measures • Knowledge Change (KC) – simple – sophisticated (using semantic similarity) • Eye-tracking (ET) • Search Interactions (SI) • Unit of analysis: <user, task> pair
  • 21. 3.1.1 KC Measures - Simple: $KC\_Simple = \frac{items_{post} - items_{pre}}{items_{post}}$, where items = words and phrases entered by users before and after each task, separated by ENTER key presses (“\n”)
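The KC_Simple formula on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_items` and `kc_simple` are hypothetical, blank lines are assumed not to count as items, and the example responses are invented.

```python
def split_items(free_text: str) -> list[str]:
    """Split a free-recall response into items, one per ENTER-separated line.

    Assumption: empty or whitespace-only lines are not counted as items.
    """
    return [line.strip() for line in free_text.split("\n") if line.strip()]


def kc_simple(pre_text: str, post_text: str) -> float:
    """KC_Simple = (items_post - items_pre) / items_post, using item counts."""
    n_pre = len(split_items(pre_text))
    n_post = len(split_items(post_text))
    if n_post == 0:
        # The ratio is undefined when the post-task response is empty.
        raise ValueError("post-task response has no items")
    return (n_post - n_pre) / n_post


# Invented example responses (2 items before, 4 items after).
pre = "carrots\ngood for eyes"
post = "carrots\nretinol\nnight blindness\nliver\n"
print(kc_simple(pre, post))  # 0.5
```

Note that the measure is negative when a user lists fewer items after the task than before it.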
  • 22. 3.1.2 KC Measures - Sophisticated (figure labels: Expert Knowledge, or “Correct” Answers; User’s Pre-task answers; User’s Post-task answers; knowledge change)
  • 23. 3.1.2 KC Measures - Sophisticated. Step 1: Curating expert knowledge vocabulary: – crowdsourced answers to each question from the search task (MTurk) . . . – answers were cleaned and verified by a medical doctor (expert) – final vocabulary size: • 115 phrases / words for Task 1 • 105 phrases / words for Task 2
  • 24. 3.1.2 KC Measures - Sophisticated. Step 2: Measuring semantic similarity between texts. Step 2(a): Turn natural text into numbers with a sentence embedding: "user's pre-task answers" → [0.3, 5.6, 0.7, …]; "user's post-task answers" → [0.7, 1.2, 0.1, …]; "answers from expert" → [0.9, 3.6, 0.5, …]. Sentence encoder (image: https://tfhub.dev/google/universal-sentence-encoder/2): • encodes greater-than-word-length text: phrases, sentences, short paragraphs • trained on a variety of large text corpora: Google News, entire English Wikipedia, etc.
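  • 25. 3.1.2 KC Measures - Sophisticated. Step 2: Measuring semantic similarity between texts. Step 2(b): Measure distance between vectors, where $\vec{v}$ is the expert’s knowledge vector (e.g. [0.9, 3.6, 0.5, …]) and $\vec{u}$ is the vector compared against it: $\mathrm{sim}(\vec{u}, \vec{v}) = 1 - \arccos\left(\frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\| \|\vec{v}\|}\right) / \pi$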
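As a rough sketch of Step 2(b), the arccos-based similarity can be computed with the standard library alone. The function name `angular_similarity` is an assumption for illustration, and the vectors below are toy values, not real sentence embeddings.

```python
import math


def angular_similarity(u: list[float], v: list[float]) -> float:
    """Return 1 - arccos(cosine(u, v)) / pi.

    The result is 1.0 for vectors pointing the same way, 0.5 for
    orthogonal vectors, and 0.0 for opposite vectors.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("similarity is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return 1.0 - math.acos(cosine) / math.pi


# Toy 3-dimensional vectors standing in for embeddings.
user_vec = [0.7, 1.2, 0.1]
expert_vec = [0.9, 3.6, 0.5]
print(angular_similarity(user_vec, expert_vec))
```

Dividing the angle by π keeps the score in the range 0 to 1, which is convenient when two scores are later subtracted or divided.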
  • 26. 3.1.2 KC Measures - Sophisticated (figure labels: Expert Knowledge, or “Correct” Answers; User’s Pre-task answers = initial knowledge state; User’s Post-task answers = final knowledge state; knowledge change)
  • 27. 3.1.2 KC Measures - Sophisticated • How to measure change between two numbers? $KC\_Sem\_Diff = \mathrm{sim}(post\_task, expert) - \mathrm{sim}(pre\_task, expert)$ and $KC\_Sem\_Ratio = \frac{\mathrm{sim}(post\_task, expert)}{\mathrm{sim}(pre\_task, expert)}$
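The two semantic KC measures differ only in how they combine the same pair of similarity scores. A minimal sketch, assuming the scores have already been computed; the function names and the example values 0.50 and 0.75 are hypothetical:

```python
def kc_sem_diff(sim_post: float, sim_pre: float) -> float:
    """KC_Sem_Diff = sim(post_task, expert) - sim(pre_task, expert)."""
    return sim_post - sim_pre


def kc_sem_ratio(sim_post: float, sim_pre: float) -> float:
    """KC_Sem_Ratio = sim(post_task, expert) / sim(pre_task, expert)."""
    if sim_pre == 0.0:
        # The ratio is undefined when the pre-task similarity is zero.
        raise ValueError("pre-task similarity is zero")
    return sim_post / sim_pre


# Invented similarity scores for one <user, task> pair.
sim_pre, sim_post = 0.50, 0.75
print(kc_sem_diff(sim_post, sim_pre))   # 0.25
print(kc_sem_ratio(sim_post, sim_pre))  # 1.5
```

A difference above 0 and a ratio above 1 both indicate that the post-task answers sit closer to the expert vocabulary than the pre-task answers did.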
  • 28. 3.2 Eye-tracking (ET) • we used “reading” eye-fixations only (Gwizdka, 2014) – fixation count (fix_n) and duration (fix_dur_sum, fix_dur_avg) – length of reading sequences (rseq_len) – regression count (regr_n) and length (regr_len) • figure labels: Reading (on relevant content); Scanning (on irrelevant content)
  • 29. 3.3 Search Interactions (SI) • Webpage based: – count of pages visited (pg_n) • Search-query based: – count of queries (query_n) – count of new queries in query-reformulations (qr_new_n) – how “specialized” the words used in queries were (q_word_freq) • "cure for low blood pressure" (less specialized) • "mayoclinic hypotension treatment" (more specialized) • Table 1 in our paper describes how to compute all the measures.
  • 30. Overview: 1. Introduction & Background 2. Method 3. Measures 4. Results 5. Summary
  • 31. 4.1 Data Analysis (figure: each of KC_Simple, KC_Sem_Ratio, and KC_Sem_Diff splits the data into a LO and a HI group, and the groups’ ET and SI measures are compared). Do LO and HI groups differ significantly in terms of their Eye-tracking (ET) and Search Interaction (SI) measures? • Quasi-independent variables: – Knowledge Change (KC) groups (LO and HI) • Dependent variables: – Eye-tracking (ET) – Search Interactions (SI) • Statistical test: – Mann-Whitney U • Group-membership was fairly consistent: – 2 / 49 mismatches between _Ratio and _Diff – 9 / 49 mismatches between _Simple and _Sem
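The analysis on this slide can be sketched as a group split followed by a Mann-Whitney U statistic. The slides do not say how the LO and HI groups were formed, so the median split below is purely an assumption. The function names and all numbers are invented for illustration, and only the U statistic is computed (a statistics library routine such as `scipy.stats.mannwhitneyu` would also give a p-value).

```python
from statistics import median


def median_split(kc_by_unit: dict) -> tuple[list, list]:
    """Assumed grouping: KC <= median goes to LO, KC > median goes to HI."""
    cut = median(kc_by_unit.values())
    lo = [unit for unit, kc in kc_by_unit.items() if kc <= cut]
    hi = [unit for unit, kc in kc_by_unit.items() if kc > cut]
    return lo, hi


def mann_whitney_u(x: list[float], y: list[float]) -> float:
    """U statistic for sample x: count of pairs with x_i > y_j, ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Invented KC scores and fixation counts for four <user, task> units.
kc = {"u1-t1": 0.1, "u2-t1": 0.4, "u3-t1": 0.6, "u4-t1": 0.9}
fix_n = {"u1-t1": 120, "u2-t1": 110, "u3-t1": 80, "u4-t1": 95}

lo_units, hi_units = median_split(kc)
u_lo = mann_whitney_u([fix_n[k] for k in lo_units], [fix_n[k] for k in hi_units])
print(lo_units, hi_units, u_lo)  # ['u1-t1', 'u2-t1'] ['u3-t1', 'u4-t1'] 4.0
```

Here U equals the number of pairs (2 × 2 = 4), meaning every LO unit had more fixations than every HI unit in this toy data; the U values of the two samples always sum to the number of pairs.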
  • 32. 4.2.1 ET Measures - Fixations • LO group had higher (!) eye-tracking fixation-measures than HI group: – fixated more on CONTENT pages (fix_n_content_avg, .05 ≤ p ≤ .1) – fixated longer in total (fix_dur_content_sum, p < .01) and on average (fix_n_content_avg) • Yu et al. (SIGIR 2018) similarly found: – total, average, and max time spent on webpages have highest predictive power for knowledge-gain prediction
  • 33. 4.2.1 ET Measures - Movement • Again, LO group differed significantly by having: – longer reading sequences (rseq_n); higher probability of reading (pRR_serp) – longer backward regressions (regr_len), and more of them (regr_n) • Eye-tracking measures show the LO group put more effort into reading, yet our Knowledge-Change measures indicate they learnt less
  • 34. 4.2.2 SI Measures • LO and HI users entered a similar number of search queries – LO group entered fewer new queries in reformulations (qr_new_n) – LO group used more common (or less specialized) words in queries (q_words_freq) • Yu et al. (SIGIR 2018) similarly observed: – count of unique terms in queries was the only query-related feature that showed predictive power
  • 35. 4.3 Other Measures • HI group reported higher mental workload (NASA_TLX) • LO and HI groups did not have any significant differences in – eHealth literacy (knowledge, comfort, and skills at finding, evaluating, and applying electronic health information) – working-memory capacity – number of webpages visited • Yu et al. (SIGIR 2018) similarly illustrate: – counts of webpages visited are very weak predictors of knowledge-gain (Fig 1 of Yu et al. (2018): feature importance of random forest model).
  • 36. 4.4 Final Knowledge State (FKS): post_exp_sim = sim(post_task, expert), the similarity of post-task answers to expert knowledge; data split into LO-FKS and HI-FKS groups, compared on ET and SI • LO-FKS group: – spent longer time on reading SERPs (pRR_serp) – opened fewer CONTENT pages (pg_content_n); thus found fewer relevant CONTENT pages (pg_content_rel_n) • similar phenomenon observed by Gwizdka (CHIIR 2017) and Collins-Thompson et al. (2016) – reported lower mental workload after task (NASA_TLX)
  • 37. Overview: 1. Introduction & Background 2. Method 3. Measures 4. Results 5. Summary
  • 38. 5.1 Takeaways (groups: LO = Low Knowledge-Change (KC); LO-FKS = Low Final-Knowledge-State (FKS)) • LO group read more, yet they learnt less – possibly due to difficulty in acquiring information • LO-FKS group spent more time in reading SERPs – yet they opened fewer relevant search results • LO group used less specialized words in their queries • LO group reported lower mental workload after each task • No significant differences in – total number of pages visited – eHealth Literacy Score – Working Memory Capacity
  • 39. 5.2 In terms of Research Aims • explore knowledge-change measures that – do not require domain-specific comprehension tests – do not expose users to the search-topic before the actual search begins • we introduce a topic-independent, free-recall based method of knowledge assessment – expert vocabulary can be curated from online knowledge bases (e.g. Wikipedia) – attempt to measure a searcher’s knowledge-change, while minimizing guessing and subjective differences • we used semantic similarity of user-responses to expert-knowledge to measure knowledge-change – advances in measuring semantic-similarity will help in this direction • investigate differences in search behaviour and gaze-patterns of users showing low versus high knowledge-change – results show Eye-tracking (ET) and Search-Interaction (SI) measures differ significantly with varying levels of knowledge-change, so ET and SI are good candidate measures of verbal-learning
  • 40. 5.3 Limitations & Future Work • Limitations: – only 2 search-tasks, of similar nature (health information search) – data-analysis at task-level (not participant level) – relatively uniform group of participants (young-adult college students) – short time-frame • Future Directions: – wider range of search tasks – more diverse participants – additional individual-difference tests – multiple-session study (to assess knowledge-change over longer period of time)
  • 41. 5.4 Summary (figure labels): Verbal Knowledge Change; Specialized words in queries; NASA TLX mental workload; Eye-tracking measures; Search interactions (webpage counts, durations); Working Memory Capacity; eHealth Literacy Score
  • 42. Thank you. Questions? Acknowledgements: Student Travel Grant; Career Award; expert-knowledge curation: Dr. Andrzej Kahl; crowdsourcing and data collection: Yinglong Zhang
  • 43. References • Collins-Thompson, K., Rieh, S. Y., Haynes, C. C., & Syed, R. (2016). Assessing learning outcomes in web search: A comparison of tasks and query strategies. CHIIR ’16 • Gadiraju, U., Yu, R., Dietze, S., & Holtz, P. (2018). Analyzing knowledge gain of users in informational search sessions on the web. CHIIR ’18 • Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., & Dietze, S. (2018). Predicting user knowledge gain in informational search sessions. SIGIR ’18 • Ghosh, S., Rath, M., & Shah, C. (2018). Searching as Learning: Exploring Search Behavior and Learning Outcomes in Learning-related Tasks. CHIIR ’18 • Gwizdka, J. (2014). Characterizing Relevance with Eye-tracking Measures. IIiX ’14 • Cole, M. J., Gwizdka, J., Liu, C., Belkin, N. J., & Zhang, X. (2013). Inferring user knowledge level from eye movement patterns. Information Processing & Management, 49(5), 1075-1091. • Gwizdka, J. (2017, March). I can and so I search more: effects of memory span on search behavior. CHIIR ’17
  • 44. References (continued) • Borlund, P. (2003). The IIR evaluation model: a framework for evaluation of interactive information retrieval systems. Information Research, 8(3). • Wildemuth, B. M. (2004). The effects of domain knowledge on search tactic formulation. Journal of the American Society for Information Science and Technology, 55(3), 246-258. • Vakkari, P. (2016). Searching as learning: A systematization based on literature. Journal of Information Science, 42(1), 7-18. • Cer, D., Yang, Y., Kong, S. Y., Hua, N., Limtiaco, N., John, R. S., ... & Sung, Y. H. (2018). Universal sentence encoder. arXiv preprint arXiv:1803.11175. • Franz, A., & Brants, T. (2006). All our n-gram are belong to you. Google Machine Translation Team, 20. • Freund, L., Kopak, R., & O’Brien, H. (2016). The effects of textual environment on reading comprehension: Implications for searching as learning. Journal of Information Science, 42(1), 79-93. • Yang, Y., Buckendahl, C. W., Juszkiewicz, P. J., & Bhola, D. S. (2002). A review of strategies for validating computer-automated scoring. Applied Measurement in Education, 15(4), 391-412. • Francis, G., MacKewn, A., & Goldthwaite, D. (2004). CogLab on a CD. Wadsworth Publishing Company.