Effects of Position and Number of Relevant Documents on Users’ Evaluations of System Performance. A presentation by Meg Eastwood on the 2010 paper by D. Kelly, X. Fu, and C. Shah. INF 384H, September 26th, 2011
Diane Kelly, Associate Professor, School of Library and Information Science, UNC Chapel Hill
Ph.D., Rutgers University (Information Science)
MLS, Rutgers University (Information Retrieval)
BA, University of Alabama (Psychology and English)
Graduate Certificate in Cognitive Science, Rutgers Center for Cognitive Science
Primary Aim of Research: “to investigate the relationship between actual system performance and users’ evaluations of system performance” (pg 9:2)
Secondary Aim of Research: “to develop an experimental method that can be used to isolate and study specific aspects of the search process” (pg 9:2)
Previous Experimental Protocols. Traditional lab-based: TREC Interactive Track; study entire search episodes. Naturalistic: Thomas and Hawking (2006); trade control for “ecological validity”. Both designs include so many variables that it can be “difficult to establish causal relationships” (pg 9:2).
Literature Review. Main criticisms of previous studies: evaluation measures were calculated from TREC assessors’ relevance judgments, not user judgments; users were not provided with explicit instructions; users may have been fatigued; sample sizes were low.
Methods
Studies 1 and 2: effect of the position of relevant documents on users’ evaluations of system performance. Study 3: effect of the number of relevant documents.
Participants were asked to help researchers evaluate four search engines. For each search engine, they read a topic and posed one query.
After issuing the query, all participants were redirected to the same results page with 10 standardized results.
Participants were asked to evaluate the full text of each search result in the order presented and to judge its relevance.
After evaluating all the documents on the results page, participants were asked to evaluate the search engine.
Study 1: operationalized average precision at n. Subjects were required to evaluate all 10 documents.
Study 2: also operationalized average precision at n. Subjects were instructed to find five relevant documents.
Study 3: operationalized precision at n.
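The two measures named in Studies 1 to 3 can be illustrated with a minimal sketch. It assumes binary relevance labels and one common definition of average precision at n (the mean of the precision values at each rank where a relevant document appears, normalized by the number of relevant documents in the top n); the slides do not state the exact formula, and the two example rankings are hypothetical, not the paper's actual result lists.

```python
def precision_at_n(relevance, n):
    """Fraction of the top-n results that are relevant (labels are 1 or 0)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(relevance[:n]) / n


def average_precision_at_n(relevance, n):
    """Mean of precision@k over the ranks k <= n that hold a relevant result.

    Assumption: normalizes by the number of relevant results found in the
    top n, which is one of several conventions in use.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance[:n], start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


# Hypothetical result lists: same number of relevant documents, different positions.
front_loaded = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
back_loaded = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

p_front = precision_at_n(front_loaded, 10)
p_back = precision_at_n(back_loaded, 10)
ap_front = average_precision_at_n(front_loaded, 10)
ap_back = average_precision_at_n(back_loaded, 10)
```

Both lists have the same precision at 10, but the average precision differs because it rewards relevant documents that appear earlier. This is why moving relevant documents changes average precision while changing their count changes precision.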
Topics and Documents: selected topics associated with newspaper articles about current events; selected documents with “high probability of being judged relevant or not relevant” (pg 9:12).
Study Participants: “Convenient sample” (pg 9:27) of undergraduates from UNC; 27 participants for each study (1-3). Demographic information collected: sex, age, major, search experience, search frequency.
Results: Relevance Assessments
Did users’ relevance judgments agree with baseline assessments?
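One simple way to quantify the agreement asked about here is the share of documents on which a participant's judgment matches the baseline. The sketch below assumes binary judgments and plain percent agreement; the slides do not say which agreement statistic the paper reports, and the judgment lists are hypothetical.

```python
def percent_agreement(user_judgments, baseline_judgments):
    """Share of documents where the user's label equals the baseline label."""
    if len(user_judgments) != len(baseline_judgments):
        raise ValueError("both lists must cover the same documents")
    if not baseline_judgments:
        raise ValueError("at least one document is required")
    matches = sum(u == b for u, b in zip(user_judgments, baseline_judgments))
    return matches / len(baseline_judgments)


# Hypothetical labels for one results page of 10 documents (1 = relevant).
baseline = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
user = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

agreement = percent_agreement(user, baseline)
```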
Did the topic affect differences in relevance assessments?
How much did relevance assessments vary between documents?
Results: Evaluations of System Performance
Did participants modify evaluation ratings?
Participant ratings compared between performance levels and studies:
Study 1 showed no significant differences in ratings according to performance level.
Studies 2 and 3 did show significant differences in ratings according to performance level.
What are the differences between Study 1 and Study 2? Intended difference: completion time?
Unintended differences: instructions for Study 2 provided a clearer performance objective; subjects felt more successful in Study 2?
User-Experienced Precision: “experimental manipulations [of precision] were only 90% effective” (pg 9:24)
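The gap between the intended manipulation and what a participant actually experienced can be sketched as follows. This assumes that user-experienced precision means precision recomputed from the participant's own relevance judgments instead of the baseline labels; the label lists are hypothetical.

```python
def experienced_precision(user_judgments):
    """Precision computed from the user's own labels (1 = judged relevant)."""
    if not user_judgments:
        raise ValueError("at least one judgment is required")
    return sum(user_judgments) / len(user_judgments)


# Hypothetical page: the baseline labels encode the intended precision of 0.5.
baseline_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# The participant disagreed with the baseline on the third document.
user_labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

intended_p = sum(baseline_labels) / len(baseline_labels)
experienced_p = experienced_precision(user_labels)
```

When a participant disagrees with the baseline on even one document, the precision they experience differs from the level the experimenters intended, which is the kind of slippage the quoted sentence describes.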
Are user-experienced precision values correlated with user ratings of system performance?
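A correlation of this kind can be computed with a few lines of standard-library Python. The sketch uses Pearson's r as an assumption; the slides do not name the statistic used in the paper, and a rank-based coefficient may suit ordinal ratings better. The precision values and ratings below are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one (experienced precision, rating) pair per participant.
experienced = [0.3, 0.4, 0.5, 0.6, 0.7]
ratings = [2, 3, 3, 4, 5]

r = pearson_r(experienced, ratings)
```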

Eastwood presentation on_kellyetal2010

  • 1. Effects of Position and Number of Relevant Documents on Users’ Evaluations of System Performance A presentation by Meg Eastwood on the 2010 paper by D. Kelly, X. Fu, and C. Shah INF 384H September 26th, 2011 1
  • 2.
  • 3. Ph.D., Rutgers University (Information Science)
  • 4. MLS, Rutgers University (Information Retrieval)
  • 5. BA, University of Alabama (Psychology and English)
  • 6. Graduate Certificate in Cognitive Science, Rutgers Center for Cognitive Science
  • 7. Primary Aim of Research: “to investigate the relationship between actual system performance and users’ evaluations of system performance” (pg 9:2)
  • 8. Secondary Aim of Research: “to develop an experimental method that can be used to isolate and study specific aspects of the search process” (pg 9:2)
  • 9. Previous Experimental Protocols. Traditional lab-based: TREC Interactive Track; study entire search episodes. Naturalistic: Thomas and Hawking (2006); trade control for “ecological validity”. Both designs include so many variables that it can be “difficult to establish causal relationships” (pg 9:2)
  • 10. Literature Review. Main criticisms of previous studies: evaluation measures were calculated based on TREC assessors’ relevance judgments, not user judgments; users were not provided with explicit instructions; users may have been fatigued; sample sizes were low.
  • 12. Studies 1 and 2: effect of position of relevant documents on users’ evaluations of system performance. Study 3: effect of number of relevant documents.
  • 13. Participants were asked to help researchers evaluate four search engines. For each search engine, they read a topic and posed one query.
  • 14. After issuing the query, all participants were redirected to the same results page with 10 standardized results.
  • 15. Participants were asked to evaluate the full text of each search result in the order presented and judge its relevance.
  • 16. After evaluating all the documents on the results page, participants were asked to evaluate the search engine.
  • 17. Study 1: operationalized average precision at n. Subjects were required to evaluate all 10 documents.
  • 18. Study 2: also operationalized average precision at n. Subjects were instructed to find five relevant documents.
  • 19. Study 3: operationalized precision at n.
  • 20. Topics and Documents. Selected topics associated with newspaper articles about current events. Selected documents with “high probability of being judged relevant or not relevant” (pg 9:12).
  • 21. Study Participants. “Convenient sample” (pg 9:27) of undergraduates from UNC; 27 participants for each study (1-3). Demographic information collected: sex, age, major, search experience, search frequency.
  • 23. Did users’ relevance judgments agree with baseline assessments?
  • 24. Did users’ relevance judgments agree with baseline assessments?
  • 25. Did the topic affect differences in relevance assessments?
  • 26. How much did relevance assessments vary between documents?
  • 27. Results: Evaluations of System Performance
  • 28. Did participants modify evaluation ratings?
  • 29. Participant ratings compared between performance levels and studies
  • 30. Participant ratings compared between performance levels and studies. Study 1 showed no significant differences in ratings according to performance level.
  • 31. Participant ratings compared between performance levels and studies. Studies 2 and 3 did show significant differences in ratings according to performance level.
  • 32. What are the differences between study 1 and study 2? Intended difference: completion time?
  • 33. What are the differences between study 1 and study 2? Unintended differences: instructions for study 2 provided a clearer performance objective; subjects felt more successful in study 2?
  • 34. User-Experienced Precision: “experimental manipulations [of precision] were only 90% effective” (pg 9:24)
  • 35. Are user-experienced precision values correlated with user ratings of system performance?
  • 36. Are user-experienced precision values correlated with user ratings of system performance?
  • 37. Regression analysis: can you use experienced precision to predict user evaluation?
  • 38. Authors’ Discussion and Conclusions. “…variations in precision at 10 scores have the greatest impact on subjects’ evaluation ratings.” (pg 9:26) Thoughtful analysis of experimental caveats and generalizability of results: convenient sample of students; only one genre of documents represented; are these results specific to informational/exploratory tasks?
  • 39. Suggested Class Discussion Topics. Areas where the experiment may have been too tightly controlled/artificial: controlling the order in which users could rate documents? Areas where the experiment may not have been as controlled as the authors intended: allowing subjects to formulate their own queries; Study 2 allowed participants to feel “successful”? Ten-point evaluation scale versus five-point evaluation scale?
  • 40. References. Kelly, D., Fu, X., and Shah, C. 2010. Effects of position and number of relevant documents retrieved on users’ evaluations of system performance. ACM Trans. Inf. Syst. 28, 2, Article 9 (May 2010), 29 pages. DOI 10.1145/1740592.1740597. http://doi.acm.org/10.1145/1740592.1740597
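
The two measures named in slides 17 to 19, precision at n and average precision at n, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the two example rankings are hypothetical, and average precision is normalized here by the number of relevant documents found in the top n, which is one common convention and may differ from the paper's exact definition.

```python
def precision_at_n(relevance, n):
    """Fraction of the top-n results that are relevant.

    `relevance` is a list of 0/1 judgments in ranked order (hypothetical input format).
    """
    return sum(relevance[:n]) / n


def average_precision_at_n(relevance, n):
    """Mean of the precision values taken at each rank where a relevant result appears.

    Assumption: normalized by the number of relevant results within the top n.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance[:n], start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


# Two hypothetical result lists with the same number of relevant documents
# but different positions: relevant documents first vs. relevant documents last.
relevant_first = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
relevant_last = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print(precision_at_n(relevant_first, 10), precision_at_n(relevant_last, 10))
print(average_precision_at_n(relevant_first, 10), average_precision_at_n(relevant_last, 10))
```

Both lists have the same precision at 10, but average precision rewards placing relevant documents near the top, which is why it suits a manipulation of position while precision at n suits a manipulation of number.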

Editor's Notes

  1. “My research is focused on information search behavior and the design and evaluation of systems that support interactive information retrieval.” UNC Chapel Hill: according to US News and World Report, it has the #2 library science graduate school in the nation, a very strong program. Xun Fu and Chirag Shah were Ph.D. students in the program at the time this article was written.