Effects of Position and Number of Relevant Documents on Users’ Evaluations of System Performance
A presentation by Meg Eastwood on the 2010 paper by D. Kelly, X. Fu, and C. Shah
INF 384H, September 26th, 2011
Diane Kelly, Associate Professor, School of Library and Information Science, UNC Chapel Hill
Ph.D., Rutgers University (Information Science)
MLS, Rutgers University (Information Retrieval)
BA, University of Alabama (Psychology and English)
Graduate Certificate in Cognitive Science, Rutgers Center for Cognitive Science
Primary Aim of Research: “to investigate the relationship between actual system performance and users’ evaluations of system performance” (pg 9:2)
Secondary Aim of Research: “to develop an experimental method that can be used to isolate and study specific aspects of the search process” (pg 9:2)
Previous Experimental Protocols
Traditional lab-based: TREC Interactive Track; study entire search episodes
Naturalistic: Thomas and Hawking (2006); trade control for “ecological validity”
Both designs include so many variables that it can be “difficult to establish causal relationships” (pg 9:2)
Literature Review
Main criticisms of previous studies:
Evaluation measures were calculated based on TREC assessors’ relevance judgments, not user judgments
Users were not provided with explicit instructions
Users may have been fatigued
Low sample sizes
Methods
Studies 1 and 2: effect of position of relevant documents on users’ evaluations of system performance
Study 3: effect of number of relevant documents
Participants were asked to help researchers evaluate four search engines. For each search engine, they read a topic and posed one query.
After issuing the query, all participants were redirected to the same results page with 10 standardized results.
Participants were asked to evaluate the full text of each search result in the order presented and judge its relevance.
After evaluating all the documents on the results page, participants were asked to evaluate the search engine.
Study 1: operationalized average precision at n. Subjects were required to evaluate all 10 documents.
Study 2: also operationalized average precision at n. Subjects were instructed to find five relevant documents.
Study 3: operationalized precision at n.
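The two measures these studies operationalize can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the example result pages are hypothetical, and average precision is normalized here by the number of relevant documents found in the top n, which is one common convention and may differ from the paper's exact formula. The sketch shows why position and number are separate manipulations: moving the relevant documents down the list lowers average precision at n while precision at n stays the same.

```python
from typing import Sequence


def precision_at_n(relevance: Sequence[int], n: int) -> float:
    """Fraction of the top-n results that are relevant (1) rather than not (0)."""
    top = relevance[:n]
    return sum(top) / n


def average_precision_at_n(relevance: Sequence[int], n: int) -> float:
    """Mean of precision@k over each rank k <= n that holds a relevant result.

    Assumption: normalized by the number of relevant results in the top n.
    """
    top = relevance[:n]
    hits = 0
    precisions = []
    for rank, rel in enumerate(top, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical result pages: five relevant documents each, in different positions.
relevant_first = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
relevant_last = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

for name, page in [("relevant first", relevant_first), ("relevant last", relevant_last)]:
    print(name, precision_at_n(page, 10), round(average_precision_at_n(page, 10), 3))
```

Both hypothetical pages have a precision at 10 of 0.5, but their average precision differs, so a design that holds the count fixed and varies the order isolates the effect of position.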
Topics and Documents
Selected topics associated with newspaper articles about current events
Selected documents with “high probability of being judged relevant or not relevant” (pg 9:12)
Study Participants
“Convenient sample” (pg 9:27) of undergraduates from UNC
27 participants for each study (1-3)
Demographic information collected: sex, age, major, search experience, search frequency
Results: Relevance Assessments
Did users’ relevance judgments agree with baseline assessments?
Did the topic affect differences in relevance assessments?
How much did relevance assessments vary between documents?
Results: Evaluations of System Performance
Did participants modify evaluation ratings?
Participant ratings compared between performance levels and studies
Study 1 showed no significant differences in ratings according to performance level
Studies 2 and 3 did show significant differences in ratings according to performance level
What are the differences between study 1 and study 2?
Intended difference: completion time?
Unintended differences: instructions for study 2 provided a clearer performance objective; subjects felt more successful in study 2?
User Experienced Precision
“experimental manipulations [of precision] were only 90% effective” (pg 9:24)
Are user-experienced precision values correlated with user ratings of system performance?
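One way to read this question: compute each participant's experienced precision from that participant's own relevance judgments, then correlate it with the rating the participant gave the search engine. The Python sketch below assumes binary judgments over the 10 results and uses Pearson's r. The slides do not say which correlation statistic the authors used, and the judgments and ratings are invented purely for illustration; they are not data from the paper.

```python
from math import sqrt
from typing import Sequence


def experienced_precision(judgments: Sequence[int]) -> float:
    """Share of results a participant personally judged relevant (1 = relevant)."""
    return sum(judgments) / len(judgments)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical participants: (relevance judgments for 10 results, rating of the engine).
sessions = [
    ([1, 1, 1, 1, 1, 1, 1, 0, 0, 0], 8),
    ([1, 1, 1, 0, 1, 0, 0, 0, 0, 0], 6),
    ([1, 0, 0, 1, 0, 0, 0, 0, 0, 0], 3),
    ([0, 0, 1, 0, 0, 0, 0, 0, 0, 0], 2),
]

precisions = [experienced_precision(judgments) for judgments, _ in sessions]
ratings = [float(rating) for _, rating in sessions]
print(precisions)
print(round(pearson_r(precisions, ratings), 3))
```

Because experienced precision depends on what each participant judged relevant, it can differ from the precision the experimenters intended to show, which is the gap the 90% figure above describes.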

Eastwood presentation on_kellyetal2010

  • 37. Regression analysis: can you use experienced precision to predict user evaluation?
  • 38. Authors’ Discussion and Conclusions: “…variations in precision at 10 scores have the greatest impact on subjects’ evaluation ratings.” (pg 9:26) Thoughtful analysis of experimental caveats and generalizability of results: convenient sample of students; only one genre of documents represented; are these results specific to informational/exploratory tasks?
  • 39. Suggested Class Discussion Topics. Areas where the experiment may have been too tightly controlled/artificial: controlling the order in which users could rate documents? Areas where the experiment may not have been as controlled as the authors intended: allowing subjects to formulate their own queries; Study 2 allowed participants to feel “successful”? Ten-point evaluation scale versus five-point evaluation scale?
  • 40. References: Kelly, D., Fu, X., and Shah, C. 2010. Effects of position and number of relevant documents retrieved on users’ evaluations of system performance. ACM Trans. Inf. Syst. 28, 2, Article 9 (May 2010), 29 pages. DOI 10.1145/1740592.1740597. http://doi.acm.org/10.1145/1740592.1740597

Editor's Notes

  1. “My research is focused on information search behavior and the design and evaluation of systems that support interactive information retrieval.” UNC Chapel Hill: according to US News and World Report, it has the #2 library science graduate school in the nation, a very strong program. Xin Fu and Chirag Shah were Ph.D. students in the program at the time this article was written.