Leabharlann UCD
An Coláiste Ollscoile, Baile
Átha Cliath,
Belfield, Baile Átha Cliath 4,
Eire
UCD Library
University College Dublin,
Belfield, Dublin 4, Ireland
Joseph Greene
Research Repository Librarian
University College Dublin
joseph.greene@ucd.ie
http://researchrepository.ucd.ie
How accurate are IR usage statistics?
Open Repositories 2016
Dublin, 16 June
Usage statistics are important for OA repositories
• How is the service used overall?
• Advocacy
– Connects with authors on what is most important to them: the use of their research
• KPI for return on investment
– Usage of a Library service
– Visibility of university’s research
Monthly email sent to all depositors
Infographic distributed semi-annually by College Liaison Librarians
How accurate are they? Web robots
• Some follow rules
– Search engines, Internet Archive, link checkers, Twitterbot, etc.
– robots.txt, naming themselves in the user agent string
• Others do not
– Email spammers, comment spammers, dictionary attackers, phishers, etc.
– Often mimic human users
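A minimal sketch of the "naming themselves in the user agent string" case: it flags a request when the user agent contains a common robot keyword. The keyword list and the function name are assumptions made for illustration, not the lists used by any repository platform. A robot that mimics a browser passes straight through, which is the limitation this slide points at.

```python
import re

# Hypothetical, deliberately short keyword list (an assumption for this sketch).
ROBOT_UA = re.compile(r"bot|crawler|spider|slurp|link ?check", re.IGNORECASE)


def declares_itself_robot(user_agent: str) -> bool:
    """Return True when the user agent string contains a robot keyword."""
    return bool(ROBOT_UA.search(user_agent or ""))


polite = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# A spammer can send an ordinary browser string like this one and go unnoticed.
browser_like = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                "(KHTML, like Gecko) Chrome/50.0 Safari/537.36")

print(declares_itself_robot(polite))
print(declares_itself_robot(browser_like))
```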
Experimental study
• Simple random sample of 2 years of UCD repository’s download data
– n=341, N=3.3 million; 96.20% certainty
• Manually checked to determine if robot or human
• Compared findings against our robot detection technique
– U. Minho DSpace Stats Add-on
– Monthly outlier exclusion (manual)
Greene, J. Web robot detection in scholarly Open Access institutional repositories. Library Hi Tech, July 2016
First finding
85% of the Research Repository UCD’s unfiltered downloads come from robots
• A 2013 IRUS-UK white paper on 20 IRs confirms this: it also found 85% of downloads to be from robots
Catching more robots improves stats
(But how much depends on the number of robots)
[Figure: accuracy of download stats (inverse precision) plotted against recall (robots), both axes from 0 to 1. Annotations: "Get better stats", "Catch more robots". Curves: typical website, 15% robot traffic; OA journal, 40% robot; OA repositories, 85% robot; Internet Archive, 91% robot.]
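The relationship this slide describes can be sketched numerically. The function below is an illustration, not the study's own calculation: it assumes that no human download is ever mislabelled as a robot (precision of 1), so the reported downloads are all the human ones plus the robots that slipped through. The function name and the recall values printed are hypothetical choices.

```python
def human_share_of_reported(robot_share: float, recall: float) -> float:
    """Inverse precision: fraction of reported downloads that are human.

    Assumption for this sketch: precision is 1, so no human download is
    removed by the filter.
    """
    humans = 1.0 - robot_share
    missed_robots = robot_share * (1.0 - recall)
    reported = humans + missed_robots
    return humans / reported if reported else float("nan")


# Robot shares taken from the slide; recall values are illustrative.
traffic = [("Typical website", 0.15), ("OA journal", 0.40),
           ("OA repositories", 0.85), ("Internet Archive", 0.91)]
for label, share in traffic:
    row = ", ".join(f"{human_share_of_reported(share, r):.2f}"
                    for r in (0.0, 0.5, 0.9, 1.0))
    print(f"{label:18s} {row}")
```

With little robot traffic the statistics are already mostly human, so extra recall changes them only slightly; at 85% robot traffic the same gain in recall moves the figure much further.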
How did we do at UCD?
• What proportion of robot downloads did we catch? (Recall)
– Our method catches 94% of all robots
• How often were we correct -- how many are actually human? (Precision)
– 98.9% of downloads that we label robots really are robots
• How accurate are the download stats -- how many are actually made by human beings? (Inverse precision)
– 73% of the download statistics as reported are human
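The three measures on this slide can be written out from a table of manually checked downloads. This is a sketch with made-up counts (the `Counts` class, its field names and the example numbers are all hypothetical, not the study's data); it only shows which cells of the table each measure uses.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    robots_caught: int   # robot downloads labelled robot (true positives)
    humans_flagged: int  # human downloads labelled robot (false positives)
    robots_missed: int   # robot downloads labelled human (false negatives)
    humans_kept: int     # human downloads labelled human (true negatives)


def recall(c: Counts) -> float:
    """Proportion of all robot downloads that were caught."""
    return c.robots_caught / (c.robots_caught + c.robots_missed)


def precision(c: Counts) -> float:
    """Proportion of downloads labelled robot that really are robots."""
    return c.robots_caught / (c.robots_caught + c.humans_flagged)


def inverse_precision(c: Counts) -> float:
    """Proportion of reported (kept) downloads that really are human."""
    return c.humans_kept / (c.humans_kept + c.robots_missed)


# Hypothetical counts, for illustration only.
example = Counts(robots_caught=90, humans_flagged=2, robots_missed=10, humans_kept=48)
print(f"recall={recall(example):.3f}")
print(f"precision={precision(example):.3f}")
print(f"inverse precision={inverse_precision(example):.3f}")
```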
How does that compare?
• Who knows? There are no other studies like this on repositories!
• Applied DSpace's and EPrints' web robot detection algorithms to our data
– Experimental
– Real data
– Same dataset used for each ‘system’
– Algorithms easy to mimic in vitro
– But SEO, crawl behaviour may be different for different systems
Robot detection techniques used
Systems: DSpace, EPrints, Minho DSpace Statistics Add-on
Rate of requests ✓³
User agent string ✓ ✓ ✓
robots.txt access ✓
Volume of requests ✓² ✓³
List of known robot IP addresses ✓ ✓
Reverse DNS name lookup ✓¹
Trap file ✓
User agents per IP address
Width of traversal in the URL space ✓³
¹ Only implemented nominally or experimentally
² Via the repeat download or ‘double-click’ filter
³ Data available as a configurable report for manual decision making
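As an illustration of the repeat download or 'double-click' filter mentioned in the notes, the sketch below drops a request when the same IP address fetched the same item shortly before. The 30-second window, the function name and the event layout are assumptions for this example, not the settings of DSpace, EPrints or the Minho add-on.

```python
from datetime import datetime, timedelta


def drop_repeat_downloads(events, window=timedelta(seconds=30)):
    """Drop a request if the same IP requested the same item within `window`.

    events: iterable of (timestamp, ip, item_id) tuples, in any order.
    Returns the kept events in chronological order.
    """
    last_seen = {}
    kept = []
    for ts, ip, item in sorted(events):
        key = (ip, item)
        previous = last_seen.get(key)
        if previous is None or ts - previous > window:
            kept.append((ts, ip, item))
        # Track every request, so a rapid burst collapses to its first hit.
        last_seen[key] = ts
    return kept


t0 = datetime(2016, 6, 16, 12, 0, 0)
events = [
    (t0, "10.0.0.1", "item-a"),
    (t0 + timedelta(seconds=5), "10.0.0.1", "item-a"),   # repeat, dropped
    (t0 + timedelta(seconds=40), "10.0.0.1", "item-a"),  # outside window, kept
    (t0 + timedelta(seconds=1), "10.0.0.2", "item-a"),   # different IP, kept
]
kept = drop_repeat_downloads(events)
print(len(kept))
```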
Results
• Robots detected (Recall)
– DSpace: 0.897
– EPrints: 0.911
– Minho (no manual outlier checking): 0.890
– Minho plus monthly manual checking (UCD): 0.942
• Accuracy of detection (Precision)
– DSpace: 1.000
– EPrints: 0.940
– Minho (no manual outlier checking): 0.989
– Minho plus monthly manual checking (UCD): 0.989
• Accuracy of download stats (Inverse precision)
– DSpace: 0.620
– EPrints: 0.552
– Minho (no manual outlier checking): 0.590
– Minho plus monthly manual checking (UCD): 0.730
– Without filtration: 0.144
I.e. 38% of DSpace’s reported downloads are made by robots, etc.
Robot detection in OA IR systems
[Figure: recall, precision and negative precision (accuracy of download stats) for DSpace, EPrints, Minho, Minho with monthly manual checking (UCD), and no robot detection; vertical axis from 0 to 1.]
Thank you!

Application orientated numerical on hev.ppt
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 

How Accurate are IR Usage Statistics?

  • 1. How accurate are IR usage statistics? Open Repositories 2016, Dublin, 16 June. Joseph Greene, Research Repository Librarian, University College Dublin; joseph.greene@ucd.ie; http://researchrepository.ucd.ie. Leabharlann UCD, An Coláiste Ollscoile, Baile Átha Cliath, Belfield, Baile Átha Cliath 4, Eire. UCD Library, University College Dublin, Belfield, Dublin 4, Ireland
  • 2. Usage statistics are important for OA repositories • How is the service used overall? • Advocacy – Connects with authors on what is most important to them: the use of their research • KPI for return on investment – Usage of a Library service – Visibility of university’s research
  • 3.
  • 4. Monthly email sent to all depositors
  • 5. Infographic distributed semi-annually by College Liaison Librarians
  • 6. How accurate are they? Web robots • Some follow rules – Search engines, Internet Archive, link checkers, Twitterbot, etc. – robots.txt, naming themselves in the user agent string • Others do not – Email spammers, comment spammers, dictionary attackers, phishers, etc. – Often mimic human users
  • 7. Experimental study • Simple random sample of 2 years of UCD repository’s download data – n=341, N=3.3 million; 96.20% certainty • Manually checked to determine if robot or human • Compared findings against our robot detection technique – U. Minho DSpace Stats Add-on – Monthly outlier exclusion (manual) Greene, J. Web robot detection in scholarly Open Access institutional repositories. Library Hi Tech, July 2016
  • 8. First finding 85% of the Research Repository UCD’s unfiltered downloads come from robots • This is confirmed in a 2013 IRUS-UK white paper on 20 IRs; 85% was also found to be robots
  • 9. Catching more robots improves stats (But how much depends on the number of robots) [Figure: accuracy of download stats (inverse precision) plotted against recall (robots), both on a 0 to 1 scale, with one curve per site type: Typical website, 15% robot traffic; OA journal, 40% robot; OA repositories, 85% robot; Internet Archive, 91% robot. Annotations: "Catch more robots", "Get better stats".]
  • 10. How did we do at UCD? • What proportion of robot downloads did we catch? (Recall) – Our method catches 94% of all robots • How often were we correct -- how many are actually human? (Precision) – 98.9% of downloads that we label robots really are robots • How accurate are the download stats -- how many are actually made by human beings? (Inverse precision) – 73% of the download statistics as reported are human
  • 11. How does that compare? • Who knows? There are no other studies like this on repositories! • Applied DSpace's and EPrints' web robot detection algorithms to our data – Experimental – Real data – Same dataset used for each ‘system’ – Algorithms easy to mimic in vitro – But SEO, crawl behaviour may be different for different systems
  • 12. Robot detection techniques used
      Columns: DSpace | EPrints | Minho DSpace Statistics Add-on
      Rate of requests: ✓[3]
      User agent string: ✓ ✓ ✓
      robots.txt access: ✓
      Volume of requests: ✓[2] ✓[3]
      List of known robot IP addresses: ✓ ✓
      Reverse DNS name lookup: ✓[1]
      Trap file: ✓
      User agents per IP address:
      Width of traversal in the URL space: ✓[3]
      [1] Only implemented nominally or experimentally
      [2] Via the repeat download or ‘double-click’ filter
      [3] Data available as a configurable report for manual decision making
  • 14. Robots detected (Recall) [Bar chart, 0 to 1 scale]: DSpace 0.897; EPrints 0.911; Minho (no manual outlier checking) 0.890; Minho plus monthly manual checking (UCD) 0.942
  • 15. Accuracy of detection (Precision) [Bar chart, 0 to 1 scale]: DSpace 1.000; EPrints 0.940; Minho (no manual outlier checking) 0.989; Minho plus monthly manual checking (UCD) 0.989
  • 16. Accuracy of download stats (Inverse precision) [Bar chart, 0 to 1 scale]: DSpace 0.620; EPrints 0.552; Minho (no manual outlier checking) 0.590; Minho plus monthly manual checking (UCD) 0.730; Without filtration 0.144. I.e. 38% of DSpace’s reported downloads are made by robots, etc.
  • 17. Robot detection in OA IR systems [Bar chart, 0 to 1 scale, showing Recall, Precision and Negative precision (accuracy of download stats) for DSpace, EPrints, Minho, Minho with monthly manual checking (UCD) and No robot detection]
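
The three measures defined on slide 10 can be written out as a short Python sketch. This is only an illustration: the class name, field names and the counts in the example are assumptions made here, not figures or code from the study. It treats "labelled robot" as the positive class, so inverse precision is the share of downloads left in the reported statistics that were really made by humans.

```python
from dataclasses import dataclass


@dataclass
class DownloadCounts:
    """Manually verified sample of downloads, split by filter outcome.

    All field names are hypothetical, chosen for this sketch.
    """
    robots_flagged: int   # robots the filter labelled as robots
    humans_flagged: int   # humans the filter wrongly labelled as robots
    robots_missed: int    # robots the filter left in the statistics
    humans_kept: int      # humans the filter left in the statistics


def recall(c: DownloadCounts) -> float:
    """Proportion of all robot downloads that the filter caught."""
    return c.robots_flagged / (c.robots_flagged + c.robots_missed)


def precision(c: DownloadCounts) -> float:
    """Proportion of downloads labelled as robots that really are robots."""
    return c.robots_flagged / (c.robots_flagged + c.humans_flagged)


def inverse_precision(c: DownloadCounts) -> float:
    """Proportion of the reported (unfiltered-out) downloads that are human."""
    return c.humans_kept / (c.humans_kept + c.robots_missed)


# Made-up counts for illustration only (not the UCD sample).
example = DownloadCounts(robots_flagged=90, humans_flagged=1,
                         robots_missed=10, humans_kept=19)

print(f"recall            = {recall(example):.3f}")
print(f"precision         = {precision(example):.3f}")
print(f"inverse precision = {inverse_precision(example):.3f}")
```

Because robots so heavily outnumber humans in this setting, even a small number of missed robots takes a large share of the reported downloads, which is why inverse precision can sit well below recall and precision.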

Editor's Notes

  1. Download and other usage statistics in an item view
  2. In addition, data is provided to Schools for quality reviews and accreditation
  3. We have been aware of web robots since 2009, and have been using the U. Minho add-on plus a visual check for outliers once a month. After hitting 1 million downloads in 2015, we decided we must know more about the problem (how to properly identify robots, how accurate our statistics are); we want to have confidence in the information that we produce
  4. Experiment: simple random sample of 2 years of download data (n=341, N=3.3 million for 96.20% certainty), manually checked to determine if robot or human. DSpace 1.8.2 with U. Minho DSpace Statistics Add-on v. 4. Apache Tomcat behind Apache HTTP server; logs in Apache Combined Log Format. Minho registers every download in the PostgreSQL database. Results to be published in July 2016 issue of Library Hi Tech (Greene 2016)
  5. See: INFORMATION POWER LTD. 2013. IRUS download data: identifying unusual usage [Online]. Available: http://www.irus.mimas.ac.uk/news/IRUS_download_data_Final_report.pdf [Accessed 2015-12-11]. Confirms 85% figure DORAN, D. & GOKHALE, S. S. 2011. Web robot detection techniques: overview and limitations. Data Mining and Knowledge Discovery, 22, 183-210. Hypothesizes why so high in OA (p.191)
  6. Typical website (15% robot traffic): precision = 0.8727, mean of four studies; robots:total sessions = 0.1516, mean of four studies
     OA journal (40% robot): HUNTINGTON, P., NICHOLAS, D. & JAMALI, H. R. 2008. Web robot detection in the scholarly information environment. Journal of Information Science, 34, 726-741.
     OA repositories (85% robot): Greene 2016 and Information Power 2013 (see above)
     Internet Archive (91% robot): ALNOAMANY, Y., WEIGLE, M. C. & NELSON, M. L. 2013. Access patterns for robots and humans in web archives. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 339-348.
     Reverse is also true: fail to catch robots (e.g. deterioration over time as robots improve their capabilities), accuracy of stats diminishes
     Formula (Greene 2016):
     \[ P_{inv} = \frac{TR(R - PR - 1) + 2TPR - P(T + R - 1)}{R(TR - P - T) + P} \]
     R = recall (robot detection); P = precision (robot detection); P_inv = inverse precision (human stats); T = ratio of robots to total
  7. Greene 2016
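
The formula in note 6 can be evaluated directly. The sketch below is a hedged illustration rather than code from the paper: the function name and the recall and precision values in the loop are assumptions chosen here, while the robot shares (15%, 40%, 85%, 91%) are the ones quoted on slide 9. Note that both numerator and denominator go to zero at R = 1, so the function refuses that input instead of dividing.

```python
import math


def inverse_precision_from_rates(recall: float, precision: float,
                                 robot_share: float) -> float:
    """Inverse precision (share of reported downloads that are human).

    Implements the formula from note 6 with R = recall, P = precision,
    T = ratio of robots to total downloads. The function name is hypothetical.
    """
    r, p, t = recall, precision, robot_share
    numerator = t * r * (r - p * r - 1) + 2 * t * p * r - p * (t + r - 1)
    denominator = r * (t * r - p - t) + p
    if math.isclose(denominator, 0.0, abs_tol=1e-12):
        # Happens at R = 1, where the expression is 0/0.
        raise ValueError("formula is undefined for these inputs")
    return numerator / denominator


# Robot shares from slide 9; recall and precision here are illustrative
# assumptions, held fixed to show the effect of robot share alone.
ASSUMED_RECALL = 0.90
ASSUMED_PRECISION = 0.99
for label, share in [("Typical website", 0.15),
                     ("OA journal", 0.40),
                     ("OA repositories", 0.85),
                     ("Internet Archive", 0.91)]:
    value = inverse_precision_from_rates(ASSUMED_RECALL, ASSUMED_PRECISION, share)
    print(f"{label:17s} T={share:.2f}  inverse precision={value:.3f}")
```

With no detection at all (R = 0) the expression reduces to 1 - T, that is, the reported statistics are only as human as the raw traffic.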