Reference Rot
Los Alamos National Laboratory
Research Library Prototyping Team
Presented by Shawn M. Jones
Citations are the building blocks
of scholarly communications
Citations
Provide
Support and Evidence
+
Experiment and Results
Argument
DOIs Identify Scholarly
Publications
• Almost all scholarly publications (papers, articles, etc.) have an
associated Digital Object Identifier (DOI) maintained by CrossRef
• DOIs are persistent
• If a publisher changes ownership or sells part of its catalog, the DOI remains
with the publication so that scholars can continue to find the paper into the
future
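To make the idea of a persistent identifier concrete, the sketch below turns the common ways a DOI is written in a reference into a single resolver URI. The function name `doi_to_uri` and the list of accepted prefixes are assumptions made for this illustration; they are not taken from the DOI standard.

```python
from urllib.parse import quote

# Prefixes under which DOIs commonly appear in reference lists
# (an assumed, non-exhaustive list chosen for this sketch).
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_uri(raw: str) -> str:
    """Return an https://doi.org/ URI for a DOI written in one of several common forms."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    doi = doi.strip()
    # Every DOI starts with the directory indicator "10." and has a prefix/suffix split.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {raw!r}")
    # Percent-encode characters that are unsafe in a URI path, keeping "/" intact.
    return "https://doi.org/" + quote(doi, safe="/")
```

For example, `doi_to_uri("DOI: 10.1371/journal.pone.0115253")` yields the resolver URI for the first PLOS ONE paper cited in these slides.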
"ISO 26324:2012(en), Information and documentation — Digital object identifier system". ISO.
URIs Identify Web Resources
The World Wide Web consists of resources, such as pages or
applications.
Each web resource is identified by a Uniform Resource Identifier (URI).
Examples of web resources:
• Web pages
• Google Search
• Software Web Sites
Each resource may have one or more representations that vary by
dimensions such as language or document format.
Uniform Resource Locators (URLs) are a subset of URIs that require a
web location (a server with an application or directory structure).
Architecture of the World Wide Web, Volume One (15 December 2004) edited by Ian Jacobs, Norman Walsh. https://www.w3.org/TR/webarch/
Scholars use URIs in References to
Web Resources
• The web resources behind URIs have no guarantee of
persistence; they can disappear because:
• Their website is gone due to lack of funding
• An organization changes its website and doesn’t provide redirects from
old resources
• And more…
Why use URIs?
• Existing publications are not the only
supporting evidence in scholarly work
• URIs are invaluable to researchers because
they allow them to cite:
• Software Projects
• Datasets
• Affiliation Web Sites
• Funding
• Scholar Web Sites
• Blog Posts
• Technical Reports
• Evidence such as news stories or Tweets
• And more…
Consider The Publication of the Paper
and the Reader In the Future Following
One of Its References
The paper is published at some point, and its citations using URIs were good at that time.
Will they be good for a reader in the future?
Reference Rot Problem #1:
Link Rot
The reader follows a reference and it is gone
Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers
from Reference Rot. PLOS ONE 9(12): e115253. DOI: 10.1371/journal.pone.0115253
This web-at-large resource is linked from the scholarly article Generalizing the OpenURL
Framework beyond References to Scholarly Works but it is now gone!
Reference Rot Problem #2:
Content Drift
The reader follows a reference and it is not the same
Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI References
Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475
This web-at-large resource is linked from the scholarly article Searching for Quantum Gravity with
High Energy Atmospheric Neutrinos and AMANDA-II but it has changed since publication.
A Potential Solution:
Web Archives!
Web Archives make snapshots of web resources
so users can go back and look at a web page as it
was in the past.
There are many web archives, such as:
• Internet Archive
• Perma.cc
• Archive.is
• Icelandic Web Archive
• UK Web Archive
• Library of Congress
These snapshots are called mementos.
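A memento can be requested by date. The sketch below builds (but does not send) a request that asks a Memento TimeGate for the snapshot of a URI closest to a given time, using the `Accept-Datetime` header. The TimeGate address is an assumption based on the Time Travel service listed at the end of these slides; individual archives expose their own endpoints, and the helper name `timegate_request` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumed TimeGate endpoint of the Memento Time Travel service.
TIMEGATE = "http://timetravel.mementoweb.org/timegate/"


def timegate_request(uri: str, when: datetime) -> tuple[str, dict[str, str]]:
    """Build the URL and headers for asking a TimeGate for the memento of `uri` nearest `when`.

    `when` must be timezone-aware; it is converted to UTC before formatting.
    """
    when_utc = when.astimezone(timezone.utc)
    # Accept-Datetime uses the same date format as the HTTP Date header.
    headers = {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}
    return TIMEGATE + uri, headers
```

The returned URL and headers can be passed to any HTTP client; a TimeGate normally answers with a redirect to the selected memento.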
Questions
Addressed By
Our Research
1. Is the use of URI references on the
rise?
2. To what extent does link rot exist in
scholarly URI references?
3. To what extent does content drift
exist in scholarly URI references?
4. What can we do about reference
rot? Can Web Archives help?
5. When are people using URIs when
they should be using DOIs?
6. What can we do to ensure people
use DOIs when they exist?
Dataset
• 1.8 million articles from arXiv, Elsevier, and PubMed Central from 1997 to 2012
• For content drift comparison, Mementos are taken from 18 web archives
• The data was processed by the University of Edinburgh and Los Alamos National
Laboratory
• From these articles we extracted 1.06 million URI references
Is the use of URI
references on the rise?
The Number of URI References
Goes Up Each Publication Year
Articles and URI references per
publication year - arXiv corpus.
Articles and URI references per
publication year - Elsevier corpus.
Articles and URI references per
publication year - PMC corpus.
Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers
from Reference Rot. PLOS ONE 9(12): e115253. DOI: 10.1371/journal.pone.0115253
To what extent does link rot
exist in scholarly URI
references?
Link Rot for References Gets Worse
As We Look At Older Publications
arXiv corpus Elsevier corpus PMC corpus
Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers
from Reference Rot. PLOS ONE 9(12): e115253. DOI: 10.1371/journal.pone.0115253
If a URI reference no longer responds, then we have link rot.
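A check of this kind can be sketched as follows. The rule used here (a reference is rotten when no response arrives or the final status after redirects is not 2xx) is an assumption for illustration; the slide only says the reference "no longer responds". The function names are hypothetical, and pages that return 200 with an error message ("soft 404s") are not detected.

```python
import urllib.error
import urllib.request
from typing import Optional


def is_rotten(status: Optional[int]) -> bool:
    """Treat a reference as rotten if no response arrived or the final status is not 2xx."""
    return status is None or not (200 <= status < 300)


def final_status(uri: str, timeout: float = 10.0) -> Optional[int]:
    """Follow redirects and return the final HTTP status, or None if no server answered."""
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with a 4xx or 5xx status
    except OSError:
        return None  # DNS failure, refused connection, timeout, and similar
```

A caller would combine the two as `is_rotten(final_status(uri))` for each URI reference in an article.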
Fewer Publications Are Immune
to Reference Rot
Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers
from Reference Rot. PLOS ONE 9(12): e115253. DOI: 10.1371/journal.pone.0115253
Immune publications have no URI references
Healthy publications have no link rot and have
mementos within 14 days of publication for all of their
references
Infected publications have link rot or have no
mementos for all of their references
As noted before, more and more publications use URI
references
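The three categories can be expressed as a small classifier. This is a sketch of one reading of the definitions above: any publication that has URI references but is not healthy is counted as infected. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    IMMUNE = "immune"
    HEALTHY = "healthy"
    INFECTED = "infected"


@dataclass
class Reference:
    rotten: bool                   # the URI no longer responds
    memento_within_14_days: bool   # an archive holds a snapshot within 14 days of publication


def classify(references: list[Reference]) -> Status:
    """Classify one publication from the state of its URI references."""
    if not references:
        return Status.IMMUNE       # no URI references at all
    if all(not r.rotten and r.memento_within_14_days for r in references):
        return Status.HEALTHY      # every reference is live and archived in time
    return Status.INFECTED         # assumption: everything else counts as infected
```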
To what extent does
content drift exist in
scholarly URI references?
Because of Web Archives, We Can
Study Content Drift
This Page Changed Considerably over 3 Months
This Page Hasn’t Changed in 19 Years
Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI
References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475
The Frequency Of Memento Creation
Is Not the Same for All Resources
Archived Regularly
Archived Occasionally
Archived Once
Archived Never
Step 1: Find a memento of a
reference from the publication date
of the paper
If a memento before the publication date and after the publication date match according to 4 similarity
measures, we consider the two to be the same and either is representative of the reference as it existed at the
time of publication.
Representative mementos get compared with the current live version of the same reference in step 2.
Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI
References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475
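Step 1 can be sketched as below. The sketch picks the closest memento on each side of the publication date and accepts the pair only if a caller-supplied predicate says they match. The predicate `same` stands in for the four similarity measures, which this sketch does not implement; returning the earlier memento is an arbitrary choice, since either is representative when they match. All names are hypothetical.

```python
from datetime import datetime
from typing import Callable, Optional

Memento = tuple[datetime, str]  # (capture time, extracted text)


def representative_memento(
    mementos: list[Memento],
    published: datetime,
    same: Callable[[str, str], bool],
) -> Optional[Memento]:
    """Return a memento that represents the reference at publication time, if one exists."""
    before = [m for m in mementos if m[0] <= published]
    after = [m for m in mementos if m[0] > published]
    if not before or not after:
        return None  # the reference is not bracketed by snapshots
    pre = max(before, key=lambda m: m[0])   # closest snapshot at or before publication
    post = min(after, key=lambda m: m[0])   # closest snapshot after publication
    return pre if same(pre[1], post[1]) else None
```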
Many References Do Not Have
Representative Mementos
arXiv Corpus Elsevier Corpus PMC Corpus
Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI
References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475
Step 2: Compare the memento of the
reference with the current web
resource
Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI
References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475
Using the same 4 similarity measures, we compare the content of the current resource with the content of the
representative memento.
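The comparison in step 2 can be sketched as follows. The slides do not name the four similarity measures, so the ones used here (Jaccard, Sørensen-Dice, cosine, and the sequence ratio from Python's `difflib`) are stand-ins chosen for this illustration, as is the rule that any score below the threshold counts as drift.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0


def dice(a: set, b: set) -> float:
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0


def cosine(a: Counter, b: Counter) -> float:
    if not a and not b:
        return 1.0
    dot = sum(count * b[token] for token, count in a.items())
    # Take one square root over the product so identical inputs score exactly 1.0.
    norm = math.sqrt(sum(c * c for c in a.values()) * sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def has_drifted(memento_text: str, live_text: str, threshold: float = 1.0) -> bool:
    """Report drift if any of the four stand-in measures falls below `threshold`."""
    ta, tb = memento_text.lower().split(), live_text.lower().split()
    scores = [
        jaccard(set(ta), set(tb)),
        dice(set(ta), set(tb)),
        cosine(Counter(ta), Counter(tb)),
        SequenceMatcher(None, ta, tb).ratio(),
    ]
    return any(score < threshold for score in scores)
```

Both inputs are assumed to be text already extracted from the pages; stripping markup and boilerplate is outside this sketch.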
Content Drift Is Worse For Older
Publications
Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI
References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475
arXiv corpus PMC corpus Elsevier corpus
What can we do about
reference rot? Can Web
Archives help?
1. Scholars can pro-actively create
mementos in web archives for URI
references
• The Internet Archive’s “Save Page Now”
• Perma.cc, Archive.is, and WebCite exist
for this purpose
• Mink, Webrecorder.io
2. Other scholars/editors can reference
these snapshots in scholarly literature
• Robust Links
• Memento
Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI
References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475
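Referencing a snapshot alongside the original URI can be sketched as below. The sketch emits an HTML link that carries the snapshot URI and the linking date in `data-versionurl` and `data-versiondate` attributes, which is how the Robust Links approach is understood here; the attribute names should be checked against robustlinks.mementoweb.org before relying on them. The function name and example URIs are hypothetical.

```python
from datetime import date
from html import escape


def robust_link(text: str, original_url: str, snapshot_url: str, linked_on: date) -> str:
    """Return an HTML anchor that points at the original URI and records its snapshot."""
    return (
        f'<a href="{escape(original_url, quote=True)}" '
        f'data-versionurl="{escape(snapshot_url, quote=True)}" '
        f'data-versiondate="{linked_on.isoformat()}">{escape(text)}</a>'
    )
```

The link still leads to the live resource, while the extra attributes give a reader or tool enough information to fall back to the archived copy.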
Are people using URIs
when they should be
using DOIs?
Van de Sompel H, Klein M, and Jones SM. 2016. Persistent URIs Must Be Used To Be Persistent. In Proceedings of WWW
2016, pp. 119-120. DOI: 10.1145/2872518.2889352
arXiv corpus PMC corpus
We hypothesize that this is caused by citation software using the URI instead
of the DOI because it does not know the DOI.
Problem: Machines Just See Links,
Where Is The DOI?
[Figure: landing page annotated with the labels "Links" and "URI"]
Humans Can Get Meaning from
Links on a Web Page
[Figure: landing page annotated with the labels "Authors", "DOI", "Bibliographic Metadata", and "PDF Document"]
Problem: Machines Cannot Find
the DOI
Browsers and citation software can easily
access the URI; it indicates how to retrieve the
resource.
The DOI is buried in the text of the landing
page.
Citation software must be programmed with
many publishers’ templates in order to find
the DOI across all resources. Publishers also
change their templates, causing software to
break.
Some publishers do not use the DOI in their
EndNote/BibTeX citations.
What can we do to
ensure people use DOIs
when they exist?
HTTP Already Has A Solution, We
Just Need to Use It
• HTTP is the protocol of the web
• Before HTTP sends content, it sends headers
• Inside these headers, publishers can use the Link header to reference other
content
• Because the metadata is stored in the transfer protocol:
• This solution requires no change to the content, meaning it works with any document
format.
• This solution can be applied to existing content with no change to the content.
HTTP/1.1 200 OK
Date: Mon, 17 Jul 2017 17:53:54 GMT
Server: Apache/2.2.3 (Red Hat)
Connection: close
Link: <http://doi.org/10.101010/99999999>; rel="identifier"
Content-Type: text/html; charset=UTF-8
Van de Sompel H and Nelson ML. (2015) Reminiscing About 15 Years of Interoperability Efforts. D-Lib 21: 11/12. DOI: 10.1045/november2015-vandesompel
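On the client side, reading the DOI out of such a response could look like the sketch below. It is a deliberately small parser for simple Link header values, not a complete implementation of the header grammar, and the function name `links_with_rel` is hypothetical. The relation type is a parameter because the example header above uses `identifier`.

```python
import re


def links_with_rel(link_header: str, rel: str) -> list[str]:
    """Return the target URIs in a Link header value whose rel parameter includes `rel`."""
    targets = []
    # Each link is "<target>" followed by its parameters, up to the next "<".
    for target, params in re.findall(r'<([^>]*)>([^<]*)', link_header):
        for name, quoted, bare in re.findall(
            r'([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', params
        ):
            value = quoted or bare
            # A rel parameter may carry several space-separated relation types.
            if name.lower() == "rel" and rel.lower() in value.lower().split():
                targets.append(target)
                break
    return targets
```

Applied to the header shown above, `links_with_rel(value, "identifier")` returns a list containing the DOI URI, without any knowledge of the publisher's page template.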
Using the HTTP Link Header, the
machine can find the DOI
Using the HTTP Link header, publishers can provide metadata
linking to the DOI from their resources.
This way, a browser or citation manager can find the DOI
whether it is on the landing page or the PDF page.
This effort is named
“Signposting the Scholarly Web”.
Signposting is not just for DOIs
• Why not link from
the document’s
landing page to the
author’s ORCID?
Signposting is not just for DOIs
• Why not link from
the document to the
metadata?
Signposting is not just for DOIs
• Why not link from
the landing page to
supplemental items
or other publication
formats?
Find out more at signposting.org
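On the publisher side, several signposts can be combined into one header value, as in the sketch below. The relation types in the usage note (`author`, `describedby`, `item`) are the ones understood to match the three cases above; treat them as assumptions and consult signposting.org for the recommended set. The function name is hypothetical.

```python
def link_header(links: list[tuple[str, str]]) -> str:
    """Serialize (target URI, relation type) pairs into one Link header value."""
    parts = []
    for uri, rel in links:
        if ">" in uri or '"' in rel:
            raise ValueError(f"cannot serialize link: {uri!r}, {rel!r}")
        parts.append(f'<{uri}>; rel="{rel}"')
    return ", ".join(parts)
```

A landing page could then send, for example, `link_header([("https://orcid.org/0000-0002-1825-0097", "author"), ("https://example.org/paper/1/metadata.json", "describedby"), ("https://example.org/paper/1/paper.pdf", "item")])`, where the example.org addresses are placeholders.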
Recap
Scholarly URI References In Jeopardy
• URIs identify web resources and
are not persistent
• Link rot and content drift are
problems for URI references and
get worse the older the
publication is
• Scholars sometimes use URIs
instead of DOIs when creating
references, even if DOIs exist
New Hope for Scholarly References
• Web Archives play a role in
preserving references
• We can use a variety of tools
to create mementos of
references at the time of
publication
• We can access them with
Memento and Robust Links
• We can use signposting to
help reference managers and
other tools find DOIs and
other information
Thanks for listening
Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five
Articles Suffers from Reference Rot. PLOS ONE 9(12): e115253. DOI: 10.1371/journal.pone.0115253
Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI
References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475
Van de Sompel H, Klein M, and Jones SM. 2016. Persistent URIs Must Be Used To Be Persistent. In Proceedings of WWW
2016, pp. 119-120. DOI: 10.1145/2872518.2889352
Van de Sompel H and Nelson ML. (2015) Reminiscing About 15 Years of Interoperability Efforts. D-Lib 21: 11/12. DOI:
10.1045/november2015-vandesompel
http://robustlinks.mementoweb.org
http://signposting.org http://timetravel.mementoweb.org
Backup Slides
Demonstrations
• Memento - http://timetravel.mementoweb.org
• Robust Links - http://www.dlib.org/dlib/november15/vandesompel/11vandesompel.html

Mais conteúdo relacionado

Mais procurados

Forging New Links: Libraries in the Semantic Web
Forging New Links: Libraries in the Semantic WebForging New Links: Libraries in the Semantic Web
Forging New Links: Libraries in the Semantic WebGillian Byrne
 
Library resources for EL/EN6770
Library resources for EL/EN6770Library resources for EL/EN6770
Library resources for EL/EN6770NUS Libraries
 
Why SoMe - Research context of social media use
Why SoMe   - Research context of social media useWhy SoMe   - Research context of social media use
Why SoMe - Research context of social media useRob Knight
 
Library Tutorial for South Asian Studies
Library Tutorial for South Asian StudiesLibrary Tutorial for South Asian Studies
Library Tutorial for South Asian StudiesNUS Libraries
 
From peer review to page views: Social networking for academics
From peer review to page views: Social networking for academicsFrom peer review to page views: Social networking for academics
From peer review to page views: Social networking for academicsLydia Thorne
 
Team CDTW Capstone Presentation
Team CDTW Capstone Presentation Team CDTW Capstone Presentation
Team CDTW Capstone Presentation Todd Rutherford
 
Annotating Scholarly Resources
Annotating Scholarly ResourcesAnnotating Scholarly Resources
Annotating Scholarly ResourcesRobert Sanderson
 

Mais procurados (8)

Forging New Links: Libraries in the Semantic Web
Forging New Links: Libraries in the Semantic WebForging New Links: Libraries in the Semantic Web
Forging New Links: Libraries in the Semantic Web
 
Library resources for EL/EN6770
Library resources for EL/EN6770Library resources for EL/EN6770
Library resources for EL/EN6770
 
Why SoMe - Research context of social media use
Why SoMe   - Research context of social media useWhy SoMe   - Research context of social media use
Why SoMe - Research context of social media use
 
Library Tutorial for South Asian Studies
Library Tutorial for South Asian StudiesLibrary Tutorial for South Asian Studies
Library Tutorial for South Asian Studies
 
From peer review to page views: Social networking for academics
From peer review to page views: Social networking for academicsFrom peer review to page views: Social networking for academics
From peer review to page views: Social networking for academics
 
Team CDTW Capstone Presentation
Team CDTW Capstone Presentation Team CDTW Capstone Presentation
Team CDTW Capstone Presentation
 
Annotating Scholarly Resources
Annotating Scholarly ResourcesAnnotating Scholarly Resources
Annotating Scholarly Resources
 
Focus On Twitter
Focus On TwitterFocus On Twitter
Focus On Twitter
 

Semelhante a Reference Rot

A modern, simplified citation style and student response.pdf
A modern, simplified citation style and student response.pdfA modern, simplified citation style and student response.pdf
A modern, simplified citation style and student response.pdfJessica Navarro
 
Ensuring the Integrity (& Continuity) of Our Record of Scholarship
Ensuring the Integrity (& Continuity) of Our Record of ScholarshipEnsuring the Integrity (& Continuity) of Our Record of Scholarship
Ensuring the Integrity (& Continuity) of Our Record of ScholarshipEDINA, University of Edinburgh
 
Semantic citation
Semantic citationSemantic citation
Semantic citationDeepak K
 
Web Today, Good Tomorrow? Transactional archiving of web content [Long Version]
Web Today, Good Tomorrow? Transactional archiving of web content [Long Version]Web Today, Good Tomorrow? Transactional archiving of web content [Long Version]
Web Today, Good Tomorrow? Transactional archiving of web content [Long Version]Peter Burnhill
 
Citing sources and referencing 2012 v4
Citing sources and referencing 2012 v4Citing sources and referencing 2012 v4
Citing sources and referencing 2012 v4PEASS_2014
 
UKSG webinar: Making scholarly communication great again. Do institutional re...
UKSG webinar: Making scholarly communication great again. Do institutional re...UKSG webinar: Making scholarly communication great again. Do institutional re...
UKSG webinar: Making scholarly communication great again. Do institutional re...UKSG: connecting the knowledge community
 
J loke.referencing and plagiarism
J loke.referencing and plagiarismJ loke.referencing and plagiarism
J loke.referencing and plagiarismDr. Jennifer Loke
 
Where data and journal content collide: what does it mean to ‘publish your da...
Where data and journal content collide: what does it mean to ‘publish your da...Where data and journal content collide: what does it mean to ‘publish your da...
Where data and journal content collide: what does it mean to ‘publish your da...EDINA, University of Edinburgh
 
VRA_2015_CatalogingRoundup_Seneff
VRA_2015_CatalogingRoundup_SeneffVRA_2015_CatalogingRoundup_Seneff
VRA_2015_CatalogingRoundup_SeneffHeather Seneff
 
LIS 653-02 Spring 2014 Final Presentation Posters
LIS 653-02 Spring 2014 Final Presentation PostersLIS 653-02 Spring 2014 Final Presentation Posters
LIS 653-02 Spring 2014 Final Presentation PostersPrattSILS
 
PSP 2018 - The Changing discovery landscape: Tools and services from wiley
PSP 2018 - The Changing discovery landscape: Tools and services from wileyPSP 2018 - The Changing discovery landscape: Tools and services from wiley
PSP 2018 - The Changing discovery landscape: Tools and services from wileyMatthew Ragucci
 
Open Access: an introduction
Open Access: an introductionOpen Access: an introduction
Open Access: an introductionElizabeth Yates
 
Quantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.isQuantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.ismaturban
 
Citation Searching Presentation
Citation Searching PresentationCitation Searching Presentation
Citation Searching PresentationValerie Forrestal
 
Open Annotation Collaboration Introduction
Open Annotation Collaboration IntroductionOpen Annotation Collaboration Introduction
Open Annotation Collaboration IntroductionTimothy Cole
 

Semelhante a Reference Rot (20)

A modern, simplified citation style and student response.pdf
A modern, simplified citation style and student response.pdfA modern, simplified citation style and student response.pdf
A modern, simplified citation style and student response.pdf
 
Ensuring the Integrity (& Continuity) of Our Record of Scholarship
Ensuring the Integrity (& Continuity) of Our Record of ScholarshipEnsuring the Integrity (& Continuity) of Our Record of Scholarship
Ensuring the Integrity (& Continuity) of Our Record of Scholarship
 
Digital Research
Digital ResearchDigital Research
Digital Research
 
Semantic citation
Semantic citationSemantic citation
Semantic citation
 
Web Today, Good Tomorrow? Transactional archiving of web content [Long Version]
Web Today, Good Tomorrow? Transactional archiving of web content [Long Version]Web Today, Good Tomorrow? Transactional archiving of web content [Long Version]
Web Today, Good Tomorrow? Transactional archiving of web content [Long Version]
 
Dove, "A Model of the User's Psychological State as a Framework for Understan...
Dove, "A Model of the User's Psychological State as a Framework for Understan...Dove, "A Model of the User's Psychological State as a Framework for Understan...
Dove, "A Model of the User's Psychological State as a Framework for Understan...
 
Citing sources and referencing 2012 v4
Citing sources and referencing 2012 v4Citing sources and referencing 2012 v4
Citing sources and referencing 2012 v4
 
UKSG webinar: Making scholarly communication great again. Do institutional re...
UKSG webinar: Making scholarly communication great again. Do institutional re...UKSG webinar: Making scholarly communication great again. Do institutional re...
UKSG webinar: Making scholarly communication great again. Do institutional re...
 
J loke.referencing and plagiarism
J loke.referencing and plagiarismJ loke.referencing and plagiarism
J loke.referencing and plagiarism
 
Where data and journal content collide: what does it mean to ‘publish your da...
Where data and journal content collide: what does it mean to ‘publish your da...Where data and journal content collide: what does it mean to ‘publish your da...
Where data and journal content collide: what does it mean to ‘publish your da...
 
Reference Rot: Threat and Remedy
Reference Rot: Threat and RemedyReference Rot: Threat and Remedy
Reference Rot: Threat and Remedy
 
VRA_2015_CatalogingRoundup_Seneff
VRA_2015_CatalogingRoundup_SeneffVRA_2015_CatalogingRoundup_Seneff
VRA_2015_CatalogingRoundup_Seneff
 
LIS 653-02 Spring 2014 Final Presentation Posters
LIS 653-02 Spring 2014 Final Presentation PostersLIS 653-02 Spring 2014 Final Presentation Posters
LIS 653-02 Spring 2014 Final Presentation Posters
 
PSP 2018 - The Changing discovery landscape: Tools and services from wiley
PSP 2018 - The Changing discovery landscape: Tools and services from wileyPSP 2018 - The Changing discovery landscape: Tools and services from wiley
PSP 2018 - The Changing discovery landscape: Tools and services from wiley
 
Open Access: an introduction
Open Access: an introductionOpen Access: an introduction
Open Access: an introduction
 
Quantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.isQuantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.is
 
Reference Rot and E-Theses: Threat and Remedy
Reference Rot and E-Theses: Threat and RemedyReference Rot and E-Theses: Threat and Remedy
Reference Rot and E-Theses: Threat and Remedy
 
Citation Searching Presentation
Citation Searching PresentationCitation Searching Presentation
Citation Searching Presentation
 
LIS research
LIS researchLIS research
LIS research
 
Open Annotation Collaboration Introduction
Open Annotation Collaboration IntroductionOpen Annotation Collaboration Introduction
Open Annotation Collaboration Introduction
 

Mais de Shawn Jones

Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Shawn Jones
 
DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...
DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...
DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...Shawn Jones
 
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Shawn Jones
 
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...Shawn Jones
 
Improving Collection Understanding For Web Archives With Storytelling: Shinin...
Improving Collection Understanding For Web Archives With Storytelling: Shinin...Improving Collection Understanding For Web Archives With Storytelling: Shinin...
Improving Collection Understanding For Web Archives With Storytelling: Shinin...Shawn Jones
 
Automatically Selecting Striking Images for Social Cards
Automatically Selecting Striking Images for Social CardsAutomatically Selecting Striking Images for Social Cards
Automatically Selecting Striking Images for Social CardsShawn Jones
 
SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration)
SHARI(StoryGraph Hypercane ArchiveNow Raintale Integration)SHARI(StoryGraph Hypercane ArchiveNow Raintale Integration)
SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration)Shawn Jones
 
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...Shawn Jones
 
Storytelling With Web Archives
Storytelling With Web ArchivesStorytelling With Web Archives
Storytelling With Web ArchivesShawn Jones
 
Combining Social Media Storytelling With Web Archives
Combining Social Media Storytelling With Web ArchivesCombining Social Media Storytelling With Web Archives
Combining Social Media Storytelling With Web ArchivesShawn Jones
 
Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...Shawn Jones
 
The Off-Topic Memento Toolkit
The Off-Topic Memento ToolkitThe Off-Topic Memento Toolkit
The Off-Topic Memento ToolkitShawn Jones
 
The Many Shapes of Archive-It
The Many Shapes of Archive-ItThe Many Shapes of Archive-It
The Many Shapes of Archive-ItShawn Jones
 
Improving Collection Understanding in Web Archives
Improving Collection Understanding in Web ArchivesImproving Collection Understanding in Web Archives
Improving Collection Understanding in Web ArchivesShawn Jones
 
Where Can We Post Stories Summarizing Web Archive Collections
Where Can We Post Stories Summarizing Web Archive CollectionsWhere Can We Post Stories Summarizing Web Archive Collections
Where Can We Post Stories Summarizing Web Archive CollectionsShawn Jones
 
Avoiding Spoilers On MediaWiki Fan Sites Using Memento
Avoiding Spoilers On MediaWiki Fan Sites Using MementoAvoiding Spoilers On MediaWiki Fan Sites Using Memento
Avoiding Spoilers On MediaWiki Fan Sites Using MementoShawn Jones
 
Continuous Integration: Finding problems soonest
Continuous Integration: Finding problems soonestContinuous Integration: Finding problems soonest
Continuous Integration: Finding problems soonestShawn Jones
 
A Brief Introduction to Test-Driven Development
A Brief Introduction to Test-Driven DevelopmentA Brief Introduction to Test-Driven Development
A Brief Introduction to Test-Driven DevelopmentShawn Jones
 
Reconstructing the past with media wiki
Reconstructing the past with media wikiReconstructing the past with media wiki
Reconstructing the past with media wikiShawn Jones
 

Mais de Shawn Jones (19)

Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
 
DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...
DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...
DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...
 
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
 
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
 
Improving Collection Understanding For Web Archives With Storytelling: Shinin...
Improving Collection Understanding For Web Archives With Storytelling: Shinin...Improving Collection Understanding For Web Archives With Storytelling: Shinin...
Improving Collection Understanding For Web Archives With Storytelling: Shinin...
 
Automatically Selecting Striking Images for Social Cards
Automatically Selecting Striking Images for Social CardsAutomatically Selecting Striking Images for Social Cards
Automatically Selecting Striking Images for Social Cards
 
SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration)
SHARI(StoryGraph Hypercane ArchiveNow Raintale Integration)SHARI(StoryGraph Hypercane ArchiveNow Raintale Integration)
SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration)
 
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
 



Reference Rot

  • 1. Reference Rot Los Alamos National Laboratory Research Library Prototyping Team Presented by Shawn M. Jones
  • 2. Citations are the building blocks of scholarly communications Citations Provide Support and Evidence + Experiment and Results Argument
  • 3. DOIs Identify Scholarly Publications • Almost all scholarly publications (papers, articles, etc.) have an associated Digital Object Identifier (DOI) maintained by CrossRef • DOIs are persistent • If a publisher changes ownership or sells part of its catalog, the DOI remains with the publication so that scholars can continue to find the paper into the future "ISO 26324:2012(en), Information and documentation — Digital object identifier system". ISO.
  • 4. URIs Identify Web Resources The World Wide Web consists of resources, such as pages or applications. Each web resource is identified by a Uniform Resource Identifier (URI). Examples of web resources: • Web pages • Google Search • Software Web Sites Each resource may have one or more representations that vary by dimensions such as language or document format. Uniform Resource Locators (URLs) are a subset of URIs that require a web location (a server with an application or directory structure). Architecture of the World Wide Web, Volume One (15 December 2004) edited by Ian Jacobs, Norman Walsh. https://www.w3.org/TR/webarch/
  • 5. Scholars use URIs in References to Web Resources • The web resources behind URIs have no guarantee of persistence; they can disappear because: • Their website is gone due to lack of funding • An organization changes its website and doesn’t provide redirects to the old resource • And more…
  • 6. Why use URIs? • Existing publications are not the only supporting evidence in scholarly work • URIs are invaluable to researchers because they allow them to cite: • Software Projects • Datasets • Affiliation Web Sites • Funding • Scholar Web Sites • Blog Posts • Technical Reports • Evidence such as news stories or Tweets • And more…
  • 7. Consider The Publication of the Paper and the Reader In the Future Following One of Its References The paper is published at some point, and its citations using URIs were good at that time. Will they be good for a reader in the future?
  • 8. Reference Rot Problem #1: Link Rot The reader follows a reference and it is gone Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLOS ONE 9(12): e115253. DOI: 10.1371/journal.pone.0115253 This web-at-large resource is linked from the scholarly article Generalizing the OpenURL Framework beyond References to Scholarly Works but it is now gone!
  • 9. Reference Rot Problem #2: Content Drift The reader follows a reference and it is not the same Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475 This web-at-large resource is linked from the scholarly article Searching for Quantum Gravity with High Energy Atmospheric Neutrinos and AMANDA-II but it has changed since publication.
  • 10. A Potential Solution: Web Archives! Web Archives make snapshots of web resources so users can go back and look at a web page as it was in the past. There are many web archives, such as: • Internet Archive • Perma.cc • Archive.is • Icelandic Web Archive • UK Web Archive • Library of Congress These snapshots are called mementos.
  • 11. Questions Addressed By Our Research 1. Is the use of URI references on the rise? 2. To what extent does link rot exist in scholarly URI references? 3. To what extent does content drift exist in scholarly URI references? 4. What can we do about reference rot? Can Web Archives help? 5. When are people using URIs when they should be using DOIs? 6. What can we do to ensure people use DOIs when they exist?
  • 12. Dataset • 1.8 million articles from arXiv, Elsevier, and PubMed Central from 1997 to 2012 • For content drift comparison, Mementos are taken from 18 web archives • The data was processed by the University of Edinburgh and Los Alamos National Laboratory • From these articles we extracted 1.06 million URI references
  • 13. Is the use of URI references on the rise?
  • 14. The Number of URI References Goes Up Each Publication Year Articles and URI references per publication year - arXiv corpus. Articles and URI references per publication year - Elsevier corpus. Articles and URI references per publication year - PMC corpus. Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLOS ONE 9(12): e115253. DOI: 10.1371/journal.pone.0115253
  • 15. To what extent does link rot exist in scholarly URI references?
  • 16. Link Rot for References Gets Worse As We Look At Older Publications arXiv corpus Elsevier corpus PMC corpus Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLOS ONE 9(12): e115253. DOI: 10.1371/journal.pone.0115253 If a URI reference no longer responds, then we have link rot.
  • 17. Fewer Publications Are Immune to Reference Rot Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLOS ONE 9(12): e115253. DOI: 10.1371/journal.pone.0115253 Immune publications have no URI references Healthy publications have no link rot and have mementos within 14 days of publication for all of their references Infected publications have link rot or have no mementos for all of their references As noted before, more and more publications use URI references
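The three categories on slide 17 can be sketched as a small classifier. This is a minimal sketch under stated assumptions, not the study's code: the `Reference` record, its field names, and the reading of "within 14 days" as 14 days on either side of the publication date are all hypothetical, and a publication is treated as infected as soon as any one reference is rotten or lacks a timely memento.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reference:
    """Hypothetical record for one URI reference in a publication."""
    is_live: bool  # False means the URI no longer responds (link rot)
    # Days between publication date and each known memento (negative = before).
    memento_day_offsets: List[int] = field(default_factory=list)


def classify(references, window_days=14):
    """Label a publication as 'immune', 'healthy', or 'infected'."""
    if not references:
        return "immune"  # no URI references at all
    for ref in references:
        # Assumption: the window is symmetric around the publication date.
        has_timely_memento = any(
            abs(offset) <= window_days for offset in ref.memento_day_offsets
        )
        if not ref.is_live or not has_timely_memento:
            return "infected"
    return "healthy"


if __name__ == "__main__":
    print(classify([]))
    print(classify([Reference(True, [3])]))
    print(classify([Reference(True, [3]), Reference(False, [1])]))
```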
  • 18. To what extent does content drift exist in scholarly URI references?
  • 19. Because of Web Archives, We Can Study Content Drift This Page Changed Much over 3 Months This Page Hasn’t Changed in 19 Years Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475
  • 20. The Frequency Of Memento Creation Is Not the Same for All Resources Archived Regularly Archived Occasionally Archived Once Archived Never
  • 21. Step 1: Find a memento of a reference from the publication date of the paper If a memento before the publication date and after the publication date match according to 4 similarity measures, we consider the two to be the same and either is representative of the reference as it existed at the time of publication. Representative mementos get compared with the current live version of the same reference in step 2. Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475
  • 22. Many References Do Not Have Representative Mementos arXiv Corpus Elsevier Corpus PMC Corpus Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475
  • 23. Step 2: Compare the memento of the reference with the web resource from now Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475 Using the same 4 similarity measures, we compare the content of the current resource with the content of the representative memento.
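The comparison in step 2 can be illustrated with two common text-similarity measures. The slides do not name the four measures used, so Jaccard and cosine similarity below are stand-ins chosen for illustration; the tokenizer, the `has_drifted` helper, and its 0.95 threshold are likewise hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard similarity of the two token sets (1.0 means identical sets)."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(a, b):
    """Cosine similarity of the two term-frequency vectors."""
    count_a, count_b = Counter(tokens(a)), Counter(tokens(b))
    if not count_a and not count_b:
        return 1.0
    dot = sum(count_a[t] * count_b[t] for t in count_a)
    norm = math.sqrt(
        sum(v * v for v in count_a.values()) * sum(v * v for v in count_b.values())
    )
    return dot / norm if norm else 0.0


def has_drifted(memento_text, live_text, threshold=0.95):
    """Assumed rule: drift if any measure falls below the threshold."""
    return any(
        measure(memento_text, live_text) < threshold for measure in (jaccard, cosine)
    )


if __name__ == "__main__":
    archived = "Searching for quantum gravity with atmospheric neutrinos"
    live = "This domain is for sale"
    print(jaccard(archived, live), cosine(archived, live), has_drifted(archived, live))
```

In practice the extracted page text, not raw HTML, would be passed in, since template changes alone would otherwise register as drift.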
  • 24. Content Drift Is Worse For Older Publications Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475 arXiv corpus PMC corpus Elsevier corpus
  • 25. What can we do about reference rot? Can Web Archives help?
  • 26. What can we do about reference rot? Can Web Archives help? 1. Scholars can proactively create mementos in web archives for URI references • The Internet Archive’s “Save Page Now” • Perma.cc, Archive.is, and WebCite exist for this purpose • Mink, Webrecorder.io 2. Other scholars/editors can reference these snapshots in scholarly literature • Robust Links • Memento Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475
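Referencing a snapshot usually comes down to pairing the original URI with a datetime. The sketch below only builds URI strings and makes no network requests. The two URI patterns (the Internet Archive's Wayback Machine and the Memento Time Travel JSON API) are given as commonly documented and should be treated as assumptions to verify against each service's current documentation; the function names are hypothetical.

```python
from datetime import datetime, timezone


def wayback_uri(original_uri, when):
    """Wayback Machine URI for the memento closest to `when` (assumed pattern)."""
    return f"https://web.archive.org/web/{when:%Y%m%d%H%M%S}/{original_uri}"


def timetravel_api_uri(original_uri, when):
    """Memento Time Travel JSON API URI for `when` (assumed pattern)."""
    return f"http://timetravel.mementoweb.org/api/json/{when:%Y%m%d%H%M%S}/{original_uri}"


if __name__ == "__main__":
    # Example: a reference as it might have looked on a paper's publication date.
    published = datetime(2016, 12, 2, 0, 0, 0, tzinfo=timezone.utc)
    print(wayback_uri("http://signposting.org", published))
    print(timetravel_api_uri("http://signposting.org", published))
```

Keeping both the original URI and the datetime in the reference means a reader can still look for a memento in another archive if one archive is unavailable.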
  • 27. Are people using URIs when they should be using DOIs?
  • 28. Are people using URIs in references when they should be using DOIs? Van de Sompel H, Klein M, and Jones SM. 2016. Persistent URIs Must Be Used To Be Persistent. In Proceedings of WWW 2016, pp. 119-120. DOI: 10.1145/2872518.2889352 arXiv corpus PMC corpus We hypothesize that this is caused by citation software using the URI instead of the DOI because it does not know the DOI.
  • 29. Problem: Machines Just See Links, Where Is The DOI? Links Links Link Link Links Links Links Links Links Links URI
  • 30. Humans Can Get Meaning from Links on a Web Page Authors DOI Bibliographic Metadata PDF Document
  • 31. Problem: Machines Cannot Find the DOI Browsers and citation software can easily access the URI; it indicates how to retrieve the resource. The DOI is buried in the text of the landing page. Citation software must be programmed with many publishers’ templates in order to find the DOI across all resources. Publishers also change their templates, causing software to break. Some publishers do not use the DOI in their EndNote/BibTeX citations.
  • 32. What can we do to ensure people use DOIs when they exist?
  • 33. HTTP Already Has A Solution, We Just Need to Use It • HTTP is the protocol of the web • Before HTTP sends content, it sends headers • Inside these headers, publishers can use the Link header to reference other content • Because the metadata is stored in the transfer protocol: • This solution requires no change to the content, meaning it works with any document format. • This solution can be applied to existing content with no change to the content. HTTP/1.1 200 OK Date: Mon, 17 Jul 2017 17:53:54 GMT Server: Apache/2.2.3 (Red Hat) Connection: close Link: <http://doi.org/10.101010/99999999>; rel="identifier" Content-Type: text/html; charset=UTF-8 Van de Sompel H and Nelson ML. (2015) Reminiscing About 15 Years of Interoperability Efforts. D-Lib 21: 11/12. DOI: 10.1045/november2015-vandesompel
  • 34. Using the HTTP Link Header, the machine can find the DOI Using the HTTP link header, publishers can provide metadata linking to the DOI from their resources. This way, a browser or citation manager can find the DOI if they are currently on the landing page or the PDF page. This effort is named “Signposting the Scholarly Web”.
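A citation manager could recover the DOI from the Link header roughly as follows. This is a minimal sketch rather than a complete parser for the Link header grammar: it assumes parameter values contain no commas or semicolons, it uses the `rel="identifier"` value shown on slide 33, and the function names and the second example link are hypothetical.

```python
import re


def parse_link_header(value):
    """Return a list of (target_uri, params) pairs from a Link header value."""
    links = []
    # Each link looks like: <uri>; name="value"; name=value
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)', value):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for part in raw_params.split(";"):
            if "=" in part:
                name, _, val = part.partition("=")
                params[name.strip().lower()] = val.strip().strip('"')
        links.append((target, params))
    return links


def find_link(value, rel="identifier"):
    """Return the first link target whose rel list contains `rel`, else None."""
    for target, params in parse_link_header(value):
        # A rel parameter may hold several space-separated relation types.
        if rel in params.get("rel", "").split():
            return target
    return None


if __name__ == "__main__":
    header = (
        '<http://doi.org/10.101010/99999999>; rel="identifier", '
        '<http://example.org/metadata.json>; rel="describedby"; type="application/json"'
    )
    print(find_link(header))
    print(find_link(header, rel="describedby"))
```

Because the lookup only touches response headers, the same function works whether the response body is an HTML landing page or a PDF.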
  • 35. Signposting is not just for DOIs • Why not link from the document’s landing page to the author’s ORCID?
  • 36. Signposting is not just for DOIs • Why not link from the document to the metadata?
  • 37. Signposting is not just for DOIs • Why not link from the landing page to supplemental items or other publication formats?
  • 38. Find out more at signposting.org
  • 39. Recap
  • 40. Scholarly URI References In Jeopardy • URIs identify web resources and are not persistent • Link rot and content drift are problems for URI references and get worse the older the publication is • Scholars sometimes use URIs instead of DOIs when creating references, even if DOIs exist
  • 41. New Hope for Scholarly References • Web Archives play a role in preserving references • We can use a variety of tools to create mementos of references at the time of publication • We can access them with Memento and Robust Links • We can use signposting to help reference managers and other tools find DOIs and other information
  • 42. Thanks for listening Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLOS ONE 9(12): e115253. DOI: 10.1371/journal.pone.0115253 Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, et al. (2016) Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content. PLOS ONE 11(12): e0167475. DOI: 10.1371/journal.pone.0167475 Van de Sompel H, Klein M, and Jones SM. 2016. Persistent URIs Must Be Used To Be Persistent. In Proceedings of WWW 2016, pp. 119-120. DOI: 10.1145/2872518.2889352 Van de Sompel H and Nelson ML. (2015) Reminiscing About 15 Years of Interoperability Efforts. D-Lib 21: 11/12. DOI: 10.1045/november2015-vandesompel http://robustlinks.mementoweb.org http://signposting.org http://timetravel.mementoweb.org
  • 44. Demonstrations • Memento - http://timetravel.mementoweb.org • Robust Links - http://www.dlib.org/dlib/november15/vandesompel/11vandesompel.html