On the Persistence of Persistent Identifiers of the Scholarly Web
@mart1nkle1n
TPDL, August 2020
Martin Klein & Lyudmila Balakireva
Los Alamos National Laboratory
{mklein, ludab}@lanl.gov
https://arxiv.org/abs/2004.03011
DOIs are very common
How does this work via HTTP?
https://doi.org/10.1007/978-3-540-87599-4_38
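A DOI becomes actionable via HTTP when it is placed behind the doi.org resolver, as in the URL above. The sketch below assumes the `https://doi.org/` resolver and percent-encodes characters that are unsafe in a URL path. The helper name `doi_to_url` and the list of recognized prefixes are illustrative assumptions and are not part of any official DOI tooling.

```python
from urllib.parse import quote

# Prefixes that may already be present on a DOI string (assumed, not exhaustive).
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Turn a bare or prefixed DOI into an HTTP-actionable doi.org URL."""
    s = doi.strip()
    for prefix in _KNOWN_PREFIXES:
        if s.lower().startswith(prefix):
            s = s[len(prefix):]
            break
    if not s.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep "/" (it separates DOI prefix and suffix); percent-encode characters
    # such as "#" or "?" that a browser would otherwise treat as fragment/query.
    return "https://doi.org/" + quote(s, safe="/")
```

For example, `doi_to_url("doi:10.1007/978-3-540-87599-4_38")` yields the URL shown on this slide. The sketch leaves the case of the DOI untouched, because DOI names are case-insensitive and the resolver handles that.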
Arrived at landing page
https://doi.org/10.1007/978-3-540-87599-4_38
https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38
HTTP redirects
https://doi.org/10.1007/978-3-540-87599-4_38 (HTTP 302 redirect)
http://link.springer.com/10.1007/978-3-540-87599-4_38 (HTTP 301 redirect)
https://link.springer.com/10.1007/978-3-540-87599-4_38 (HTTP 302 redirect)
https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38 (HTTP 200)
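The redirect chain above can be reproduced hop by hop. The sketch below uses Python's standard `http.client`, which never follows redirects on its own, so the status code of every hop stays visible. The function names, the set of redirect codes treated as "follow", and the 20-hop limit are assumptions; this is not the study's actual tooling. The `fetch` parameter lets you swap out the network call, for example in tests.

```python
import http.client
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# A fetcher sends one request and returns (status code, Location header or None).
Fetcher = Callable[[str, str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def http_fetch(method: str, url: str) -> Tuple[int, Optional[str]]:
    """Send a single request; http.client never follows redirects by itself."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url: str, method: str = "HEAD", fetch: Fetcher = http_fetch,
                    max_hops: int = 20) -> List[Tuple[str, int]]:
    """Follow a redirect chain hop by hop, recording (URL, status) for each hop."""
    chain: List[Tuple[str, int]] = []
    for _ in range(max_hops):
        status, location = fetch(method, url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break  # last link: a non-redirect status, or a 3xx without Location
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    # If max_hops runs out, the chain simply ends on a redirect status.
    return chain
```

Calling `trace_redirects("https://doi.org/10.1007/978-3-540-87599-4_38")` sends live HEAD requests. The hops you observe may differ from this slide depending on client and network, which is the variation the study sets out to measure.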
Questions…
• How persistent is this DOI resolution?
• Given different clients and network environments:
  • Can we consistently arrive at the same location at the end of the redirect chain?
  • Is the path there (redirect chain) the same?
  • Are there differences between Open Access and non-OA?
  • Subscription vs. non-subscription level content?
• Do scholarly content providers differ from the popular web?
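The first two sub-questions can be made operational by comparing two redirect chains recorded for the same DOI, for example one per client or per network. The sketch below assumes each chain is stored as a list of (URL, status code) pairs. The function and key names are hypothetical.

```python
from typing import Dict, List, Tuple

# One hop of a redirect chain: (requested URL, HTTP status code it returned).
Chain = List[Tuple[str, int]]


def compare_chains(a: Chain, b: Chain) -> Dict[str, bool]:
    """Compare two redirect chains recorded for the same DOI."""
    both = bool(a) and bool(b)
    return {
        # Same location at the end of the redirect chain?
        "same_final_url": both and a[-1][0] == b[-1][0],
        # Same response code for that last link?
        "same_final_status": both and a[-1][1] == b[-1][1],
        # Same path there, i.e., an identical sequence of URLs?
        "same_path": [url for url, _ in a] == [url for url, _ in b],
    }
```

Two chains can end at the same URL while taking different paths, for instance when one client is routed through an intermediate login or cookie-check hop. For that reason, the final URL and the path are reported separately.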
Idea…
• Comparative study investigating scholarly publishers’ responses
  • to common HTTP requests
  • against DOIs
• Using different web clients and request methods, resembling
  • machines "browsing", crawling
  • humans browsing
• From network environments with different subscriptions/licenses
  • Amazon Web Services EC2 instance
  • LANL internal network
• Compare against web servers providing popular web content
HTTP clients, request methods, dataset, networks
• HTTP HEAD with cURL
• HTTP GET with cURL
• HTTP GET+ with cURL plus various common parameters, e.g., user agent, cookies
• HTTP GET with Chrome
• 10,000 DOIs, randomly picked: 100 DOIs from each of the 100 most frequent publisher domains
• HTTP requests sent from an AWS VM and from the LANL network
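The three cURL-based variants could be approximated with command lines like the ones built below. This is only a sketch. The slides do not list the exact cURL options or user-agent string used in the study, so the chosen flags are assumptions: `-L` follows redirects, `-I` sends HEAD, `-D -` dumps headers, `-A` sets the user agent, and `-c` enables cookies. The `BROWSER_UA` value is also an assumption. The Chrome method requires a real or headless browser and is not covered here.

```python
from typing import List

# Hypothetical browser-like User-Agent; the study's actual string is not given.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"
)

# -s: no progress meter; -L: follow redirects to the end of the chain.
_COMMON = ["curl", "-s", "-L"]
# -D -: print every hop's response headers to stdout; -o /dev/null: drop bodies.
_DUMP_HEADERS = ["-D", "-", "-o", "/dev/null"]


def curl_argv(method: str, url: str, cookie_jar: str = "cookies.txt") -> List[str]:
    """Build a cURL argument list for the HEAD, GET, or GET+ request variant."""
    if method == "HEAD":
        # -I sends HEAD requests and prints the headers of each hop.
        return _COMMON + ["-I", url]
    if method == "GET":
        return _COMMON + _DUMP_HEADERS + [url]
    if method == "GET+":
        # -A sets a browser-like User-Agent; -c activates the cookie engine, so
        # cookies set on one hop are sent on later hops, and saves them to cookie_jar.
        return _COMMON + _DUMP_HEADERS + ["-A", BROWSER_UA, "-c", cookie_jar, url]
    raise ValueError(f"unsupported method: {method!r}")
```

You can pass the resulting list to `subprocess.run(argv, capture_output=True, text=True)`. The `HTTP/` status lines in its stdout then trace each hop of the redirect chain.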
Response codes of last link in redirect chain by DOI
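[Figure: response code of the last link per DOI; x-axis: request method (HEAD, GET, GET+, Chrome); y-axis: 10,000 DOIs; bins: 2xx, 3xx, 4xx, 5xx, Err]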
• < 50% successful requests across all methods (48.3%)
• > 40% 300-level responses with GET
• 25% return 200-level responses with HEAD/Chrome
• 13% 400-level responses with HEAD
  • 25% of them with a 200-level response with any other method
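Statistics of this kind can be computed from each DOI's final response code per method. The sketch below assumes results are stored as a dict that maps each DOI to a dict of method → final status code, with `None` when a request failed without any response (the "Err" bin). The function names are hypothetical. Two readings are interpretations of the slide: "successful across all methods" is taken as "2xx with all four methods", and "them" is taken as the DOIs that got a 4xx with HEAD.

```python
from collections import Counter
from typing import Dict, Mapping, Optional

METHODS = ("HEAD", "GET", "GET+", "Chrome")
BINS = ("2xx", "3xx", "4xx", "5xx", "Err")

# doi -> {method -> final status code, or None if no response was received}
Results = Mapping[str, Mapping[str, Optional[int]]]


def bin_status(code: Optional[int]) -> str:
    """Bin a final status code at the hundreds level; missing or odd codes -> "Err"."""
    if code is None or not 200 <= code <= 599:
        return "Err"
    return f"{code // 100}xx"


def bin_shares(results: Results, method: str) -> Dict[str, float]:
    """Share of DOIs falling into each response-code bin for one method."""
    counts = Counter(bin_status(r[method]) for r in results.values())
    return {b: counts[b] / len(results) for b in BINS}


def share_2xx_all_methods(results: Results) -> float:
    """Share of DOIs that end in a 2xx response with every method."""
    ok = sum(all(bin_status(r[m]) == "2xx" for m in METHODS) for r in results.values())
    return ok / len(results)


def share_recovered(results: Results, failing: str = "HEAD") -> float:
    """Among DOIs with a 4xx under `failing`, share that get a 2xx with another method."""
    failed = [r for r in results.values() if bin_status(r[failing]) == "4xx"]
    if not failed:
        return 0.0
    others = [m for m in METHODS if m != failing]
    recovered = sum(any(bin_status(r[m]) == "2xx" for m in others) for r in failed)
    return recovered / len(failed)
```

A 4xx with HEAD followed by a 2xx with GET on the same DOI suggests the server treats the request method differently. It does not mean the DOI itself is broken, which is why `share_recovered` compares methods per DOI rather than in aggregate.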
For more background, details, and results: https://arxiv.org/abs/2004.03011
Thank you & stay safe!

Speaker Notes

  1. Hello and welcome to this session! My name is Martin Klein and I work in the Research Library at Los Alamos National Laboratory (LANL). I’d like to give a brief overview of the work done with my colleague Luda Balakireva on the persistence of persistent identifiers of the scholarly web. More specifically, we test Digital Object Identifiers (DOIs) and how consistently or inconsistently scholarly publishers respond when DOIs are requested. It is worth noting that several different persistent identifiers are used on the scholarly web, but for the purpose of this study, we only investigate DOIs.
  2. Why do we do that? Well, the answer is pretty simple: because DOIs are very common. For example, traditional journal or conference proceeding papers are often assigned DOIs as shown in this example from the IEEE.
  3. The same holds true for datasets that are often assigned DOIs as shown here in Zenodo.
  4. Or more generally speaking, scholarly projects that can include multiple resources and types of resources, as shown here in the example of the Open Science Framework, are assigned DOIs. So this all is to say that DOIs are very frequently used to identify scholarly resources on the web.
  5. So how does this work, how are DOIs resolved on the web? If we take this DOIs that is actionable via HTTP
  6. And use a web browser to dereference it, the browser will eventually display the resource, in this case the landing page of a scholarly article, identified by the DOI. Note that the URI of the landing page, shown on the bottom of this slide, is different from the DOI, as it is hosted by Springer.
  7. The reason for this is that in the background, somewhat opaque to the user, the browser follows a number of HTTP redirects from the DOI to the landing page URI. The redirect chain for our example DOI is shown here: We first see a HTTP 302 redirect to Springer Followed by a 301 redirect to the HTTPS protocol And another 302 to the landing page URI. The landing page, as the last link of the redirect chain, returns an HTTP 200 response code, indicating success of the request and the server’s response.
  8. So the main question we are investigating with our work is: how persistent is this DOI resolution? Given that DOIs can be requested by different HTTP clients and from different network environments, several subsequent questions arise. For example: Can we consistently arrive at the same last link of a redirect chain? Does the chain itself change? Is there a difference between the resolution of DOIs that identify OA resources vs those that identify non-OA resources? Does it matter if the request against a DOI comes from within an institutional network with certain subscription levels to commercial publishers? If we observe such differences, is this typical only for the scholarly web or are these behaviors reflected in the popular web as well? In short, our intention is to test the consistency of DOI responses. Afterall, without consistency, how can we trust the persistence of such identifiers and their underlying infrastructure?
  9. We designed a study to investigate scholarly publishers and their responses to requests against DOIs. We use common HTTP clients and methods that resemble both machine and human browsing behavior. We send our requests from two different network environments with different subscription levels to commercial publishers. To compare our results, we send the same requests to web servers providing popular web content.
  10. For our experiment, we use the four HTTP methods and clients summarized here. We send HTTP HEAD requests with the popular command-line tool cURL. We send simple HTTP GET requests, also with cURL. We send more complex HTTP GET requests with cURL, where we, for example, specify a user agent and accept cookies. Lastly, we use the popular web browser Chrome to send HTTP GET requests. We send these four types of requests against a corpus of 10k randomly sampled DOIs and repeat the experiment from two different network environments: a VM in the Amazon Cloud and the LANL network.
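As a rough illustration of the three cURL-based methods, the sketch below assembles cURL command lines as argument lists. The specific flags, the placeholder user-agent string, the cookie file name, and the hypothetical `curl_argv` helper are our assumptions, not the exact configuration used in the study (the paper has those details), and the Chrome method is not covered. `--location` makes cURL follow redirects, and `--write-out` prints the final status code and effective URI once it is done.

```python
import shlex
from typing import List

# Print the final status code and the effective (last) URI once cURL is done.
WRITE_OUT = r"%{http_code} %{url_effective}\n"

# Placeholder browser-like user agent; an assumption, not the one used in the study.
DEFAULT_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def curl_argv(method: str, url: str, cookie_jar: str = "cookies.txt",
              user_agent: str = DEFAULT_UA) -> List[str]:
    """Build a cURL argv for a HEAD, simple GET, or 'GET+' style request."""
    base = ["curl", "--silent", "--location", "--output", "/dev/null",
            "--write-out", WRITE_OUT]
    if method == "HEAD":
        return base + ["--head", url]
    if method == "GET":
        return base + [url]
    if method == "GET+":
        # Browser-like user agent plus a cookie engine that reads and writes one jar.
        return base + ["--user-agent", user_agent,
                       "--cookie", cookie_jar, "--cookie-jar", cookie_jar, url]
    raise ValueError(f"unknown method: {method!r}")


for m in ("HEAD", "GET", "GET+"):
    print(shlex.join(curl_argv(m, "https://doi.org/10.1007/978-3-540-87599-4_38")))
```

Each list can be passed unchanged to `subprocess.run(argv, capture_output=True, text=True)`; building argument lists rather than shell strings avoids quoting problems with DOIs that contain special characters.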
  11. We argue that the first three methods resemble a machine browsing or crawling the web, mostly because humans typically use cURL only for testing, whereas it is frequently used in scripts that access web resources at scale. In contrast, the Chrome method, somewhat naturally, most closely resembles a human browsing.
  12. Due to time constraints, I will only show one set of results. This graph shows the response code of the last link of every redirect chain, distinguished by request method. Our four methods to dereference DOIs are shown on the x-axis, and the 10k DOIs are displayed on the y-axis. Response codes are binned at the hundreds level: green indicates a 200-level response (success), gray a 300-level response (redirect), red a 400-level response (client error), and blue a 500-level response (server error). This graph shows the results of requests sent from a VM in the Amazon Cloud, so from a network presumably without subscriptions to commercial publishers. A number of observations can immediately be made:
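Binning at the hundreds level is plain integer arithmetic on the status code. The minimal sketch below (the function names are ours) also spells out the class labels as defined by the HTTP specification, where 4xx means client error and 5xx means server error.

```python
# Hundreds-level status classes; 4xx are client errors, 5xx are server errors.
STATUS_CLASSES = {
    100: "informational",
    200: "success",
    300: "redirect",
    400: "client error",
    500: "server error",
}


def status_bin(code: int) -> int:
    """Bin an HTTP status code at the hundreds level, e.g. 404 -> 400."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return code // 100 * 100


def status_label(code: int) -> str:
    """Human-readable class of a status code, e.g. 503 -> 'server error'."""
    return STATUS_CLASSES[status_bin(code)]


for code in (200, 302, 404, 503):
    print(code, status_bin(code), status_label(code))
```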
  13. 1) Fewer than 50% of DOIs consistently return a 200-level response, meaning success, across all four request methods. In other words, more than 5k of our DOIs did not respond consistently across all four methods. A rather astonishing ratio! Looking at the individual methods, we note that Chrome, the method most closely resembling a human browsing the web, performs best.
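The consistency criterion in this observation can be expressed as a simple filter over per-method results. In the sketch below, the `Results` layout (DOI mapped to the final status code for each method) and the hypothetical `consistently_successful` function are our assumptions, and the DOIs and codes are toy values, not the study's data.

```python
from typing import Dict, List

METHODS = ("HEAD", "GET", "GET+", "Chrome")

# Final status code of each DOI's redirect chain, per request method.
Results = Dict[str, Dict[str, int]]


def is_success(code: int) -> bool:
    return 200 <= code < 300


def consistently_successful(results: Results) -> List[str]:
    """DOIs whose chain ends in a 2xx response under every method."""
    # A missing method counts as 0, i.e. not a success.
    return [doi for doi, by_method in results.items()
            if all(is_success(by_method.get(m, 0)) for m in METHODS)]


# Toy data with made-up DOIs; NOT the study's results.
toy: Results = {
    "10.1234/a": {"HEAD": 200, "GET": 200, "GET+": 200, "Chrome": 200},
    "10.1234/b": {"HEAD": 404, "GET": 200, "GET+": 200, "Chrome": 200},
    "10.1234/c": {"HEAD": 200, "GET": 302, "GET+": 200, "Chrome": 200},
}
ok = consistently_successful(toy)
print(ok, f"{len(ok) / len(toy):.0%} consistent")  # ['10.1234/a'] 33% consistent
```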
  14. 2) Next, we see that the simple GET method seems ill-suited for resolving DOIs, with more than 40% of DOI redirect chains ending in a 300-level response. This is noteworthy because a 300-level code is, by definition, a redirect and therefore should not be the *final* response code of a redirect chain on the web. There is no obvious reason why…
  15. …especially given that a large fraction of those DOIs, 25% in total, result in a successful response when the HEAD or Chrome method is used.
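The cross-method comparison behind this observation can be sketched as another filter: find DOIs whose chain ends in a redirect for one method but succeeds for at least one other. Reading "HEAD or Chrome" as "at least one of the two" is our assumption, as are the hypothetical function name and the toy data below (not the study's data).

```python
from typing import Dict, List, Tuple

Results = Dict[str, Dict[str, int]]  # DOI -> {method: final status code}


def ends_in_redirect_but_succeeds_elsewhere(
        results: Results, method: str = "GET",
        others: Tuple[str, ...] = ("HEAD", "Chrome")) -> List[str]:
    """DOIs whose chain ends in 3xx for `method` but in 2xx for at least one of `others`."""
    return [doi for doi, by in results.items()
            if 300 <= by.get(method, 0) < 400
            and any(200 <= by.get(m, 0) < 300 for m in others)]


# Toy data with made-up DOIs; NOT the study's results.
toy: Results = {
    "10.1234/a": {"HEAD": 200, "GET": 302, "Chrome": 200},
    "10.1234/b": {"HEAD": 404, "GET": 301, "Chrome": 404},
    "10.1234/c": {"HEAD": 200, "GET": 200, "Chrome": 200},
}
print(ends_in_redirect_but_succeeds_elsewhere(toy))  # ['10.1234/a']
```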
  16. 4) Our next observation is that a significant portion, 13%, of DOI requests with the simple HEAD method result in a 400-level response. One might expect many of these to be 403s, meaning access forbidden, or 405s, meaning the HEAD method is not allowed for the resource. But that is not the case: this portion is indeed dominated by 404s, meaning resource not found.
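Breaking the 400-level responses down into individual codes is a simple tally. The sketch below uses `collections.Counter`; the hypothetical `client_error_breakdown` function and the toy DOIs and codes are ours, not the study's data.

```python
from collections import Counter
from typing import Dict

Results = Dict[str, Dict[str, int]]  # DOI -> {method: final status code}


def client_error_breakdown(results: Results, method: str = "HEAD") -> Counter:
    """Count the individual 4xx codes that end the chains for one method."""
    return Counter(by[method] for by in results.values()
                   if 400 <= by.get(method, 0) < 500)


toy: Results = {  # made-up DOIs and codes; NOT the study's data
    "10.1234/a": {"HEAD": 404},
    "10.1234/b": {"HEAD": 404},
    "10.1234/c": {"HEAD": 403},
    "10.1234/d": {"HEAD": 200},
}
print(client_error_breakdown(toy).most_common())  # [(404, 2), (403, 1)]
```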
  17. Oddly, 25% of these DOIs result in a 200-level response when any other request method is used. So, do they exist or not? While such changing response codes are not well aligned with HTTP standards and best practices on the web, our observations strongly indicate that scholarly publishers do respond differently to requests against the same DOI, depending on the method used. In addition, we can clearly see patterns where responses differ between methods that resemble machine behavior and those that resemble human behavior. This is reflected in the success of the Chrome method and, in particular, in the lack of success of the simple GET and HEAD methods. In aggregate, from our point of view, these observed inconsistencies raise more questions and do not increase trust in the persistence of persistent identifiers.
  18. For more results and details on the methodology and dataset used, we refer to the paper. The corresponding preprint is available at the URI displayed at the bottom of this slide.
  19. This concludes my short presentation. Thanks a lot for watching! I am happy to hear your feedback and discuss our work. Thank you!