Avoiding Zombies in Archival Replay Using ServiceWorker
Sawood Alam, Mat Kelly, Michele C. Weigle, and Michael L. Nelson
Web Science and Digital Libraries Research Group
Old Dominion University, Norfolk, VA, 23529
@ibnesayeed
@WebSciDL
Supported in part by NSF III 1526700
WADL 2017, June 22-23, 2017, Toronto, Ontario, Canada
Sawood Alam <@ibnesayeed>
2008 Memento Seen in 2017
● https://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
2008 Memento Seen in 2012
● http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
XenLand @ Alpha Centauri
Zombies in Archive
<img src="http://xenland.alpha/images/map.png">
<!-- Is rewritten on replay to become: -->
<img src="http://archive.example.org/1998/http://xenland.alpha/images/map.png">
<!-- URLs constructed by JavaScript are harder to rewrite on replay, e.g.: -->
<div id="ruler"></div>
<script>
var base = 'http://xenland.alpha';
var imgdir = '/images/';
var img = document.createElement('img');
img.src = base + imgdir + 'ruler.png';
document.getElementById('ruler').appendChild(img);
// => http://xenland.alpha/images/ruler.png (fetched from the live web on replay)
</script>
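A deliberately naive, hypothetical rewriter shows why script-built URLs escape replay-time rewriting: it only sees markup, so it can prefix the `src` attribute but never the URL that the script assembles at runtime. `rewriteAttributes` and `ARCHIVE_PREFIX` are assumptions for illustration, not how any particular archive rewrites content.

```javascript
// Hypothetical archive prefix, taken from the slide's example (a 1998 memento).
const ARCHIVE_PREFIX = 'http://archive.example.org/1998/';

// Naive static rewriting: prefix absolute URLs that appear in src/href attributes.
// It only inspects text; it never executes scripts.
function rewriteAttributes(html) {
  return html.replace(
    /(\s(?:src|href)=["'])(https?:\/\/)/g,
    (match, attribute, scheme) => attribute + ARCHIVE_PREFIX + scheme
  );
}

const markup = '<img src="http://xenland.alpha/images/map.png">';
const script = [
  "var base = 'http://xenland.alpha';",
  "var imgdir = '/images/';",
  "img.src = base + imgdir + 'ruler.png';",
].join('\n');

// The attribute gets prefixed with the archive URL.
console.log(rewriteAttributes(markup));
// The script contains no src="http..." attribute, so it passes through untouched
// and the browser still builds and fetches the live URL at replay time.
console.log(rewriteAttributes(script) === script);
```

However sophisticated a static rewriter is, the URL `http://xenland.alpha/images/ruler.png` does not exist in the text until the script runs.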
Replay URL Resolution & Rewriting
Reference type   Example                                     Resolution after relocation
Relative path    images/logo.png                             Potentially correct
Absolute path    /public/images/logo.png                     Potentially incorrect
Absolute URL     http://example.com/public/images/logo.png   Potentially live leakage
Original page http://example.com/public/index.html contains:
...
<img src="/public/images/logo.png">
...
When replayed from http://archive.example.org/<datetime>/http://example.com/public/index.html, the reference must be rewritten to:
...
<img src="/<datetime>/http://example.com/public/images/logo.png">
...
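The table can be checked with the standard WHATWG `URL` constructor, available in browsers and Node.js. A minimal sketch, assuming a hypothetical datetime `20170622000000` in place of `<datetime>`; `resolveAfterRelocation` and `rewriteForReplay` are illustrative helpers, not part of any replay system.

```javascript
// Hypothetical 14-digit datetime standing in for the slide's <datetime> placeholder.
const DATETIME = '20170622000000';
const ORIGINAL_PAGE = 'http://example.com/public/index.html';
const REPLAY_PAGE = `http://archive.example.org/${DATETIME}/${ORIGINAL_PAGE}`;

// How a browser resolves a reference once the page is served from the archive (no rewriting).
function resolveAfterRelocation(ref) {
  return new URL(ref, REPLAY_PAGE).href;
}

// Rewriting fix: resolve against the ORIGINAL page URL, then re-anchor under the archive datetime.
function rewriteForReplay(ref) {
  return `/${DATETIME}/${new URL(ref, ORIGINAL_PAGE).href}`;
}

const refs = [
  'images/logo.png',                           // relative path
  '/public/images/logo.png',                   // absolute path
  'http://example.com/public/images/logo.png', // absolute URL
];
for (const ref of refs) {
  console.log(ref, '->', resolveAfterRelocation(ref), '| rewritten:', rewriteForReplay(ref));
}
```

Without rewriting, the three references land inside the archive, on the archive host outside the memento, and on the live web respectively; after rewriting, all three map to the same archival path shown on the slide.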
Avoiding Zombies
● Ahead-of-time rendering and JS execution
○ http://archive.is/
● Archival replay proxy
○ https://github.com/ikreymer/pywb/wiki/Pywb-Proxy-Mode-Usage
● Browser extension
○ MementoFox (deprecated)
● JS override
○ wombat.js in PyWB
● ServiceWorker
ServiceWorker
● New web API (still a working draft)
● A standalone JavaScript file
● Persists in the browser independently of any open window
● Acts as a client-side proxy
● Registered by a web page on its own origin for a specific path (called its scope)
● Intercepts all requests in scope
○ Requests for resources under the scope path (at any depth)
○ Secondary resource requests originating from any page under the scope, including cross-origin ones
● Allows modification of requests and responses
● Primarily used in web applications for offline access and push notifications
● Requires HTTPS
● Growing browser support (73.61% of global usage as of June 8, 2017)
○ http://caniuse.com/#feat=serviceworkers
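Registration and scope can be sketched as follows. The script path `/sw.js` and the archive URLs are hypothetical, and `matchScope` is a simplified model of how a browser picks the controlling registration (the longest scope URL that prefixes the page URL), not a browser API. Once a page is controlled, the requests it makes, cross-origin ones included, fire `fetch` events in the worker; that is the hook an archival replay worker relies on.

```javascript
// Simplified model of scope matching: a registration controls a page whose URL
// starts with the registration's scope URL; if several match, the longest scope wins.
function matchScope(pageUrl, scopeUrls) {
  let best = null;
  for (const scope of scopeUrls) {
    if (pageUrl.startsWith(scope) && (best === null || scope.length > best.length)) {
      best = scope;
    }
  }
  return best; // null: no ServiceWorker controls this page
}

// Page-side registration (browsers only; Node.js has no navigator.serviceWorker).
// '/sw.js' is a hypothetical script path; scope '/' covers every page on the origin.
if (typeof navigator !== 'undefined' && 'serviceWorker' in navigator) {
  navigator.serviceWorker
    .register('/sw.js', { scope: '/' })
    .then((reg) => console.log('ServiceWorker registered with scope', reg.scope))
    .catch((err) => console.error('ServiceWorker registration failed', err));
}
```

A worker script served from the site root may claim the whole origin, which is why an archive would typically serve its replay worker from there.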
reconstructive.js
● https://github.com/oduwsdl/reconstructive
● A ServiceWorker script written for archival replay
● Plug-in for web archives or Memento aggregators
● Intercepts all network requests originating from a memento
● Reroutes requests to an archive (prevents live leakage and incorrect references)
● Optionally rewrites the content to add a banner and fix hyperlinks
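The rerouting idea can be sketched as a `fetch` handler. This is a minimal illustration under assumptions, not reconstructive.js's actual code or API: it assumes a hypothetical archive URL layout `<archive origin>/<14-digit datetime>/<original URL>`, and `parseMementoUrl` and `rerouteUrl` are made-up helpers. The controlling page's URL (from `clients.get(event.clientId)`) supplies the archive and datetime, so both live URLs and absolute paths wrongly resolved against the archive host are sent back into the archive at the same datetime.

```javascript
// Hypothetical archive URL layout: <archive origin>/<14-digit datetime>/<original URL>
const MEMENTO_PATTERN = /^(https?:\/\/[^/]+)\/(\d{14})\/(https?:\/\/.+)$/;

// Split a memento URL into its parts, or return null if it does not match the layout.
function parseMementoUrl(url) {
  const match = MEMENTO_PATTERN.exec(url);
  return match ? { archiveOrigin: match[1], datetime: match[2], original: match[3] } : null;
}

// Map a request made by a memento page to the archived copy at the same datetime.
function rerouteUrl(requestUrl, mementoUrl) {
  const memento = parseMementoUrl(mementoUrl);
  if (!memento) return requestUrl; // the page is not a memento: leave the request alone
  const target = parseMementoUrl(requestUrl);
  if (target && target.archiveOrigin === memento.archiveOrigin) return requestUrl; // already archival
  const req = new URL(requestUrl);
  let original;
  if (req.origin === memento.archiveOrigin) {
    // Absolute path resolved against the archive host: re-anchor it on the original site
    original = new URL(req.pathname + req.search, memento.original).href;
  } else {
    // Absolute URL on the live web: a would-be zombie
    original = req.href;
  }
  return `${memento.archiveOrigin}/${memento.datetime}/${original}`;
}

// ServiceWorker wiring; runs only inside a ServiceWorker global scope.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    event.respondWith((async () => {
      // The controlled page's URL tells us which archive and datetime to use
      const client = event.clientId ? await self.clients.get(event.clientId) : undefined;
      if (!client) return fetch(event.request); // e.g. top-level navigations
      const target = rerouteUrl(event.request.url, client.url);
      return target === event.request.url ? fetch(event.request) : fetch(target);
    })());
  });
}
```

Because the worker sees the final URL that is actually requested, it makes no difference whether that URL came from markup or from string concatenation in a script.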
Zombies, No More!
● https://github.com/oduwsdl/ipwb
Rewriting Mementos is Expensive
Original capture (without any rewriting)
In our experiment over 500 home pages, we observed:
● One-fifth (20%) mean data overhead
● One-third (about 33%) mean time overhead
15% more data in twice the time
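The overhead figures are relative increases over the unmodified capture. A small worked sketch with hypothetical numbers chosen to mirror the "15% more data in twice the time" example; averaging per-page ratios in `meanOverhead` is an assumption, since the slide does not say how the mean was computed.

```javascript
// Relative overhead of a rewritten memento over the original, unmodified capture.
function relativeOverhead(original, rewritten) {
  return (rewritten - original) / original;
}

// Mean of per-page relative overheads (one plausible aggregation; an assumption).
function meanOverhead(pages) {
  const total = pages.reduce((sum, p) => sum + relativeOverhead(p.original, p.rewritten), 0);
  return total / pages.length;
}

// Hypothetical page: 1000 KB -> 1150 KB of data, 2 s -> 4 s of load time.
console.log(relativeOverhead(1000, 1150)); // 0.15, i.e. 15% more data
console.log(relativeOverhead(2, 4));       // 1, i.e. twice the time
```

On this scale, a one-fifth mean data overhead corresponds to a mean of 0.2 and a one-third mean time overhead to about 0.33.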
Archival Capture Replay Test Suite (ACRTS)
reconstructive.js
● https://ibnesayeed.github.io/acrts/
Reconstruction Winners: PyWB & reconstructive.js
A. OpenWayback
B. PyWB
C. Memento Reconstruct
D. Memento for Chrome
E. reconstructive.js
Future Work
● Use the “Prefer” header to request original (unrewritten) content, when archives support it
● Add a customizable archival banner
● Add a click handler for lazy rewriting of hyperlinks
● Handle archived ServiceWorkers
● Write a ServiceWorker script that helps webmasters combat 404 errors
● http://ws-dl.blogspot.co.uk/2016/08/2016-08-15-mementos-in-raw-take-two.html
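The lazy hyperlink rewriting item could look roughly like this. It is a hypothetical sketch, not existing reconstructive.js code: it assumes the same made-up `<archive origin>/<14-digit datetime>/<original URL>` layout, and `archivalHref` is an illustrative helper. Each link is rewritten only at the moment it is clicked, so the page never has to be rewritten up front.

```javascript
// Hypothetical archive URL layout: <archive origin>/<14-digit datetime>/<original URL>
const MEMENTO_PATTERN = /^(https?:\/\/[^/]+)\/(\d{14})\/(https?:\/\/.+)$/;

// Archival target for a raw href found in a memento, or null to leave the link alone.
function archivalHref(rawHref, mementoUrl) {
  const m = MEMENTO_PATTERN.exec(mementoUrl);
  if (!m || rawHref.startsWith('#')) return null; // not replaying, or an in-page anchor
  const [, archiveOrigin, datetime, originalPage] = m;
  const target = new URL(rawHref, originalPage); // resolve as the original site would
  if (target.protocol !== 'http:' && target.protocol !== 'https:') return null; // mailto:, javascript:, ...
  return `${archiveOrigin}/${datetime}/${target.href}`;
}

// Browser only: rewrite a link when it is clicked instead of rewriting every link in advance.
if (typeof document !== 'undefined') {
  document.addEventListener('click', (event) => {
    const link = event.target.closest && event.target.closest('a[href]');
    if (!link) return;
    // Use the raw attribute, not link.href, which the browser already resolved against the archive URL
    const href = archivalHref(link.getAttribute('href'), location.href);
    if (href) link.href = href;
  }, true); // capture phase: run before the page's own click handlers
}
```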
Conclusions
● reconstructive.js => no zombies!
● Rerouting instead of rewriting (lazy rewriting)
● Mean overhead reduction
○ one-fifth data
○ one-third time
● 73.61% (and growing) browser support for ServiceWorker
○ http://caniuse.com/#feat=serviceworkers
● reconstructive.js
○ https://github.com/oduwsdl/reconstructive
● Archival Capture Replay Test Suite
○ https://ibnesayeed.github.io/acrts/
