Intro to Web Archiving
Dr. Michele C. Weigle, @weiglemc
Web Sciences and Digital Libraries (WS-DL) Group, @WebSciDL
Department of Computer Science
Old Dominion University
June 26, 2018
ODU Machine Learning and Data Sciences Camp
ODU WS-DL Group
• Web Sciences and Digital Libraries
  – digital preservation
  – web archiving
  – web science (social media analysis, web usage analysis)
• Our recent work has been featured in the popular press
@WebSciDL
http://ws-dl.cs.odu.edu/
http://ws-dl.blogspot.com/
ODU WS-DL Group
Faculty
• Dr. Michael L. Nelson
• Dr. Michele C. Weigle
Coming in Fall 2018!
• Dr. Sampath Jayarathna
• Dr. Jian Wu
PhD Students
• Scott Ainsworth
• Sawood Alam
• Lulwah Alkwai
• Mohamed Aturban
• Brian Griffin
• Hussam Hallak
• Shawn Jones
• Mat Kelly
• Corren McCoy
• Louis Nguyen
• Alexander Nwala
MS Students
• Nauman Siddique
• Miranda Smith
What is the past web?
The Web holds our stories
But webpages can disappear
• Average lifespan of a webpage: 50-100 days
• A year after publication, about 11% of content shared on social media will be gone.
SalahEldeen and Nelson, "Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?", TPDL 2012
http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
Maybe it's archived?
https://archive.org/web
Why archives matter
• Malaysia Airlines Flight 17 (MH17)
• Ukrainian separatists originally took credit for downing a transport plane in that location
• Later deleted the post
• Internet Archive had archived the post before deletion
http://www.csmonitor.com/World/Europe/2014/0717/Web-evidence-points-to-pro-Russia-rebels-in-downing-of-MH17-video
We can use archives to tell stories
similar to our Hurricane Katrina example: https://www.slideshare.net/phonedude/why-careaboutthepast
https://www.nytimes.com/2016/11/17/insider/in-13-headlines-the-drama-of-election-night.html
If something's gone from the live web, check a web archive
Web archives to the rescue!
https://twitter.com/brian3354/status/966081774194511874
Internet Archive's Wayback Machine has gone mainstream
"God bless you Internet Archive"
- Rachel Maddow, Dec 12, 2016
Last Week Tonight, Mar 18, 2018
Jill Lepore, "The Cobweb", The New Yorker, Jan 26, 2015
But Wayback is not Google
• Wayback Machine has no full-text search
  – too big to be indexed
  – 654 billion web pages, 9 petabytes of data
  – growing at 20 TB/week
• Enter URL and pick a date
"It's more like a phone book than like an archive."
- Jill Lepore, The New Yorker
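The "enter URL and pick a date" lookup corresponds to the address format seen in the web.archive.org link cited on a later slide: a 14-digit YYYYMMDDhhmmss timestamp followed by the original URL. The following is a minimal Python sketch of building such an address. It assumes the timestamp is expressed in UTC, and the function name `wayback_url` is made up for illustration. The Wayback Machine generally serves the capture nearest to the requested time, so the timestamp does not need to match a capture exactly.

```python
from datetime import datetime, timezone

# Prefix taken from the archived link shown in the slides.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(url: str, when: datetime) -> str:
    """Build a Wayback Machine address for `url` at (or near) `when`.

    `when` should be timezone-aware; it is converted to UTC (an assumption
    about how the 14-digit timestamp is interpreted).
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}{stamp}/{url}"


if __name__ == "__main__":
    when = datetime(2018, 1, 15, 10, 39, 52, tzinfo=timezone.utc)
    print(wayback_url("https://auctionessistance.com/", when))
```

This only constructs the address; opening it in a browser is the "pick a date" step.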
What do people think the Wayback Machine is?
https://www.politico.com/story/2018/04/25/joy-reid-anti-gay-posts-550213
What do people think the Wayback Machine is?
https://www.cnn.com/2018/02/16/politics/richard-pinedo-guilty-plea/index.html
https://www.politico.com/story/2018/04/25/joy-reid-anti-gay-posts-550213
https://web.archive.org/web/20180115103952/https:/auctionessistance.com/
Caches are not archives
http://ws-dl.blogspot.com/2018/01/2018-01-02-link-to-web-archives-not.html
http://www.wired.co.uk/article/russia-propaganda-online-blog-longform-medium-posts
https://webcache.googleusercontent.com/search?q=cache:qwqnGPqC2vsJ:https://medium.com/%40TheFoundingSon/huffington-post-vs-whiteness-and-white-women-1e67193085d4+&cd=15&hl=en&ct=clnk&gl=uk
Is it really that important to archive instead of just taking a screenshot?
https://twitter.com/AngryBlackLady/status/990032514080108544
https://twitter.com/phonedude_mln/status/990070331737100288
We should be doing both
https://twitter.com/conspirator0/status/1000475042017366017
“If you see something, save something”
https://blog.archive.org/2017/01/25/see-something-save-something/
There's more than just the Internet Archive
http://timetravel.mementoweb.org/list/20020908180610/http://blog.reidreport.com/
TimeTravel
http://timetravel.mementoweb.org
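The Time Travel link on the previous slide follows a visible pattern: `/list/`, a 14-digit timestamp, then the original URL. As a rough sketch in Python, such links can be built and taken apart as shown here. The helper names are hypothetical, and treating the timestamp as UTC is an assumption rather than something stated in the slides.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the Time Travel link shown in the slides.
TIMETRAVEL_LIST_PREFIX = "http://timetravel.mementoweb.org/list/"
_LIST_URL = re.compile(r"^https?://timetravel\.mementoweb\.org/list/(\d{14})/(.+)$")


def timetravel_list_url(url: str, when: datetime) -> str:
    """Build a Time Travel 'list' link for `url` around the datetime `when`."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{TIMETRAVEL_LIST_PREFIX}{stamp}/{url}"


def parse_timetravel_list_url(link: str):
    """Split a Time Travel 'list' link into (datetime, original URL)."""
    match = _LIST_URL.match(link)
    if match is None:
        raise ValueError(f"not a Time Travel list link: {link!r}")
    when = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return when.replace(tzinfo=timezone.utc), match.group(2)


if __name__ == "__main__":
    link = "http://timetravel.mementoweb.org/list/20020908180610/http://blog.reidreport.com/"
    print(parse_timetravel_list_url(link))
```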
Pro tip: submit pages to multiple archives
https://twitter.com/phonedude_mln/status/998948823845261312
We've built tools to help people submit webpages to multiple archives
• Mink – Google Chrome extension
• #icanhazmemento – Twitter bot
• ArchiveNow – Python module, Docker container, local web service
Mink
“Archive What I See Now: Bringing Institutional Web Archiving Tools to the Individual Researcher”, 2014-2017, HK-50181-14
Mat Kelly, Michael L. Nelson and Michele C. Weigle, "Mink: Integrating the Live and Archived Web Viewing Experience Using Web Browsers and Memento," JCDL 2014, poster.
http://ws-dl.blogspot.com/2014/10/2014-10-03-integrating-live-and.html
Google Chrome extension
Submit currently viewed webpage to public archives
https://github.com/machawk1/Mink
#icanhazmemento
Twitter bot
Include #icanhazmemento in a tweet with a URL
Bot replies with a link to the memento of the page closest to the time of the tweet
If page not archived, bot submits URL to multiple public archives, replies with a link to the memento in Time Travel
Alexander Nwala, "2015-07-22: I Can Haz Memento," http://ws-dl.blogspot.com/2015/07/2015-07-22-i-can-haz-memento.html
https://github.com/anwala/icanhazmemento
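Choosing "the memento closest to the time of the tweet" amounts to a nearest-datetime search over the known captures. The sketch below is illustrative only and is not the bot's actual code. The function name and the (datetime, URI) pair layout are assumptions, and the sample URIs are placeholders.

```python
from datetime import datetime, timezone


def closest_memento(mementos, target):
    """Return the (datetime, uri) pair nearest in time to `target`.

    `mementos` is an iterable of (timezone-aware datetime, uri) pairs.
    Returns None when there are no mementos, which is the case where the
    bot described above would submit the page to archives instead.
    """
    mementos = list(mementos)
    if not mementos:
        return None
    return min(mementos, key=lambda pair: abs(pair[0] - target))


if __name__ == "__main__":
    utc = timezone.utc
    captures = [  # hypothetical capture times and placeholder URIs
        (datetime(2014, 12, 1, tzinfo=utc), "memento-2014"),
        (datetime(2015, 7, 1, tzinfo=utc), "memento-early-july"),
        (datetime(2015, 7, 30, tzinfo=utc), "memento-late-july"),
    ]
    tweet_time = datetime(2015, 7, 22, tzinfo=utc)
    print(closest_memento(captures, tweet_time))
```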
ArchiveNow
Mohamed Aturban, Mat Kelly, Sawood Alam, John Berlin, Michael L. Nelson and Michele C. Weigle, "ArchiveNow: Simplified, Extensible, Multi-Archive Preservation," JCDL 2018, poster.
http://ws-dl.blogspot.com/2017/02/2017-02-22-archive-now-archivenow.html
Python module, Docker container
Submit URI to multiple archives
“Towards a Web-Centric Approach for Capturing the Scholarly Record”, 2016-2019
https://github.com/oduwsdl/archivenow
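Used as a Python module, submitting one URI to several archives might look roughly like the sketch below. It assumes the `archivenow` package exposes `archivenow.push(url, archive_id)` returning a list of results, and that `"ia"` (Internet Archive) and `"is"` (archive.is) are valid archive identifiers. Both are recalled from the project README and should be checked against the repository. The wrapper `push_to_archives` is a hypothetical helper, not part of ArchiveNow.

```python
def push_to_archives(url, archive_ids=("ia", "is"), push=None):
    """Submit `url` to each archive in `archive_ids` and collect the results.

    `push` is any callable taking (url, archive_id). When omitted, the
    third-party ArchiveNow module is imported and its push function is used
    (assumed API; install with `pip install archivenow`).
    """
    if push is None:
        from archivenow import archivenow  # imported lazily: needs network access to be useful
        push = archivenow.push
    results = {}
    for archive_id in archive_ids:
        # Each call is expected to return a list, e.g. the new memento URI
        # or an error message from that archive.
        results[archive_id] = push(url, archive_id)
    return results
```

Calling `push_to_archives("http://ws-dl.cs.odu.edu/")` would contact the live archives, so the `push` parameter exists mainly to allow a dry run with a stand-in function.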
Memento: Time Travel for the Web
Access mementos in multiple web archives
Memento's core components:
• A bridge between present and past: link and content negotiation
• A bridge between past and present: link
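The "content negotiation" bridge from present to past is datetime negotiation as defined in RFC 7089: the client sends an `Accept-Datetime` request header to a TimeGate, which responds with the memento closest to that datetime. The sketch below only prepares such a request in Python and does not send it. The TimeGate prefix is an assumption about the Time Travel aggregator's address, and the function names are invented for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumed address of the Memento aggregator's TimeGate; verify before use.
TIMEGATE_PREFIX = "http://timetravel.mementoweb.org/timegate/"


def timegate_headers(when: datetime) -> dict:
    """Return the request headers for datetime negotiation at `when`.

    Accept-Datetime uses the same HTTP date format as the Date header,
    always expressed in GMT.
    """
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": http_date}


def timegate_url(url: str) -> str:
    """Return the TimeGate address to request for the original `url`."""
    return TIMEGATE_PREFIX + url


if __name__ == "__main__":
    when = datetime(2007, 5, 31, 20, 35, 0, tzinfo=timezone.utc)
    print(timegate_url("http://odu.edu/"))
    print(timegate_headers(when))
```

The reverse bridge, from past to present, needs no negotiation: a memento's response carries a `Link` header pointing back to the original resource.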
Memento Aggregator
How can I use Memento?
Memento for Chrome
http://ws-dl.blogspot.com/2013/10/2013-10-14-right-click-to-past-memento.html
Mink
http://ws-dl.blogspot.com/2014/10/2014-10-03-integrating-live-and.html
http://timetravel.mementoweb.org
Use Mink to view the odu.edu of the past
Click the Mink icon
Then choose your datetime
Archived odu.edu
Fixing 404 Pages: Google Results Page
Fixing 404 Pages: Result Page
http://www.clashmusic.com/news/johnny-marr-leaves-the-cribs
Fixing 404 Pages: Scrolling Down
Fixing 404 Pages: Server Up, Page 404
Fixing 404 Pages: Using Mink
Fixing 404 Pages: Archived Page 2011-04-16
#whatdiditlooklike
Twitter bot
Include #whatdiditlooklike in a tweet with a URL
Bot generates animated GIF of first memento of each year
Bot replies with a link to entry in Tumblr
Tumblr: http://whatdiditlooklike.mementoweb.org/
Source: https://github.com/anwala/wdill
Alexander Nwala, "2015-02-05: What Did It Look Like?," http://ws-dl.blogspot.com/2015/01/2015-02-05-what-did-it-look-like.html
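Picking the "first memento of each year" is a small grouping step over a list of captures. The following Python sketch shows one way to do it. It is not taken from the bot's source, and the function name, input layout, and sample values are assumptions made for illustration.

```python
from datetime import datetime, timezone


def first_memento_per_year(mementos):
    """Return the earliest (datetime, uri) pair for each calendar year.

    `mementos` is an iterable of (timezone-aware datetime, uri) pairs in any
    order. The result is sorted by year.
    """
    first = {}
    for when, uri in sorted(mementos):
        # Pairs arrive in chronological order, so the first one seen for a
        # given year is that year's earliest capture.
        first.setdefault(when.year, (when, uri))
    return [first[year] for year in sorted(first)]


if __name__ == "__main__":
    utc = timezone.utc
    captures = [  # hypothetical capture times and placeholder URIs
        (datetime(2012, 6, 1, tzinfo=utc), "memento-2012-june"),
        (datetime(2011, 3, 5, tzinfo=utc), "memento-2011-march"),
        (datetime(2012, 1, 2, tzinfo=utc), "memento-2012-january"),
    ]
    for when, uri in first_memento_per_year(captures):
        print(when.year, uri)
```

Each selected memento would then supply one frame of the animated GIF.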
Use web archives to save the current web and view the past web
• Web Science and Digital Libraries (WS-DL) group at ODU
  – ws-dl.blogspot.com, @WebSciDL (Twitter)
• Websites/Tools for web archiving
  – Internet Archive's Wayback Machine - archive.org/web
  – On-demand archiving - archive.is
  – Memento Time Travel - timetravel.mementoweb.org
  – Mink - matkelly.com/mink/
  – #icanhazmemento
  – #whatdiditlooklike
