BROWSING AND
RECOMPOSITION POLICIES
TO MINIMIZE TEMPORAL
ERROR WHEN UTILIZING
WEB ARCHIVES
SCOTT G. AINSWORTH
OLD DOMINION UNIVERSITY
COMPUTER SCIENCE
JCDL 2013
JULY 23-25, 2013
INDIANAPOLIS, INDIANA USA
Joint Conference on Digital Libraries (JCDL) 2013
CONTENTS
• Motivation
• Related work
• Preliminary work
• Future work
• Conclusion
7/23/13 Scott G. Ainsworth • Michael L. Nelson
A FABLE FROM WAYBACK
A long, long time ago…
ODU Computer Science
updated its web site…
What did it look like?
May 2005...
WHAT JUST HAPPENED?
WHAT WE EXPECTED
2005-05-14 @ 01:36:08
WHAT WE GOT
2005-03-31 @ 09:16:10
TEMPORAL SPREAD
Root: 2005-05-14 01:36:08
Embedded resources: +9 days, +18 days, +18 days, +7 months, +2.1 years
QUESTIONS
• How much temporal drift do users experience?
• How much temporal spread exists in composite mementos?
• How can drift and spread be minimized?
• What factors contribute, positively or negatively, to drift and spread?
• Does combining multiple archives produce better results?
• Would users with differing goals benefit from different minimization policies and heuristics?
• How can temporal coherence be displayed to users simply?
RELATED WORK
Web Crawling for Search Engines
• Douglis – Change rates
• Cho – Optimal crawling strategies, change rates, Web evolution
Web Archiving
• Masanés – Web Archiving: Issues and Methods
• Jaffe & Kirkpatrick – Internet Archive architecture
• Moore et al. – Heritrix crawler
RELATED WORK
Control Crawl Data Quality, Future collections
• Spaniol et al. – crawling strategy
• Denev et al. – change rates by MIME type and depth
• Ben Saad et al. – metadata from crawl used to select best results from archive
Our Focus: Existing Data Quality
• Existing collections
• Datetime selection policies
RELATED WORK
Use Patterns
• AlNoamany et al. – Archive Access Patterns
  • Humans vs. Robots
  • Dip, dive, slide, & skim
Identifying Duplicates
• Simple identity – images, other binary formats
  • Direct comparison
  • Hash comparison
• HTML, CSS (text)
  • Shingling, Jaccard distances, etc.
  • SimHash ← most promising
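The two families of duplicate checks named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice of SHA-256, and the three-word shingle size are assumptions made for the example. Jaccard distance is one minus the similarity computed here.

```python
import hashlib


def identical(a: bytes, b: bytes) -> bool:
    """Exact identity for binary resources via a digest comparison.

    SHA-256 is an assumption; the slides only say "hash comparison".
    """
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.split()
    if len(words) < k:
        # Short texts yield a single shingle (or none if the text is empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

For text formats such as HTML or CSS, a similarity near 1.0 would suggest that two mementos are equivalent even when the archive has rewritten parts of the markup; any threshold for "equivalent" would have to be chosen empirically.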
RELATED WORK – MEMENTO*
• HTTP extension for datetime negotiation
Request
GET <timegate>/http://www.cs.odu.edu/ HTTP/1.1
…
Accept-Datetime: Tue, 10 May 2005 11:21:00 GMT
…
Response
HTTP/1.1 200 OK
…
Memento-Datetime: Sat, 14 May 2005 01:36:08 GMT
…
*https://datatracker.ietf.org/doc/draft-vandesompel-memento/
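A minimal sketch of how a client might build the request above and read the response header, using only the Python standard library. The TimeGate base URL and both helper names are hypothetical; only the `Accept-Datetime` and `Memento-Datetime` header names and the `<timegate>/<URI-R>` pattern come from the slide. The sketch builds the request without sending it.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional
from urllib.request import Request

# Hypothetical TimeGate base URL; substitute a real one.
TIMEGATE = "http://timegate.example.org/"


def timegate_request(timegate: str, uri_r: str, target: datetime) -> Request:
    """Build a GET request asking the TimeGate for uri_r near target.

    target must be timezone-aware; it is converted to UTC and rendered
    as an HTTP-date, the format Accept-Datetime expects.
    """
    http_date = format_datetime(target.astimezone(timezone.utc), usegmt=True)
    return Request(timegate + uri_r, headers={"Accept-Datetime": http_date})


def memento_datetime(headers) -> Optional[datetime]:
    """Parse Memento-Datetime from a mapping of response headers, if present."""
    value = headers.get("Memento-Datetime")
    return parsedate_to_datetime(value) if value else None


req = timegate_request(
    TIMEGATE,
    "http://www.cs.odu.edu/",
    datetime(2005, 5, 10, 11, 21, 0, tzinfo=timezone.utc),
)
```

The difference between the requested datetime and the returned `Memento-Datetime` is the quantity the rest of the talk measures.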
CONTENTS
• Motivation
• Related work
• Preliminary work
  • How much of the Web is archived
  • Temporal Drift
  • Temporal Spread
• Future work
• Conclusion
HOW MUCH IS ARCHIVED?
35 – 90%    At least one archived copy
17 – 49%    2 – 5 copies
1 – 8%      6 – 10 copies
8 – 63%     > 10 copies
(JCDL’11)
[Chart legend: Internet Archive, Search Engine, Other]
TEMPORAL DRIFT
Comparing two policies
• Sliding – target datetime changes
• Sticky – target datetime held steady
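The two policies can be sketched as a short simulation. This is an illustration under stated assumptions, not the authors' code: `nearest` and `walk` are hypothetical names, each page is reduced to a list of capture datetimes, and the capture lists are invented for the example, reusing the datetimes that appear on the slides.

```python
from datetime import datetime


def nearest(captures: list, target: datetime) -> datetime:
    """Pick the capture whose datetime is closest to the target."""
    return min(captures, key=lambda c: abs(c - target))


def walk(pages: list, target: datetime, sticky: bool) -> list:
    """Return the capture selected at each step of a walk through pages.

    sticky=True  keeps the original target for every step.
    sticky=False (sliding) replaces the target with the datetime of the
    capture just selected, so error can accumulate from step to step.
    """
    selected = []
    current = target
    for captures in pages:
        chosen = nearest(captures, current)
        selected.append(chosen)
        if not sticky:
            current = chosen
    return selected


target = datetime(2005, 5, 14, 1, 36, 8)
# Illustrative capture lists, one per page visited (assumed data).
pages = [
    [datetime(2005, 5, 14, 1, 36, 8)],
    [datetime(2005, 4, 22, 0, 17, 52)],
    [datetime(2005, 3, 31, 9, 16, 10), datetime(2005, 5, 14, 1, 36, 8)],
]
sliding = walk(pages, target, sticky=False)
sticky = walk(pages, target, sticky=True)
# Drift at each step is the distance from the original target.
sliding_drift = [abs(c - target) for c in sliding]
sticky_drift = [abs(c - target) for c in sticky]
```

With this data the sliding walk ends on the 2005-03-31 capture, while the sticky walk returns to the 2005-05-14 capture on the third step.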
SLIDING TARGET
2005-05-14 01:36:08
SLIDING TARGET
2005-04-22 00:17:52
SLIDING TARGET
2005-03-31 09:16:10
STICKY TARGET
What if the target is held steady?
(Enabled by Memento API)
STICKY TARGET
MementoFox Extension
2005-05-14 01:36:08
STICKY TARGET
2005-04-22 00:17:52
STICKY TARGET
2005-05-14 01:36:08
MEDIAN DRIFT BY STEP
[Figure: Median Drift by Step (JCDL’13). Panels: API, UI. x-axis: Step Number, 1 to 50. y-axis: Median Drift (months), 0 to 3 months. Series: Sliding, Sticky.]
TEMPORAL SPREAD
COMPOSITE MEMENTO
PRESENTATION STRUCTURE
[Diagram: URI-M0 at the top; below it URI-M1, URI-M2, ... URI-Mi-1; below those URI-Mi, URI-Mi+1, ... URI-Mn]
EMBEDDED RESOURCES
Resource                          Memento-Datetime      Delta
http://www.cs.odu.edu             2005-05-14 01:36:08   (root)
mm_menu.js                        2005-05-23 02:39:12   9.0 d
style.css                         2005-05-23 02:39:39   9.0 d
gfx-logo-odu-crown.gif            2005-05-23 02:39:39   9.0 d
ddmenu_ddown.js                   2005-05-23 02:39:43   9.0 d
university.js                     2005-05-23 02:39:56   9.0 d
rmenu_1st_about.png               2005-06-01 13:40:25   18.5 d
rmenu_bottom_229.gif              2005-06-01 14:07:29   18.5 d
shadow-bl.gif                     2005-06-01 14:55:53   18.6 d
ecsbdg.jpg                        2005-06-01 14:56:17   18.6 d
shadow-br.gif                     2005-06-01 15:18:18   18.6 d
gfx-btn-go-dblue.gif              2005-06-01 15:34:19   18.6 d
shadow-tr.gif                     2005-06-01 15:55:57   18.6 d
header-right1.gif                 2005-06-01 16:06:16   18.6 d
spacer.gif                        2005-06-01 16:23:10   18.6 d
jimcheng.gif                      2005-06-01 16:37:39   18.6 d
jsmith.gif                        2005-06-01 16:58:50   18.6 d
rmenu_1st_featured_alumni.png     2005-06-01 21:21:45   18.8 d
hmenu_college_...-new.png         2005-12-21 20:14:25   7.3 mo
rmenu_1st_upcoming_news.png       2005-12-21 20:15:14   7.3 mo
rmenu_1st_upcoming_events.png     2005-12-21 21:01:12   7.3 mo
lmenu_1st_resources.png           2005-12-28 17:47:41   7.5 mo
bullet_blue_triangle.gif          2005-12-28 19:43:48   7.5 mo
logo-cs.gif                       2005-12-28 19:54:29   7.5 mo
rmenu_1st_featured_student.png    2007-06-12 02:36:07   2.1 years
shadow-b.gif                      2007-06-21 02:35:17   2.1 years
shadow-r.gif                      404 Not Found

Embedded Resources    26
Mean Delta            125.9 days
Standard Deviation    207.7 days
Spread                2.1 years
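The Delta and Spread figures can be reproduced from the Memento-Datetimes with a few lines of arithmetic. The sketch below uses four of the embedded resources from the table. It assumes delta is the absolute difference from the root memento and spread is the distance between the earliest and latest memento in the composite; the slides do not spell out either definition, and the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
SECONDS_PER_DAY = 86400

root = datetime.strptime("2005-05-14 01:36:08", FMT)
# A subset of the embedded resources listed in the table.
embedded = {
    "mm_menu.js": "2005-05-23 02:39:12",
    "spacer.gif": "2005-06-01 16:23:10",
    "logo-cs.gif": "2005-12-28 19:54:29",
    "shadow-b.gif": "2007-06-21 02:35:17",
}


def delta_days(memento_datetime: str) -> float:
    """Absolute distance, in days, between an embedded memento and the root."""
    captured = datetime.strptime(memento_datetime, FMT)
    return abs(captured - root).total_seconds() / SECONDS_PER_DAY


deltas = {name: delta_days(ts) for name, ts in embedded.items()}

# Spread (assumed definition): latest minus earliest memento, root included.
all_times = [root] + [datetime.strptime(ts, FMT) for ts in embedded.values()]
spread_days = (max(all_times) - min(all_times)).total_seconds() / SECONDS_PER_DAY
spread_years = spread_days / 365.25
```

Because the root is the earliest memento in this example, the spread equals the largest delta.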
REPRESENTING SPREAD
COMPOSITE MEMENTO
TEMPORAL SPREAD CHART
[Diagram: URI-M0 with URI-M1, URI-M2, URI-M3 below it. Legend: Root, Embedded, Same Domain, Reused]
TEMPORAL SPREAD – ODU CS
TEMPORAL COHERENCE
1 Memento, Bracketed Root
• Diagram labels: Last-Modified, root, emb1
• Last-Modified ≤ root ≤ emb1 ⇒ coherent
1 Memento, Root Not Bracketed
• Diagram labels: Last-Modified, root, emb1
• root ≤ Last-Modified ≤ emb1 ⇒ violation
1 Memento, No Last-Modified
1 Memento, Before Root
• Diagram labels: n/a, root, embn
• Last-Modified ≤ emb < root ⇒ possibly coherent
2 Mementos, Root Not Bracketed
• Diagram labels: Last-Modified, n/a, root, embi, embi+1
2 Mementos, Use Content – Similarity
2 Mementos, Contents Equal or Equivalent
2 Mementos, Contents Not Equal or Equivalent
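The single-memento rules above can be read as a small decision procedure. The sketch below is one interpretation, not the authors' algorithm: the function name is hypothetical, and the "unknown" outcome for a missing Last-Modified header is an assumption, since the slides give no verdict for that case.

```python
from datetime import datetime
from typing import Optional


def classify(root: datetime, emb: datetime,
             last_modified: Optional[datetime]) -> str:
    """Classify one embedded memento against the root memento.

    root          -- Memento-Datetime of the root page
    emb           -- Memento-Datetime of the embedded resource
    last_modified -- Last-Modified of the embedded resource, if the archive kept it
    """
    if emb < root:
        # Captured before the root: Last-Modified <= emb < root.
        # The resource may have changed between emb and root.
        return "possibly coherent"
    if last_modified is None:
        # Assumption: without Last-Modified nothing can be proven.
        return "unknown"
    if last_modified <= root:
        # Last-Modified <= root <= emb: unchanged across the root capture.
        return "coherent"
    # root <= Last-Modified <= emb: modified after the root was captured.
    return "violation"
```

The two-memento cases would need a content comparison between the bracketing mementos in addition to these timestamp checks.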
FIRST EXPERIMENT
• 1,000 URIs from DMOZ (Open Directory)
• Download all timemaps
• Download all composite mementos
• Download all embedded resources
• Single and Multiple Archives
• Four Heuristics
PRELIMINARY RESULTS 1
Count        Description                          Percent
1,000        Root URI-Rs
910          Root timemaps                        91%
87,847       Root URI-Ms in timemaps
96.5         URI-Ms per Root URI-R
85,570       Root mementos downloaded             97%
1,488,420    Embedded URI-Rs
17.4         Embedded URI-Rs per Root memento
PRELIMINARY RESULTS 2
Description                    Minimize Distance,   Minimize Distance,   3-Month Window,
                               Single Archive       Multi-Archive        Multi-Archive
Embedded URI-Rs                1,488,440            1,488,420            1,447,351
Embedded URI-Ms in timemaps    1,169,787            1,186,456            500,541
URI-M/Embedded URI-R           0.79                 0.80                 0.35
% Complete                     73.8%                75.4%                33.8%
Mean spread                    200.2                200.1                15.1
Standard Deviation             219.2                219.9                14.3
CURRENT EXPERIMENT
• 4,000 URIs from JCDL’11 “How Much…” paper
• 1 URI/month instead of all
• Target WSDM 2013
FUTURE WORK
Browsing Patterns, Clusters & Drift
• AlNoamany et al. – Real-world access patterns
• Domains users avoid – link farms, etc.
• Domain clusters
FUTURE WORK
Timemaps, Redirection, Missing Mementos
• Timemaps only tell part of the story
• URI-R redirection (302 from source)
• URI-M redirection (Archive action)
• Mementos in timemaps but not accessible
• Policies must consider user needs
• Leave it missing
• Show “best” substitute
FUTURE WORK
Similarity & Duplication
• Deltas are currently | root – embedded |
• If bracketing mementos are identical, should delta be zero?
• HTML is usually modified by the archive
• Can’t check for equality
• Shingling? SimHash?
[Diagram: axis marked –30d, 0, +30d]
FUTURE WORK
Communicating Status
[Figure: status indicators labeled Coherent, Partial, Incoherent, Missing]
FUTURE WORK
Policies & Heuristics
• Drift
• Sliding target
• Sticky target
• Spread
• Minimize distance
• Past only
• Past preferred
• Near or within distance
• Single vs. multi-archive
• Refine to meet user expectations
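The spread policies listed above can be sketched as interchangeable selection functions over the capture datetimes of one embedded resource. The slides only name the policies, so the semantics below are an interpretation, and all function names are hypothetical. Each function returns `None` when the policy would leave the resource missing.

```python
from datetime import datetime, timedelta
from typing import Optional


def minimize_distance(captures: list, target: datetime) -> Optional[datetime]:
    """Closest capture to the target, before or after it."""
    return min(captures, key=lambda c: abs(c - target)) if captures else None


def past_only(captures: list, target: datetime) -> Optional[datetime]:
    """Latest capture at or before the target; never look into the future."""
    past = [c for c in captures if c <= target]
    return max(past) if past else None


def past_preferred(captures: list, target: datetime) -> Optional[datetime]:
    """Use a past capture if one exists, otherwise fall back to the closest."""
    chosen = past_only(captures, target)
    return chosen if chosen is not None else minimize_distance(captures, target)


def within_distance(captures: list, target: datetime,
                    window: timedelta) -> Optional[datetime]:
    """Closest capture, but only if it lies within the window of the target."""
    chosen = minimize_distance(captures, target)
    if chosen is not None and abs(chosen - target) <= window:
        return chosen
    return None
```

A window such as `timedelta(days=90)` would approximate a three-month limit: it trades completeness for a tighter spread, since resources without a nearby capture are left missing.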
CONCLUSION
Extensive research on improving acquisition exists
Best use of existing collections needs study
We are looking at
• Characterizing existing holdings
• Policies that minimize impact of drift and spread
• Characterizing memento and walk status
TIMELINE
[Timeline chart; labels as extracted:]
• Spread, Policy, Drift
• CIKM: May '13
• Paper: Nov '13
• Paper: Feb '14
• Missing mementos, duplicates, similarity, icon
• Ph.D. dissertation: Candidacy Aug '13, Defense Nov '14
• Paper: Spring '14
• "Human" patterns, clusters, spam sites

Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing Web Archives

Editor's Notes

  1. Please forgive the long title. Let me explain it with a fable…
  2. The rest of this presentation will take the following form: a brief discussion of related work and how this research improves our knowledge; a description of how we measured drift; a review of the results; and a quick look at how this work can be refined.
  3. A student at ODU becomes curious about the history of the Computer Science Department and visits the Internet Archive’s Wayback Machine.
  4. The student enters http://www.cs.odu.edu and is shown the available dates. The student navigates to 2005 and selects 14 May @ 01:36:08.
  5. The student reviews the Computer Science page. Finding the College of Sciences link interesting, the student clicks on it.
  6. After reviewing the College of Sciences page, the student returns to the Computer Science page, and…
  7. Whoa! That’s not what was expected!
  8. What just happened? We expected the left side but got the right side. This is a result of applying the Sliding Target Policy. Highlight the temporal drift.
  9. Let us return to temporal spread. Even though the display is May 14, 2005, (CLICK) the resources were captured at very different times: (CLICK) some days, (CLICK) some months, (CLICK) even years (in this case an image in the footer).
  10. This leads to questions:
  12. The majority of work to date has focused on improving the quality of data acquisition. Spaniol et al. focused on strategy. Denev et al. looked at change rate by MIME type. Ben Saad et al. used crawl metadata to improve presentation to the user. Our focus is getting the best results from existing collections. After all, we can’t go back and “fix” past data acquisition.
  15. Memento is an HTTP extension for datetime negotiation. It is now implemented by the Internet Archive, Archive.is, UK National Archive, and UK Web Archive. This is a very abbreviated introduction to the Memento API. The Memento API allows an HTTP client to negotiate a datetime. On request, the client adds the Accept-Datetime header. On reply, the server sends the Memento-Datetime header, indicating the actual datetime of the memento returned. Memento-Datetime is generally the acquisition datetime of the archived copy.
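As a rough illustration of the header exchange described in note 15, the sketch below builds an Accept-Datetime request header and parses a Memento-Datetime response header. The function names are assumptions made for this example and do not come from any Memento library. The two datetimes are the "expected" and "got" values from the fable, and no network request is made.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime_header(target: datetime) -> dict:
    """Build the request header asking for a memento near `target` (must be UTC)."""
    # HTTP date format, e.g. "Sat, 14 May 2005 01:36:08 GMT".
    return {"Accept-Datetime": format_datetime(target, usegmt=True)}


def memento_datetime(response_headers: dict) -> datetime:
    """Parse the datetime of the memento the archive actually returned."""
    return parsedate_to_datetime(response_headers["Memento-Datetime"])


# What was requested versus what was shown in the fable.
requested = datetime(2005, 5, 14, 1, 36, 8, tzinfo=timezone.utc)
request_headers = accept_datetime_header(requested)
received = memento_datetime({"Memento-Datetime": "Thu, 31 Mar 2005 09:16:10 GMT"})
drift = abs(requested - received)  # distance between target and memento
```

Comparing the requested and returned datetimes in this way is one simple way to quantify drift for a single step.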
  17. At JCDL 2011, we published “How Much of the Web Is Archived?” This density chart gives a sense of Web archival patterns. Each row represents a single URI, so row 200 represents the 200th URI. The rows are ordered such that the URI with the earliest memento is on the bottom. The empty rows at the top are URIs that are not archived. Each dot represents a single memento. Most mementos, the brown dots, come from the Internet Archive. The blue dots are search engine caches. Note that since this study was completed, the search engine caches have all locked down; effectively, they are no longer viable sources. The red dots represent other archives (WebCite, etc.).
  19. We have investigated the temporal drift which occurs while browsing archives. (CLICK) Let us pick up from the introduction.
  20. This is an example of the “Sliding Target Policy.” Here is how it works: We started on the May 14 page we selected. When the College of Sciences was clicked, May 14 was used as the target.
  21. April 22 was the nearest memento (archived version). When Computer Science was clicked, April 22 was used as the target.
  22. And March 31 was the nearest memento.
  23. “What if the target datetime is held steady instead of being allowed to drift?” The Memento extension to HTTP enables this.
  24. The sticky target policy can be accomplished using the MementoFox extension to Firefox. MementoFox allows the desired datetime to be entered and to remain fixed. (CLICK) The nearest memento is retrieved. (CLICK) In this case, it is the May 14 Computer Science page, the same one we selected using the Wayback Machine UI. When the College of Sciences is clicked… (CLICK)
  25. The April 22 page is shown again, because the target datetime is still 2005-05-14. So it is still the nearest. (CLICK) When Computer Science is clicked again…
  26. May 15 is shown as expected. (PAUSE)
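The difference between the two policies in notes 20 through 26 can be sketched as a small simulation. This is a minimal sketch under stated assumptions: the timemaps are hypothetical, the April 22 time of day is invented for illustration, and the names `TIMEMAPS`, `nearest`, and `walk` are not from the original work. The Computer Science capture datetimes are the ones shown on the "what we expected / what we got" slide.

```python
from datetime import datetime

# Hypothetical timemaps: page -> capture datetimes.
# The April 22 time of day is an assumption made for this example.
TIMEMAPS = {
    "cs": [datetime(2005, 3, 31, 9, 16, 10), datetime(2005, 5, 14, 1, 36, 8)],
    "sci": [datetime(2005, 4, 22, 3, 0, 0)],
}


def nearest(page, target):
    """Return the capture of `page` closest in time to `target`."""
    return min(TIMEMAPS[page], key=lambda capture: abs(capture - target))


def walk(start, clicks, policy):
    """Follow `clicks` and return the capture shown at each step.

    policy="sliding": the target becomes the datetime of the last memento shown.
    policy="sticky":  the target stays at `start` for the whole walk.
    """
    target = start
    shown = []
    for page in clicks:
        memento = nearest(page, target)
        shown.append(memento)
        if policy == "sliding":
            target = memento
    return shown


start = datetime(2005, 5, 14, 1, 36, 8)
clicks = ["cs", "sci", "cs"]
sliding = walk(start, clicks, "sliding")  # last step lands on the March 31 capture
sticky = walk(start, clicks, "sticky")    # last step returns to the May 14 capture
```

With these assumed captures, the sliding walk ends about six weeks away from the original target, while the sticky walk ends where it started.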
  27. The data is variable enough that the median is the best measure of central tendency. The main point of this graph is that the sticky policy reins in drift, while the sliding policy allows it to continue to increase. Notes: The initial upward curve is due to choosing a known Memento-Datetime. We suspect the drop starting at steps 42+ is due to large, self-referencing sites (101celebrities.com) and clusters of related sites.
  29. Let us return to temporal spread. Most web pages are composed from multiple resources, some of which are circled here. (WAIT FOR ANIMATION)
  30. We call the collection of all mementos required to display a web page a composite memento. A composite memento consists of a root and embedded mementos and can be represented as a tree. (It is actually a graph, but can be represented as a tree without loss of generality.) (CLICK) The root is represented as URI-M0 at the top of the tree on the right. Embedded mementos, such as images, are also represented in the tree. Embedded mementos can themselves have embedded mementos, for example HTML in a frame. (The ODU CS home page had frames in its 1990s versions, but no longer does.)
  31. Let us return to temporal spread. Even though the display is May 14, 2005, (CLICK) the resources were captured at very different times: (CLICK) some days, (CLICK) some months, (CLICK) even years (in this case an image in the footer).
  32. This is a list of all the mementos that comprise http://www.cs.odu.edu. It is a bit of an eye chart, so here is a summary. (CLICK) There are 26 embedded mementos (27 total including the root). The mean delta (distance from root) is 125.9 days. The standard deviation is 207.7, which does not bode well for the mean. Here’s the kicker: the spread is 2.1 years!
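The summary figures in note 32 could be computed along the following lines. This is only a sketch: it assumes that spread means the distance between the earliest and latest memento in the composite (root included) and that the standard deviation is the population form, neither of which the notes state. The function name and the sample offsets are hypothetical and are not the cs.odu.edu data.

```python
from statistics import mean, pstdev


def spread_summary(signed_offsets_days):
    """Summarize a composite memento.

    `signed_offsets_days` holds one value per embedded memento: its capture
    time minus the root's capture time, in days (negative = before the root).
    """
    deltas = [abs(offset) for offset in signed_offsets_days]
    all_offsets = [0] + list(signed_offsets_days)  # the root sits at offset 0
    return {
        "mean_delta": mean(deltas),
        "stdev_delta": pstdev(deltas),  # population form is an assumption
        "spread": max(all_offsets) - min(all_offsets),
    }


# Hypothetical offsets, for illustration only.
summary = spread_summary([-10, 0, 5, 30])
```

Note that spread can exceed the largest delta when embedded mementos fall on both sides of the root, as in the sample above.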
  33. Assume we have a composite resource with two embedded images. The graph on the right represents two composite mementos for this resource. The red diamonds are the root mementos, captured at different datetimes. Roots are centered at 0 delta; embedded mementos are offset by their delta. The blue and orange diamonds represent the embedded mementos. Orange mementos are from the same domain as the root. Blue mementos are from a different domain. Gray diamonds represent reused mementos.
  34. Now let’s have a look at the full chart (as of mid-2012) for cs.odu.edu. (CLICK) Here is the 2005-05-14 page we have been looking at. (CLICK) Here is a page from 2011, (CLICK) and one from 2011. Several things stand out: The maximum spread is nearly 7 years (2005 row). Many embedded resources were acquired well after the corresponding root memento. Reuse appears very high.
  35. Consider 2 mementos, 1 root and 1 embedded. (EXPLAIN why there is only one) In this case the embedded memento was captured after the root. (POINT OUT WHICH IS WHICH) Is this coherent? Hard to tell.
  36. But add the Last-Modified date and it becomes clearer. In this case, the embedded memento’s Last-Modified and Memento-Datetime “bracket” the root, providing evidence that the embedded memento existed when the root was captured.
  37. So we consider it coherent.
  38. But what happens when the root is not bracketed? In this case, there is evidence that the embedded memento did NOT exist when the root was captured.
  39. But what happens when the root is not bracketed? In this case, there is evidence that the embedded memento did NOT exist when the root was captured. We consider this a temporal coherence violation.
  40. Similarly, if Last-Modified is missing, the embedded memento cannot be shown to be temporally coherent. But should it be a violation? It could actually be coherent. We are still gathering data on this one.
  41. Similarly, if the embedded memento was captured before the root, was it still in existence when the root was captured? Probably, but more study is required.
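The patterns in notes 35 through 41 can be summarized as a small decision function. This is a hedged sketch: the function name and the label strings are assumptions made for this example, and the two open cases (missing Last-Modified, embedded memento captured before the root) are labeled as unresolved rather than decided, since the notes say more data is needed.

```python
from datetime import datetime
from typing import Optional


def classify(root_dt: datetime, embedded_dt: datetime,
             last_modified: Optional[datetime]) -> str:
    """Classify one embedded memento relative to its root memento."""
    if embedded_dt == root_dt:
        return "coherent"            # captured at the same moment as the root
    if embedded_dt < root_dt:
        return "probably coherent"   # captured before the root; more study required
    # From here on, the embedded memento was captured after the root.
    if last_modified is None:
        return "undetermined"        # nothing to bracket the root with
    if last_modified <= root_dt:
        return "coherent"            # Last-Modified and Memento-Datetime bracket the root
    return "violation"               # evidence it did not exist at root capture


root = datetime(2005, 5, 14, 1, 36, 8)
bracketed = classify(root, datetime(2005, 6, 1), datetime(2005, 1, 1))
not_bracketed = classify(root, datetime(2005, 6, 1), datetime(2005, 5, 20))
no_header = classify(root, datetime(2005, 6, 1), None)
earlier = classify(root, datetime(2005, 4, 1), None)
```

The dates other than the root's are made up purely to exercise each branch.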
  42. Recall the single memento, root not bracketed pattern.
  43. What happens if a second memento for the embedded resource is available? We can’t prove either existed when the root was captured. It opens another possibility…
  44. Comparing the mementos. Here we introduce similarity measures. For images, direct comparison is appropriate because archives leave these alone. For text, HTML in particular, archives annotate the content by adding comments with metadata. In this case we must use a similarity measure such as shingling or SimHash.
  45. What happens if a second memento for the embedded resource is available? It opens another possibility… Comparing the mementos.
  46. If the contents are equal, there is evidence that the embedded memento existed when the root was captured.
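One way to compare HTML mementos despite archive-added comments is word shingling with Jaccard similarity, sketched below. This is an illustrative sketch, not the measure used in the research: the shingle size, the regular expression for stripping comments, and the function names are all assumptions.

```python
import re


def shingles(html: str, k: int = 3) -> set:
    """Return the set of k-word shingles after removing HTML comments."""
    text = re.sub(r"<!--.*?-->", " ", html, flags=re.DOTALL)
    words = text.split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two documents' shingle sets (1.0 = identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


original = "<p>Welcome to ODU Computer Science</p>"
annotated = original + "<!-- comment added by the archive -->"
different = "<p>Welcome to the College of Sciences</p>"

same_score = jaccard(original, annotated)
diff_score = jaccard(original, different)
```

A threshold on the score would then stand in for equality; choosing that threshold is a separate question.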
  48. Real-world access patterns to bring results more in line with actual user experience. Do we see real humans go 50 steps? Why: Is there no need? Is the interface a problem? Does it get too weird? Try to avoid sites humans would avoid (very subjective: I avoid 101celebrities.com, but you might like it). We suspect both drift and spread are influenced not just by single large domains, but also by clusters of related domains, Amazon.com & amazon-images.com for instance. Sussing out related domains will help clarify results.
  49. Timemaps only tell part of the story. Memento-Datetimes in timemaps frequently redirect to a different datetime or URI. This is reflected in the drift research but not the spread research. This redirection will change the deltas. Besides, what does it mean when we are redirected to another datetime? (We suspect the archive has recognized a duplicate.) Another common occurrence is missing mementos: they are in the timemap but not available in the archive. Our research to date simply lists these as missing. But as policies and heuristics are developed, user priorities might require several responses (leave it missing, substitute the next nearest, etc.).
  50. Delta is the absolute value of the difference between the root and embedded Memento-Datetimes. But there are other conditions that could or should indicate a delta of 0 instead. These all revolve around determining that no change has occurred. One of these is bracketing mementos. Explain the chart… However, HTML is problematic because comments are added by the archives, so we cannot check for equality. What similarity measure or measures are reasonable substitutes for equality?
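The delta definition in note 50, together with the bracketing-mementos refinement, might look like the sketch below. The function name, the use of a content digest to stand in for "no change has occurred", and the sample captures are all assumptions made for illustration. For HTML, the equality test would need to be replaced by a similarity measure, as the note points out.

```python
from datetime import datetime, timedelta


def effective_delta(root_dt, captures):
    """Delta between a root and one embedded resource.

    `captures` is a non-empty list of (memento_datetime, content_digest)
    pairs for the embedded resource. If the captures on either side of the
    root have the same digest, the resource is assumed unchanged and the
    delta is 0; otherwise the delta is the distance to the nearest capture.
    """
    before = [c for c in captures if c[0] <= root_dt]
    after = [c for c in captures if c[0] >= root_dt]
    if before and after:
        previous = max(before, key=lambda c: c[0])
        following = min(after, key=lambda c: c[0])
        if previous[1] == following[1]:
            return timedelta(0)  # bracketing mementos with unchanged content
    return min(abs(dt - root_dt) for dt, _ in captures)


root = datetime(2005, 5, 14)
unchanged = effective_delta(root, [(datetime(2005, 5, 1), "a"), (datetime(2005, 6, 1), "a")])
changed = effective_delta(root, [(datetime(2005, 5, 1), "a"), (datetime(2005, 6, 1), "b")])
```

In the second call the content differs across the bracket, so the plain absolute difference to the nearest capture is used.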
  51. Succinctly communicating the status of a composite memento or walk to the user is important. (CLICK) This just isn’t very user friendly. (CLICK) We need a single mutable icon or symbol that can be easily explained and understood. We may need several, one for casual users and one for researchers. (CLICK) For multiple-archive composites, we also need to acknowledge their contribution.
  52. Finally, policies and heuristics must be developed. For example, in the drift work we used the sliding and sticky policies.