SlideShare uma empresa Scribd logo
1 de 27
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
Evaluating
Service Optimizations
Martin Klein
Lyudmila Balakireva
Harihar Shankar
Research Library
Los Alamos National Laboratory
https://arxiv.org/abs/1906.00058
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
Memento Time Travel
http://timetravel.mementoweb.org/
2
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
Memento Time Travel
http://timetravel.mementoweb.org/list/20160214085934/http://jcdl.org
3
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
Memento Time Travel
https://arquivo.pt/wayback/20160515103313/http://www.jcdl.org/
4
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
How does this work?
Memento Aggregator (simplistic view)
5
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
Memento Aggregator
6
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
Memento Aggregator
7
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
• As the number of archives grows, sending requests to each archive
for every incoming request is not feasible
• Response times
• Memento infrastructure load
• Load on distributed archives
LANL Memento Aggregator - Problem
8
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
• We could predict whether or not to issue a request to a specific
archive?
• By merely looking at the requested URI-R
• A binary classifier per archive
• We could train the classifiers using cached data?
• That would be pretty neat, indeed:
• Retrain classifiers as web archive collections evolve
• Not dependent on external data
• Querying classifiers probably way faster (msec) than polling
archives (sec)
What if…
9
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
10
We can! Published @ JCDL 2016
https://doi.org/10.1145/2910896.2910899
• ML models based on simple URI features
• Character count, n-grams, domain
• Common ML algorithms used per archive
• Logistic Regression, Multinomial Bayes, SVM
• Optimized for
• Prediction time, not training time
• Reduction of false positive rate
Results:
• Requests per URI-R: 3.96 vs 11
• Response time:
2.16s vs 3.08s
• Recall:
0.847
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
11
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
In Production…
12
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
13
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
14
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
15
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
16
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
17
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
18
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
• How effective is the cache?
• What is the hit/miss ratio? Does it vary for different Memento
services?
• Is the cache freshness period appropriate?
• How effective is the ML process?
• What is the recall and the false positive rate?
• Do we need to retrain the models? How often?
Questions to Ask
19
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
• Memento Aggregator currently covers
• 23 web archives
• 17 with native memento support
• 6 with by-proxy memento support
• Analysis of log files
• recorded between July 4th 2017 and October 17th 2018
• > 11m requests in total
• Approx. 2.6m requests against machine learning process
• Results in 2.6m lookups to populate cache
• Used as “truth” to assess ML prediction
Evaluation
20
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
Cache Hit/Miss Rate
21
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
Cache Hit/Miss Rate
22
humanshumans machines machinesMostly driven by
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
Recall
23
0.847
0.727
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
False Positives
24
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
Dynamic Web Archives
25
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
• Memento Aggregator cache is very effective
• ~ 60% of requests served from cache
• Human-driven services benefit the most
• Machine learning process saves!
• Requests & time while at acceptable recall level
• Recall: 0.727
• Re-training seems necessary, frequency TBD
Optimization
• ML model trained on archival holdings, not usage logs/cache
• Beneficial for new archives
• Neural network classifier, based on simple URI features, show
promising results
Takeaways
26
Evaluating Memento Service Optimizations
@mart1nkle1n
JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA
Evaluating
Service Optimizations
Martin Klein
Lyudmila Balakireva
Harihar Shankar
Research Library
Los Alamos National Laboratory
https://arxiv.org/abs/1906.00058

Mais conteúdo relacionado

Mais de Martin Klein

An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansMartin Klein
 
A Vision of the Library’s Role in Archiving Scholarly Artifacts
A Vision of the Library’s Role  in Archiving Scholarly ArtifactsA Vision of the Library’s Role  in Archiving Scholarly Artifacts
A Vision of the Library’s Role in Archiving Scholarly ArtifactsMartin Klein
 
First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...Martin Klein
 
Smart Routing of Memento Requests
Smart Routing of Memento RequestsSmart Routing of Memento Requests
Smart Routing of Memento RequestsMartin Klein
 
Building Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesBuilding Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesMartin Klein
 
A Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsA Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsMartin Klein
 
Focused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsFocused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsMartin Klein
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live WebMartin Klein
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web ResourcesMartin Klein
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for RepositoriesMartin Klein
 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDMartin Klein
 
Using the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationUsing the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationMartin Klein
 
Uniform Access to Raw Mementos
Uniform Access to Raw MementosUniform Access to Raw Mementos
Uniform Access to Raw MementosMartin Klein
 
Robust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communicationRobust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communicationMartin Klein
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...Martin Klein
 
To the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationTo the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationMartin Klein
 
web_archive_interoperability_memento
web_archive_interoperability_mementoweb_archive_interoperability_memento
web_archive_interoperability_mementoMartin Klein
 
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...Martin Klein
 
Comparing Published Scientific Journal Articles to Their Pre-print Versions
Comparing Published Scientific Journal Articles  to Their Pre-print VersionsComparing Published Scientific Journal Articles  to Their Pre-print Versions
Comparing Published Scientific Journal Articles to Their Pre-print VersionsMartin Klein
 
Preserving Born-Digital News Panel JCDL 2016
Preserving Born-Digital News Panel JCDL 2016Preserving Born-Digital News Panel JCDL 2016
Preserving Born-Digital News Panel JCDL 2016Martin Klein
 

Mais de Martin Klein (20)

An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly Orphans
 
A Vision of the Library’s Role in Archiving Scholarly Artifacts
A Vision of the Library’s Role  in Archiving Scholarly ArtifactsA Vision of the Library’s Role  in Archiving Scholarly Artifacts
A Vision of the Library’s Role in Archiving Scholarly Artifacts
 
First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...
 
Smart Routing of Memento Requests
Smart Routing of Memento RequestsSmart Routing of Memento Requests
Smart Routing of Memento Requests
 
Building Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesBuilding Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web Archives
 
A Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsA Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly Artifacts
 
Focused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsFocused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event Collections
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live Web
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web Resources
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for Repositories
 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCID
 
Using the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationUsing the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly Communication
 
Uniform Access to Raw Mementos
Uniform Access to Raw MementosUniform Access to Raw Mementos
Uniform Access to Raw Mementos
 
Robust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communicationRobust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communication
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
 
To the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationTo the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly Communication
 
web_archive_interoperability_memento
web_archive_interoperability_mementoweb_archive_interoperability_memento
web_archive_interoperability_memento
 
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
 
Comparing Published Scientific Journal Articles to Their Pre-print Versions
Comparing Published Scientific Journal Articles  to Their Pre-print VersionsComparing Published Scientific Journal Articles  to Their Pre-print Versions
Comparing Published Scientific Journal Articles to Their Pre-print Versions
 
Preserving Born-Digital News Panel JCDL 2016
Preserving Born-Digital News Panel JCDL 2016Preserving Born-Digital News Panel JCDL 2016
Preserving Born-Digital News Panel JCDL 2016
 

Último

Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Delhi Call girls
 
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.soniya singh
 
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...SUHANI PANDEY
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Servicegwenoracqe6
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.soniya singh
 
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts ServiceReal Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts ServiceEscorts Call Girls
 
VVIP Pune Call Girls Mohammadwadi WhatSapp Number 8005736733 With Elite Staff...
VVIP Pune Call Girls Mohammadwadi WhatSapp Number 8005736733 With Elite Staff...VVIP Pune Call Girls Mohammadwadi WhatSapp Number 8005736733 With Elite Staff...
VVIP Pune Call Girls Mohammadwadi WhatSapp Number 8005736733 With Elite Staff...SUHANI PANDEY
 
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445ruhi
 
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.soniya singh
 
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...SUHANI PANDEY
 
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls DubaiDubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubaikojalkojal131
 
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceBusty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceDelhi Call girls
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtrahman018755
 
Al Barsha Night Partner +0567686026 Call Girls Dubai
Al Barsha Night Partner +0567686026 Call Girls  DubaiAl Barsha Night Partner +0567686026 Call Girls  Dubai
Al Barsha Night Partner +0567686026 Call Girls DubaiEscorts Call Girls
 
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Call Girls in Nagpur High Profile
 

Último (20)

Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
 
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
 
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
 
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
 
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts ServiceReal Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
 
VVIP Pune Call Girls Mohammadwadi WhatSapp Number 8005736733 With Elite Staff...
VVIP Pune Call Girls Mohammadwadi WhatSapp Number 8005736733 With Elite Staff...VVIP Pune Call Girls Mohammadwadi WhatSapp Number 8005736733 With Elite Staff...
VVIP Pune Call Girls Mohammadwadi WhatSapp Number 8005736733 With Elite Staff...
 
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
 
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
 
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
 
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
 
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
 
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls DubaiDubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
 
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceBusty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
 
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
Al Barsha Night Partner +0567686026 Call Girls Dubai
Al Barsha Night Partner +0567686026 Call Girls  DubaiAl Barsha Night Partner +0567686026 Call Girls  Dubai
Al Barsha Night Partner +0567686026 Call Girls Dubai
 
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
 

Evaluating Memento Service Optimizations

  • 1. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA Evaluating Service Optimizations Martin Klein Lyudmila Balakireva Harihar Shankar Research Library Los Alamos National Laboratory https://arxiv.org/abs/1906.00058
  • 2. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA Memento Time Travel http://timetravel.mementoweb.org/ 2
  • 3. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA Memento Time Travel http://timetravel.mementoweb.org/list/20160214085934/http://jcdl.org 3
  • 4. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA Memento Time Travel https://arquivo.pt/wayback/20160515103313/http://www.jcdl.org/ 4
  • 5. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA How does this work? Memento Aggregator (simplistic view) 5
  • 6. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA Memento Aggregator 6
  • 7. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA Memento Aggregator 7
  • 8. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA • As the number of archives grows, sending requests to each archive for every incoming request is not feasible • Response times • Memento infrastructure load • Load on distributed archives LANL Memento Aggregator - Problem 8
  • 9. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA • We could predict whether or not to issue a request to a specific archive? • By merely looking at the requested URI-R • A binary classifier per archive • We could train the classifiers using cached data? • That would be pretty neat, indeed: • Retrain classifiers as web archive collections evolve • Not dependent on external data • Querying classifiers probably way faster (msec) than polling archives (sec) What if… 9
  • 10. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA 10 We can! Published @ JCDL 2016 https://doi.org/10.1145/2910896.2910899 • ML models based on simple URI features • Character count, n-grams, domain • Common ML algorithms used per archive • Logistic Regression, Multinomial Bayes, SVM • Optimized for • Prediction time, not training time • Reduction of false positive rate Results: • Requests per URI-R: 3.96 vs 11 • Response time: 2.16s vs 3.08s • Recall: 0.847
  • 11. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA 11
  • 12. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA In Production… 12
  • 13. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA 13
  • 14. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA 14
  • 15. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA 15
  • 16. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA 16
  • 17. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA 17
  • 18. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA 18
  • 19. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA • How effective is the cache? • What is the hit/miss ratio? Does it vary for different Memento services? • Is the cache freshness period appropriate? • How effective is the ML process? • What is the recall and the false positive rate? • Do we need to retrain the models? How often? Questions to Ask 19
  • 20. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA • Memento Aggregator currently covers • 23 web archives • 17 with native memento support • 6 with by-proxy memento support • Analysis of log files • recorded between July 4th 2017 and October 17th 2018 • > 11m requests in total • Approx. 2.6m requests against machine learning process • Results in 2.6m lookups to populate cache • Used as “truth” to assess ML prediction Evaluation 20
  • 21. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA Cache Hit/Miss Rate 21
  • 22. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA Cache Hit/Miss Rate 22 humanshumans machines machinesMostly driven by
  • 23. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA Recall 23 0.847 0.727
  • 24. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA False Positives 24
  • 25. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA Dynamic Web Archives 25
  • 26. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA • Memento Aggregator cache is very effective • ~ 60% of requests served from cache • Human-driven services benefit the most • Machine learning process saves! • Requests & time while at acceptable recall level • Recall: 0.727 • Re-training seems necessary, frequency TBD Optimization • ML model trained on archival holdings, not usage logs/cache • Beneficial for new archives • Neural network classifier, based on simple URI features, show promising results Takeaways 26
  • 27. Evaluating Memento Service Optimizations @mart1nkle1n JCDL 2019, 06/04/2019, Urbana-Champaign, IL, USA Evaluating Service Optimizations Martin Klein Lyudmila Balakireva Harihar Shankar Research Library Los Alamos National Laboratory https://arxiv.org/abs/1906.00058