SlideShare uma empresa Scribd logo
1 de 34
Baixar para ler offline
Profiling Web Archives
Sawood Alam and Michael L. Nelson
Computer Science Department, Old Dominion University
Norfolk, Virginia - 23529
Herbert Van de Sompel, Lyudmila L. Balakireva, and Harihar Shankar
Los Alamos National Laboratory, Los Alamos, NM
David S. H. Rosenthal
Stanford University Libraries, Stanford, CA
Supported in part by the International Internet Preservation Consortium (IIPC)
Memento Aggregator
Memento Aggregator
Memento Aggregator
Memento Aggregator
Memento Aggregator
Memento Aggregator
Long Tail of Archives
Long Tail of Archives
● 400B+ web pages at IA do not cover
everything
● Top three archives after IA produce full
TimeMap 52% of the time (AlSum et al, TPDL 2013)
● Targeted crawls
● Special focus archives
● Restricted resources
● Private archives
Archive Profile
● High-level summary of an archive
● Predicts presence of mementos of a URI-R
in an archive
● Provides various statistics about the
holdings
● Small in size
● Publicly available
● Easy to update and partially patch
● Useful for Memento query routing and other
things
Available Profiling Resources
● Client request
● Archive response
● Archive index (CDX files)
A Client Request
An Archive Response
A CDX Snippet
Profiling Strategies
● Complete URI-R Profiling (1 URI-R = 1 Profile Key)
○ bbc.co.uk/images/logo.png?w=90
○ cnn.com/2014/03/15/?id=128734
● TLD-only Profiling (1 TLD = 1 Profile Key)
○ com)/
○ uk)/
● Middle Ground
○ uk,co)/
○ uk,co,bbc)/images
○ uk,co,bbc)/0/2/1
○ com,cnn)/ 201309 ar
Frequency Measurements
CDXJ Serialization
URI-Key Generation
Profile Merging
Base profile
New profile
Merged profile
Dataset
● Three archives
● Four sample query sets
● 23 profiles for each archive and sample set
Archives
Archive URI-Rs URI-Ms Size
Archive-It 1.9B 5.3B 1.8TB
UKWA 0.7B 1.7B 0.5TB
Stanford 12M 25M 8.3GB
Sample Query Sets
Sample In Archive-It In UKWA In Stanford
DMOZ 4.097% 1.912% 0.034%
MementoProxy 4.182% 0.179% 0.046%
IAWayback 3.716% 0.231% 0.039%
UKWayback 0.108% 0.034% 0.002%
Sample Size: 1M URIs Each
Evaluation
● Relate CDX Size, URI-M, URI-R, and URI-
Key
● Analyze profile growth
● Estimate Relative Cost
● Evaluate Routing Precision vs. Relative Cost
CDX Size vs URI-M (UKWA 10 Years)
Alpha: 175 bytes per CDX line
URI-M vs URI-R (UKWA 10 Years)
Gamma: 2.46
K : 2.686
Beta: 0.911
Space Cost (UKWA 7 Years)
Phi: 8.5e-07 -- 0.70583
Time Cost (UKWA 7 Years)
Tau: 5.7e-05 -- 6.2e-05
CDX: 45GB
URI-Ms: 181M
URI-Rs: 96M
Time: 3 hours
Resource Requirement
Archive-It
UKWA
Stanford
Cost vs Precision
Group Cost Precision
G1 (H1P0/TLD) Bound by # of TLDs < 0.05
G2 (H3P0, DDom,
DSub, DPth, DQry)
< 0.01 ≈ 2 * G1
G3 (DIni) ≈ 2 * G2 ≈ (3--4) * G1
G4 (HxP1) ≈ 5 * G3 ≈ (5--7) * G1
G5 (Higher HmPn) 0.4 -- 0.7 Not Explored
G6 (URIR) 1.0 1.0
Future Work
● Generating sample URI sets
● Profiling via sampling
● Language profiles
● Evaluation of combination profiles such as
URI-Key along with Datetime
● Profiles for usage other than Memento
routing, such as, site classification based
profiles (e.g., news, wiki, social media, blog
etc.)
Conclusions
● Generated profiles with different policies for
two archives
● Examined cost-precision tradeoffs of various
policies
● Related CDX Size, URI-M, URI-R, and URI-
Key
● Gained up to 22% routing precision with
<5% relative cost without any false negatives
● Code @ GitHub:/oduwsdl/archive_profiler

Mais conteúdo relacionado

Mais procurados

RDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsRDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsJean-Paul Calbimonte
 
Report on the Crawl and Harvest of the Whole Australian Web ...
Report on the Crawl and Harvest of the Whole Australian Web ...Report on the Crawl and Harvest of the Whole Australian Web ...
Report on the Crawl and Harvest of the Whole Australian Web ...webhostingguy
 
4th edition china natural gas map
4th edition china natural gas map4th edition china natural gas map
4th edition china natural gas maparapublication
 
On the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingOn the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingPlanetData Network of Excellence
 
TCP connection management in SDN
TCP connection management in SDNTCP connection management in SDN
TCP connection management in SDNChao Chen
 
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...Oscar Corcho
 
rdf query reformulation
rdf query reformulationrdf query reformulation
rdf query reformulationINRIA-OAK
 

Mais procurados (12)

Efficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data StreamsEfficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data Streams
 
RDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsRDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementations
 
Report on the Crawl and Harvest of the Whole Australian Web ...
Report on the Crawl and Harvest of the Whole Australian Web ...Report on the Crawl and Harvest of the Whole Australian Web ...
Report on the Crawl and Harvest of the Whole Australian Web ...
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 
4th edition china natural gas map
4th edition china natural gas map4th edition china natural gas map
4th edition china natural gas map
 
Discovering python search engine
Discovering python search engineDiscovering python search engine
Discovering python search engine
 
On the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingOn the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream Processing
 
NASA Terra Data Fusion
NASA Terra Data FusionNASA Terra Data Fusion
NASA Terra Data Fusion
 
TCP connection management in SDN
TCP connection management in SDNTCP connection management in SDN
TCP connection management in SDN
 
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
 
Danis xlink2011
Danis xlink2011Danis xlink2011
Danis xlink2011
 
rdf query reformulation
rdf query reformulationrdf query reformulation
rdf query reformulation
 

Destaque

Quantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.isQuantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.ismaturban
 
Impact Analysis of OCR Quality on Research Tasks in Digital Archives
Impact Analysis of OCR Quality on Research Tasks in Digital ArchivesImpact Analysis of OCR Quality on Research Tasks in Digital Archives
Impact Analysis of OCR Quality on Research Tasks in Digital ArchivesMyriam Traub
 
Characteristics of Social Media Stories
Characteristics of Social Media StoriesCharacteristics of Social Media Stories
Characteristics of Social Media StoriesYasmin AlNoamany, PhD
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesYasmin AlNoamany, PhD
 
Profile Serialization IIPC GA 2015
Profile Serialization IIPC GA 2015Profile Serialization IIPC GA 2015
Profile Serialization IIPC GA 2015Sawood Alam
 
Using Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through StorytellingUsing Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through StorytellingYasmin AlNoamany, PhD
 
Web Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionWeb Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionSawood Alam
 
JCDL2015: How Well are Arabic Websites Archived?
JCDL2015: How Well are Arabic Websites Archived?JCDL2015: How Well are Arabic Websites Archived?
JCDL2015: How Well are Arabic Websites Archived?LulwahMA
 
Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...
Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...
Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...Sawood Alam
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesYasmin AlNoamany, PhD
 
Web Archives and Data Challenges - Archives Unleashed
Web Archives and Data Challenges - Archives UnleashedWeb Archives and Data Challenges - Archives Unleashed
Web Archives and Data Challenges - Archives Unleashedmwe400
 
Social Feed Manager presentation at Archives Unleashed 3.0
Social Feed Manager presentation at Archives Unleashed 3.0Social Feed Manager presentation at Archives Unleashed 3.0
Social Feed Manager presentation at Archives Unleashed 3.0Justin Littman
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Michael Nelson
 
Twitter Analysis: Fake News
Twitter Analysis: Fake  NewsTwitter Analysis: Fake  News
Twitter Analysis: Fake NewsErika Siregar
 
Good News/ Bad News
Good News/ Bad NewsGood News/ Bad News
Good News/ Bad NewsLulwahMA
 
The HCI Perspective on IR (DIR2016 Keynote)
The HCI Perspective on IR (DIR2016 Keynote)The HCI Perspective on IR (DIR2016 Keynote)
The HCI Perspective on IR (DIR2016 Keynote)Max L. Wilson
 

Destaque (17)

Quantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.isQuantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.is
 
Impact Analysis of OCR Quality on Research Tasks in Digital Archives
Impact Analysis of OCR Quality on Research Tasks in Digital ArchivesImpact Analysis of OCR Quality on Research Tasks in Digital Archives
Impact Analysis of OCR Quality on Research Tasks in Digital Archives
 
Characteristics of Social Media Stories
Characteristics of Social Media StoriesCharacteristics of Social Media Stories
Characteristics of Social Media Stories
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web Archives
 
Profile Serialization IIPC GA 2015
Profile Serialization IIPC GA 2015Profile Serialization IIPC GA 2015
Profile Serialization IIPC GA 2015
 
Using Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through StorytellingUsing Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through Storytelling
 
Web Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionWeb Archiving: A Brief Introduction
Web Archiving: A Brief Introduction
 
JCDL2015: How Well are Arabic Websites Archived?
JCDL2015: How Well are Arabic Websites Archived?JCDL2015: How Well are Arabic Websites Archived?
JCDL2015: How Well are Arabic Websites Archived?
 
Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...
Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...
Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web Archives
 
Web Archives and Data Challenges - Archives Unleashed
Web Archives and Data Challenges - Archives UnleashedWeb Archives and Data Challenges - Archives Unleashed
Web Archives and Data Challenges - Archives Unleashed
 
Social Feed Manager presentation at Archives Unleashed 3.0
Social Feed Manager presentation at Archives Unleashed 3.0Social Feed Manager presentation at Archives Unleashed 3.0
Social Feed Manager presentation at Archives Unleashed 3.0
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
 
Twitter Analysis: Fake News
Twitter Analysis: Fake  NewsTwitter Analysis: Fake  News
Twitter Analysis: Fake News
 
Good News/ Bad News
Good News/ Bad NewsGood News/ Bad News
Good News/ Bad News
 
PID Signposting Pattern
PID Signposting PatternPID Signposting Pattern
PID Signposting Pattern
 
The HCI Perspective on IR (DIR2016 Keynote)
The HCI Perspective on IR (DIR2016 Keynote)The HCI Perspective on IR (DIR2016 Keynote)
The HCI Perspective on IR (DIR2016 Keynote)
 

Semelhante a TPDL 2015 - Profiling Web Archives

JCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive ProfilingJCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive ProfilingSawood Alam
 
TPDL 2016 Doctoral Consortium - Web Archive Profiling
TPDL 2016 Doctoral Consortium - Web Archive ProfilingTPDL 2016 Doctoral Consortium - Web Archive Profiling
TPDL 2016 Doctoral Consortium - Web Archive ProfilingSawood Alam
 
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation SlidesDuraSpace
 
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageProfiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageMichael Nelson
 
Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Roxanne Missingham
 
Web Archiving – Lessons and Potential
 Web Archiving – Lessons and Potential Web Archiving – Lessons and Potential
Web Archiving – Lessons and PotentialDaniel Gomes
 
Navigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
Navigating the Analog Waves: Digitizing Audio Cassettes for Your CollectionNavigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
Navigating the Analog Waves: Digitizing Audio Cassettes for Your CollectionKay Gregg
 
ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013Ahmed AlSum
 
Web Archiving Profile - WADL 2013
Web Archiving Profile - WADL 2013Web Archiving Profile - WADL 2013
Web Archiving Profile - WADL 2013Ahmed AlSum
 
Hub and Spokes Development June07
Hub and Spokes Development June07Hub and Spokes Development June07
Hub and Spokes Development June07Jane Stevenson
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archiveLewis Crawford
 
More Archives, More Better
More Archives, More Better More Archives, More Better
More Archives, More Better Michael Nelson
 
2011 IBM-KNAW Cambridge - How to store meaningful bits permanently
2011 IBM-KNAW Cambridge - How to store meaningful bits permanently2011 IBM-KNAW Cambridge - How to store meaningful bits permanently
2011 IBM-KNAW Cambridge - How to store meaningful bits permanentlyDirk Roorda
 

Semelhante a TPDL 2015 - Profiling Web Archives (20)

JCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive ProfilingJCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive Profiling
 
TPDL 2016 Doctoral Consortium - Web Archive Profiling
TPDL 2016 Doctoral Consortium - Web Archive ProfilingTPDL 2016 Doctoral Consortium - Web Archive Profiling
TPDL 2016 Doctoral Consortium - Web Archive Profiling
 
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
 
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageProfiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content Language
 
Internet content as research data
Internet content as research dataInternet content as research data
Internet content as research data
 
The Ontario library research cloud
The Ontario library research cloudThe Ontario library research cloud
The Ontario library research cloud
 
Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012
 
Web Archiving – Lessons and Potential
 Web Archiving – Lessons and Potential Web Archiving – Lessons and Potential
Web Archiving – Lessons and Potential
 
Accelerating your research with Microsoft Azure
Accelerating your research with Microsoft AzureAccelerating your research with Microsoft Azure
Accelerating your research with Microsoft Azure
 
Navigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
Navigating the Analog Waves: Digitizing Audio Cassettes for Your CollectionNavigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
Navigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
 
ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013
 
Web Archiving Profile - WADL 2013
Web Archiving Profile - WADL 2013Web Archiving Profile - WADL 2013
Web Archiving Profile - WADL 2013
 
Hub and Spokes Development June07
Hub and Spokes Development June07Hub and Spokes Development June07
Hub and Spokes Development June07
 
Rda update
Rda updateRda update
Rda update
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archive
 
oai-2.0-adv.ppt
oai-2.0-adv.pptoai-2.0-adv.ppt
oai-2.0-adv.ppt
 
Kx brian
Kx brianKx brian
Kx brian
 
More Archives, More Better
More Archives, More Better More Archives, More Better
More Archives, More Better
 
2011 IBM-KNAW Cambridge - How to store meaningful bits permanently
2011 IBM-KNAW Cambridge - How to store meaningful bits permanently2011 IBM-KNAW Cambridge - How to store meaningful bits permanently
2011 IBM-KNAW Cambridge - How to store meaningful bits permanently
 
IIPC GA 2014 Solr
IIPC GA 2014 SolrIIPC GA 2014 Solr
IIPC GA 2014 Solr
 

Mais de Sawood Alam

TrendMachine: Temporal Resilience of Web Pages
TrendMachine: Temporal Resilience of Web PagesTrendMachine: Temporal Resilience of Web Pages
TrendMachine: Temporal Resilience of Web PagesSawood Alam
 
CDX Summary: Web Archival Collection Insights
CDX Summary: Web Archival Collection InsightsCDX Summary: Web Archival Collection Insights
CDX Summary: Web Archival Collection InsightsSawood Alam
 
Profiling Web Archival Voids for Memento Routing
Profiling Web Archival Voids for Memento RoutingProfiling Web Archival Voids for Memento Routing
Profiling Web Archival Voids for Memento RoutingSawood Alam
 
Readying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web BundlesReadying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web BundlesSawood Alam
 
Summarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMapSummarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMapSawood Alam
 
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
MementoMap: A Web Archive Profiling Framework for Efficient Memento RoutingMementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
MementoMap: A Web Archive Profiling Framework for Efficient Memento RoutingSawood Alam
 
Supporting Web Archiving via Web Packaging
Supporting Web Archiving via Web PackagingSupporting Web Archiving via Web Packaging
Supporting Web Archiving via Web PackagingSawood Alam
 
MementoMap: An Archive Profile Dissemination Framework
MementoMap: An Archive Profile Dissemination FrameworkMementoMap: An Archive Profile Dissemination Framework
MementoMap: An Archive Profile Dissemination FrameworkSawood Alam
 
Impact of HTTP Cookie Violations in Web Archives
Impact of HTTP Cookie Violations in Web ArchivesImpact of HTTP Cookie Violations in Web Archives
Impact of HTTP Cookie Violations in Web ArchivesSawood Alam
 
Archive Assisted Archival Fixity Verification Framework
Archive Assisted Archival Fixity Verification FrameworkArchive Assisted Archival Fixity Verification Framework
Archive Assisted Archival Fixity Verification FrameworkSawood Alam
 
MementoMap Framework for Flexible and Adaptive Web Archive Profiling
MementoMap Framework for Flexible and Adaptive Web Archive ProfilingMementoMap Framework for Flexible and Adaptive Web Archive Profiling
MementoMap Framework for Flexible and Adaptive Web Archive ProfilingSawood Alam
 
Web ARChive (WARC) File Format
Web ARChive (WARC) File FormatWeb ARChive (WARC) File Format
Web ARChive (WARC) File FormatSawood Alam
 
InterPlanetary Wayback: The Next Step Towards Decentralized Web Archiving
InterPlanetary Wayback: The Next Step Towards Decentralized Web ArchivingInterPlanetary Wayback: The Next Step Towards Decentralized Web Archiving
InterPlanetary Wayback: The Next Step Towards Decentralized Web ArchivingSawood Alam
 
MemGator - A Memento Aggregator CLI and Server in Go
MemGator - A Memento Aggregator CLI and Server in GoMemGator - A Memento Aggregator CLI and Server in Go
MemGator - A Memento Aggregator CLI and Server in GoSawood Alam
 
Dockerize Your Projects - A Brief Introduction to Containerization
Dockerize Your Projects - A Brief Introduction to ContainerizationDockerize Your Projects - A Brief Introduction to Containerization
Dockerize Your Projects - A Brief Introduction to ContainerizationSawood Alam
 
Avoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorkerAvoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorkerSawood Alam
 
Client-side Reconstruction of Composite Mementos Using ServiceWorker
Client-side Reconstruction of Composite Mementos Using ServiceWorkerClient-side Reconstruction of Composite Mementos Using ServiceWorker
Client-side Reconstruction of Composite Mementos Using ServiceWorkerSawood Alam
 
Introducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research GroupIntroducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research GroupSawood Alam
 
InterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives
InterPlanetary Wayback: Peer-To-Peer Permanence of Web ArchivesInterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives
InterPlanetary Wayback: Peer-To-Peer Permanence of Web ArchivesSawood Alam
 
Profiling Web Archives IIPC GA 2015
Profiling Web Archives IIPC GA 2015Profiling Web Archives IIPC GA 2015
Profiling Web Archives IIPC GA 2015Sawood Alam
 

Mais de Sawood Alam (20)

TrendMachine: Temporal Resilience of Web Pages
TrendMachine: Temporal Resilience of Web PagesTrendMachine: Temporal Resilience of Web Pages
TrendMachine: Temporal Resilience of Web Pages
 
CDX Summary: Web Archival Collection Insights
CDX Summary: Web Archival Collection InsightsCDX Summary: Web Archival Collection Insights
CDX Summary: Web Archival Collection Insights
 
Profiling Web Archival Voids for Memento Routing
Profiling Web Archival Voids for Memento RoutingProfiling Web Archival Voids for Memento Routing
Profiling Web Archival Voids for Memento Routing
 
Readying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web BundlesReadying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web Bundles
 
Summarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMapSummarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMap
 
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
MementoMap: A Web Archive Profiling Framework for Efficient Memento RoutingMementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
 
Supporting Web Archiving via Web Packaging
Supporting Web Archiving via Web PackagingSupporting Web Archiving via Web Packaging
Supporting Web Archiving via Web Packaging
 
MementoMap: An Archive Profile Dissemination Framework
MementoMap: An Archive Profile Dissemination FrameworkMementoMap: An Archive Profile Dissemination Framework
MementoMap: An Archive Profile Dissemination Framework
 
Impact of HTTP Cookie Violations in Web Archives
Impact of HTTP Cookie Violations in Web ArchivesImpact of HTTP Cookie Violations in Web Archives
Impact of HTTP Cookie Violations in Web Archives
 
Archive Assisted Archival Fixity Verification Framework
Archive Assisted Archival Fixity Verification FrameworkArchive Assisted Archival Fixity Verification Framework
Archive Assisted Archival Fixity Verification Framework
 
MementoMap Framework for Flexible and Adaptive Web Archive Profiling
MementoMap Framework for Flexible and Adaptive Web Archive ProfilingMementoMap Framework for Flexible and Adaptive Web Archive Profiling
MementoMap Framework for Flexible and Adaptive Web Archive Profiling
 
Web ARChive (WARC) File Format
Web ARChive (WARC) File FormatWeb ARChive (WARC) File Format
Web ARChive (WARC) File Format
 
InterPlanetary Wayback: The Next Step Towards Decentralized Web Archiving
InterPlanetary Wayback: The Next Step Towards Decentralized Web ArchivingInterPlanetary Wayback: The Next Step Towards Decentralized Web Archiving
InterPlanetary Wayback: The Next Step Towards Decentralized Web Archiving
 
MemGator - A Memento Aggregator CLI and Server in Go
MemGator - A Memento Aggregator CLI and Server in GoMemGator - A Memento Aggregator CLI and Server in Go
MemGator - A Memento Aggregator CLI and Server in Go
 
Dockerize Your Projects - A Brief Introduction to Containerization
Dockerize Your Projects - A Brief Introduction to ContainerizationDockerize Your Projects - A Brief Introduction to Containerization
Dockerize Your Projects - A Brief Introduction to Containerization
 
Avoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorkerAvoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorker
 
Client-side Reconstruction of Composite Mementos Using ServiceWorker
Client-side Reconstruction of Composite Mementos Using ServiceWorkerClient-side Reconstruction of Composite Mementos Using ServiceWorker
Client-side Reconstruction of Composite Mementos Using ServiceWorker
 
Introducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research GroupIntroducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research Group
 
InterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives
InterPlanetary Wayback: Peer-To-Peer Permanence of Web ArchivesInterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives
InterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives
 
Profiling Web Archives IIPC GA 2015
Profiling Web Archives IIPC GA 2015Profiling Web Archives IIPC GA 2015
Profiling Web Archives IIPC GA 2015
 

Último

Introduction to ICANN and Fellowship program by Shreedeep Rayamajhi.pdf
Introduction to ICANN and Fellowship program  by Shreedeep Rayamajhi.pdfIntroduction to ICANN and Fellowship program  by Shreedeep Rayamajhi.pdf
Introduction to ICANN and Fellowship program by Shreedeep Rayamajhi.pdfShreedeep Rayamajhi
 
Bio Medical Waste Management Guideliness 2023 ppt.pptx
Bio Medical Waste Management Guideliness 2023 ppt.pptxBio Medical Waste Management Guideliness 2023 ppt.pptx
Bio Medical Waste Management Guideliness 2023 ppt.pptxnaveenithkrishnan
 
WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024
WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024
WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024Jan Löffler
 
Check out the Free Landing Page Hosting in 2024
Check out the Free Landing Page Hosting in 2024Check out the Free Landing Page Hosting in 2024
Check out the Free Landing Page Hosting in 2024Shubham Pant
 
LESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdf
LESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdfLESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdf
LESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdfmchristianalwyn
 
LESSON 10/ GROUP 10/ ST. THOMAS AQUINASS
LESSON 10/ GROUP 10/ ST. THOMAS AQUINASSLESSON 10/ GROUP 10/ ST. THOMAS AQUINASS
LESSON 10/ GROUP 10/ ST. THOMAS AQUINASSlesteraporado16
 
Zero-day Vulnerabilities
Zero-day VulnerabilitiesZero-day Vulnerabilities
Zero-day Vulnerabilitiesalihassaah1994
 
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDS
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDSTYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDS
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDSedrianrheine
 
Computer 10 Lesson 8: Building a Website
Computer 10 Lesson 8: Building a WebsiteComputer 10 Lesson 8: Building a Website
Computer 10 Lesson 8: Building a WebsiteMavein
 
Vision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced Horizons
Vision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced HorizonsVision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced Horizons
Vision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced HorizonsRoxana Stingu
 
Presentation2.pptx - JoyPress Wordpress
Presentation2.pptx -  JoyPress WordpressPresentation2.pptx -  JoyPress Wordpress
Presentation2.pptx - JoyPress Wordpressssuser166378
 
Benefits of doing Internet peering and running an Internet Exchange (IX) pres...
Benefits of doing Internet peering and running an Internet Exchange (IX) pres...Benefits of doing Internet peering and running an Internet Exchange (IX) pres...
Benefits of doing Internet peering and running an Internet Exchange (IX) pres...APNIC
 

Último (12)

Introduction to ICANN and Fellowship program by Shreedeep Rayamajhi.pdf
Introduction to ICANN and Fellowship program  by Shreedeep Rayamajhi.pdfIntroduction to ICANN and Fellowship program  by Shreedeep Rayamajhi.pdf
Introduction to ICANN and Fellowship program by Shreedeep Rayamajhi.pdf
 
Bio Medical Waste Management Guideliness 2023 ppt.pptx
Bio Medical Waste Management Guideliness 2023 ppt.pptxBio Medical Waste Management Guideliness 2023 ppt.pptx
Bio Medical Waste Management Guideliness 2023 ppt.pptx
 
WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024
WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024
WordPress by the numbers - Jan Loeffler, CTO WebPros, CloudFest 2024
 
Check out the Free Landing Page Hosting in 2024
Check out the Free Landing Page Hosting in 2024Check out the Free Landing Page Hosting in 2024
Check out the Free Landing Page Hosting in 2024
 
LESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdf
LESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdfLESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdf
LESSON 5 GROUP 10 ST. THOMAS AQUINAS.pdf
 
LESSON 10/ GROUP 10/ ST. THOMAS AQUINASS
LESSON 10/ GROUP 10/ ST. THOMAS AQUINASSLESSON 10/ GROUP 10/ ST. THOMAS AQUINASS
LESSON 10/ GROUP 10/ ST. THOMAS AQUINASS
 
Zero-day Vulnerabilities
Zero-day VulnerabilitiesZero-day Vulnerabilities
Zero-day Vulnerabilities
 
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDS
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDSTYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDS
TYPES AND DEFINITION OF ONLINE CRIMES AND HAZARDS
 
Computer 10 Lesson 8: Building a Website
Computer 10 Lesson 8: Building a WebsiteComputer 10 Lesson 8: Building a Website
Computer 10 Lesson 8: Building a Website
 
Vision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced Horizons
Vision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced HorizonsVision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced Horizons
Vision Forward: Tracing Image Search SEO From Its Roots To AI-Enhanced Horizons
 
Presentation2.pptx - JoyPress Wordpress
Presentation2.pptx -  JoyPress WordpressPresentation2.pptx -  JoyPress Wordpress
Presentation2.pptx - JoyPress Wordpress
 
Benefits of doing Internet peering and running an Internet Exchange (IX) pres...
Benefits of doing Internet peering and running an Internet Exchange (IX) pres...Benefits of doing Internet peering and running an Internet Exchange (IX) pres...
Benefits of doing Internet peering and running an Internet Exchange (IX) pres...
 

TPDL 2015 - Profiling Web Archives