SlideShare uma empresa Scribd logo
1 de 14
Web Archiving
Profile
OverviewAhmed AlSum
PhD Candidate
Old Dominion University
Web Archiving and Digital Libraries (WADL 2013)
A Workshop at JCDL 2013
July 25-26, 2013
Indianapolis, Indiana, USA
What is the problem?
• Web Archives are blackbox, it just accessible
through textbox search (full-text or URI-lookup)
• We need to profile/characterize the web archives
around the world such as:
o Age
o Top-level domains
o Languages
o Growth rate
Why
• To optimize the query routing for Memento
Aggregator.
• To determine the missing parts of the web.
Who
Full text URI-lookup
Internet Archive x
Library of Congress x
Icelandic Web Archive x
Library and Archives Canada x x
British Library x x
UK National Library x x
Portuguese Web Archive x x
Web Archive of Catalonia x x
Croatian Web Archive x x
Archive of the Czech Web x x
National Taiwan University x x
Archive IT x x
How
• Sampling from different sources
• Retrieve the TimeMap from each archive
• Analyze the TimeMaps
URIs Samples Sources
Web
1. DMOZ – Random sample
2. DMOZ – TLD %2 of each
TLD from DMOZ (.com,
.org, .jp, etc 52 TLD)
3. DMOZ – Languages 100
URIs for each Languages (24
lang.)
Web Archives
4. Top 1-Gram from Bing
5. Top 1000 queries term
by Yahoo in 9 languages
User requests
6. IA Wayback Machine Log files
7. Memento aggregator log files
* We used hostnames only
General Coverage
Web Archive Growth Rate
TLD Sample Coverage
TLD per archive
(TLD Sample)
TLD per archive
(Fulltext search)
TLD across archives
Languages distribution
per archive
Query Routing Evaluation

Mais conteúdo relacionado

Mais procurados

03 Researchfriendly Org2
03 Researchfriendly Org203 Researchfriendly Org2
03 Researchfriendly Org2
Inria
 
Forging New Links: Libraries in the Semantic Web
Forging New Links: Libraries in the Semantic WebForging New Links: Libraries in the Semantic Web
Forging New Links: Libraries in the Semantic Web
Gillian Byrne
 
RDF in Hydra Summit Overview
RDF in Hydra Summit OverviewRDF in Hydra Summit Overview
RDF in Hydra Summit Overview
Karen Estlund
 
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early AdoptersApril 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
National Information Standards Organization (NISO)
 

Mais procurados (20)

FLAX: Flexible Language Acquisition with Open Data-Driven Learning
FLAX: Flexible Language Acquisition with Open Data-Driven LearningFLAX: Flexible Language Acquisition with Open Data-Driven Learning
FLAX: Flexible Language Acquisition with Open Data-Driven Learning
 
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
 
Rs detective afpl
Rs detective afplRs detective afpl
Rs detective afpl
 
Archivegrid
ArchivegridArchivegrid
Archivegrid
 
Library resources- CSD - 08 2016
Library resources- CSD -  08 2016Library resources- CSD -  08 2016
Library resources- CSD - 08 2016
 
UW Libraries: Research Smarter, Not Harder
UW Libraries: Research Smarter, Not HarderUW Libraries: Research Smarter, Not Harder
UW Libraries: Research Smarter, Not Harder
 
03 Researchfriendly Org2
03 Researchfriendly Org203 Researchfriendly Org2
03 Researchfriendly Org2
 
Eaa2014 open access_session_4_g.eberhardt+n.riedl_topoi_final_13092014
Eaa2014 open access_session_4_g.eberhardt+n.riedl_topoi_final_13092014Eaa2014 open access_session_4_g.eberhardt+n.riedl_topoi_final_13092014
Eaa2014 open access_session_4_g.eberhardt+n.riedl_topoi_final_13092014
 
Publishing and Using Linked Open Data - Day 2
Publishing and Using Linked Open Data - Day 2Publishing and Using Linked Open Data - Day 2
Publishing and Using Linked Open Data - Day 2
 
The Open Source Library: It's Free As in Puppy
The Open Source Library: It's Free As in PuppyThe Open Source Library: It's Free As in Puppy
The Open Source Library: It's Free As in Puppy
 
A review of the state of the art in Machine Learning on the Semantic Web
A review of the state of the art in Machine Learning on the Semantic WebA review of the state of the art in Machine Learning on the Semantic Web
A review of the state of the art in Machine Learning on the Semantic Web
 
Forging New Links: Libraries in the Semantic Web
Forging New Links: Libraries in the Semantic WebForging New Links: Libraries in the Semantic Web
Forging New Links: Libraries in the Semantic Web
 
Thompson 6-jun15-final
Thompson 6-jun15-finalThompson 6-jun15-final
Thompson 6-jun15-final
 
Linked data 101: Getting Caught in the Semantic Web
Linked data 101: Getting Caught in the Semantic Web Linked data 101: Getting Caught in the Semantic Web
Linked data 101: Getting Caught in the Semantic Web
 
Sharing an Open Methodology for Building Domain-specific Corpora for EAP
Sharing an Open Methodology for Building Domain-specific Corpora for EAP Sharing an Open Methodology for Building Domain-specific Corpora for EAP
Sharing an Open Methodology for Building Domain-specific Corpora for EAP
 
RDF in Hydra Summit Overview
RDF in Hydra Summit OverviewRDF in Hydra Summit Overview
RDF in Hydra Summit Overview
 
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early AdoptersApril 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
 
Consuming Linked Data by Humans - WWW2010
Consuming Linked Data by Humans - WWW2010Consuming Linked Data by Humans - WWW2010
Consuming Linked Data by Humans - WWW2010
 
Bracke may4-1
Bracke may4-1Bracke may4-1
Bracke may4-1
 
Data management basics, for UC Davis EDU 292
Data management basics, for UC Davis EDU 292Data management basics, for UC Davis EDU 292
Data management basics, for UC Davis EDU 292
 

Destaque

Site story wadl2013
Site story wadl2013Site story wadl2013
Site story wadl2013
Martin Klein
 

Destaque (6)

Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member
 
Site story wadl2013
Site story wadl2013Site story wadl2013
Site story wadl2013
 
Archiving the Mobile Web
Archiving the Mobile WebArchiving the Mobile Web
Archiving the Mobile Web
 
The Revolution Will Not Be Archived
The Revolution Will Not Be ArchivedThe Revolution Will Not Be Archived
The Revolution Will Not Be Archived
 
Word Clouds from Twitter Follower Descriptions
Word Clouds from Twitter Follower DescriptionsWord Clouds from Twitter Follower Descriptions
Word Clouds from Twitter Follower Descriptions
 
Tweet Visibility Dynamics in a Tweet Conversation Graph
Tweet Visibility Dynamics in a Tweet Conversation GraphTweet Visibility Dynamics in a Tweet Conversation Graph
Tweet Visibility Dynamics in a Tweet Conversation Graph
 

Semelhante a Web Archiving Profile - WADL 2013

Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012
Roxanne Missingham
 
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
Jon Voss
 
Fri schreiber key_knowledge engineering
Fri schreiber key_knowledge engineeringFri schreiber key_knowledge engineering
Fri schreiber key_knowledge engineering
eswcsummerschool
 
Drupal Open Source Everything
Drupal Open Source EverythingDrupal Open Source Everything
Drupal Open Source Everything
librarywebchic
 

Semelhante a Web Archiving Profile - WADL 2013 (20)

Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web Archives
 
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageProfiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content Language
 
Web archiving challenges and opportunities
Web archiving challenges and opportunitiesWeb archiving challenges and opportunities
Web archiving challenges and opportunities
 
ITS Projects and Services Showcase - June 2013
ITS Projects and Services Showcase - June 2013ITS Projects and Services Showcase - June 2013
ITS Projects and Services Showcase - June 2013
 
Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012
 
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
 
Archival Technologies
Archival TechnologiesArchival Technologies
Archival Technologies
 
Hub and Spokes Development June07
Hub and Spokes Development June07Hub and Spokes Development June07
Hub and Spokes Development June07
 
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
 
IIPC GA 2014 Solr
IIPC GA 2014 SolrIIPC GA 2014 Solr
IIPC GA 2014 Solr
 
Capture All the URLS: First Steps in Web Archiving
Capture All the URLS: First Steps in Web ArchivingCapture All the URLS: First Steps in Web Archiving
Capture All the URLS: First Steps in Web Archiving
 
Easter JISC metadata May25 DT
Easter JISC metadata May25 DTEaster JISC metadata May25 DT
Easter JISC metadata May25 DT
 
Linked Open Data for Libraries, Archives, and Museums: An Aggregators View
Linked Open Data for Libraries, Archives, and Museums: An Aggregators ViewLinked Open Data for Libraries, Archives, and Museums: An Aggregators View
Linked Open Data for Libraries, Archives, and Museums: An Aggregators View
 
Internet content as research data
Internet content as research dataInternet content as research data
Internet content as research data
 
Fri schreiber key_knowledge engineering
Fri schreiber key_knowledge engineeringFri schreiber key_knowledge engineering
Fri schreiber key_knowledge engineering
 
Web Archive Profiling Through Fulltext Search
Web Archive Profiling Through Fulltext SearchWeb Archive Profiling Through Fulltext Search
Web Archive Profiling Through Fulltext Search
 
Drupal Open Source Everything
Drupal Open Source EverythingDrupal Open Source Everything
Drupal Open Source Everything
 
Creating and Maintaining Web Archives
Creating and Maintaining Web ArchivesCreating and Maintaining Web Archives
Creating and Maintaining Web Archives
 
Institutional Repository - May 2010
Institutional Repository - May 2010Institutional Repository - May 2010
Institutional Repository - May 2010
 
Reborn Digital: coding text
Reborn Digital: coding textReborn Digital: coding text
Reborn Digital: coding text
 

Mais de Ahmed AlSum (6)

Restoring US First Website
Restoring US First WebsiteRestoring US First Website
Restoring US First Website
 
"Web Archive services framework for tighter integration between the past and ...
"Web Archive services framework for tighter integration between the past and ..."Web Archive services framework for tighter integration between the past and ...
"Web Archive services framework for tighter integration between the past and ...
 
Thumbnail Summarization Techniques For Web Archives
Thumbnail Summarization Techniques For Web ArchivesThumbnail Summarization Techniques For Web Archives
Thumbnail Summarization Techniques For Web Archives
 
Archival HTTP Redirection Retrieval Policies - TemporalWeb 2013
Archival HTTP Redirection Retrieval Policies - TemporalWeb 2013Archival HTTP Redirection Retrieval Policies - TemporalWeb 2013
Archival HTTP Redirection Retrieval Policies - TemporalWeb 2013
 
ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013
 
How Much of the Web is Archived? JCDL 2011
How Much of the Web is Archived? JCDL 2011How Much of the Web is Archived? JCDL 2011
How Much of the Web is Archived? JCDL 2011
 

Último

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 

Último (20)

SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 

Web Archiving Profile - WADL 2013