Can’t Find Your 404s? Santa Fe Complex, March 13, 2009. Martin Klein, Frank McCown, Joan Smith, Michael L. Nelson. Department of Computer Science, Old Dominion University, Norfolk VA
Background
“Women and Children First” HMS Birkenhead, Cape Danger, 1852: 638 passengers, 193 survivors; all 7 women & 13 children survived. Image from: http://www.btinternet.com/~palmiped/Birkenhead.htm
We should probably save a copy of this…
Or maybe we don’t have to… the Wikipedia link is in the top 10, so we’re ok, right?
Surely we’re saving copies of this…
2 copies in the UK, 2 Dublin Core records. That’s probably good enough…
What about the things that we know we don’t need to keep? You DO support recycling, right?
A higher moral calling for pack rats?
Just Keep the Important Stuff!
Preservation: Fortress Model. Five Easy Steps for Preservation: (image from: http://www.itunisie.com/tourisme/excursion/tabarka/images/fort.jpg)
Alternate Models of Preservation (image from: http://www.proex.ufes.br/arsm/knots_interlaced.htm)
The Problem
The Environment
How much of the Web is indexed? Estimates from “The Indexable Web is More than 11.5 billion pages” by Gulli and Signorini (WWW’05)
Web Infrastructure: Refreshing & Migrating
Timeline of Web Resource
Lapsed Website
Scenario 1: Same URI, Same Content. JCDL 2006: http://www.jcdl2006.org/ (July 2006); http://www.jcdl2006.org/ (today)
Scenario 2: Same URI, Different Content. Hypertext 2006: http://www.ht06.org/ (August 2006); http://www.ht06.org/ (today)
Scenario 3a: Same Content, Different URI. PSP 2003: http://www.pspcentral.org/events/annual_meeting_2003.html (August 2003); http://www.pspcentral.org/events/archive/annual_meeting_2003.html (today)
Scenario 3b: Similar Content, Different URI. ECDL 1999: http://www-rocq.inria.fr/EuroDL99/ (October 1999); http://www.informatik.uni-trier.de/~ley/db/conf/ercimdl/ercimdl99.html (today)
Scenario 4: Content Not Findable At Any URI. Greynet 1999: http://www.konbib.nl/infolev/greynet/2.5.htm (1999); today: ? ?
Otto: You eat a lot of acid, Miller, back in the hippie days?
Miller: A lot o' people don't realize what's really going on. They view life as a bunch o' unconnected incidents 'n things. They don't realize that there's this, like, lattice o' coincidence that lays on top o' everything. Give you an example; show you what I mean: suppose you're thinkin' about a plate o' shrimp. Suddenly someone'll say, like, plate, or shrimp, or plate o' shrimp out of the blue, no explanation. No point in lookin' for one, either. It's all part of a cosmic unconsciousness.
Synchronicity (picture from http://www.crystalinks.com/jung.html)
The Bigger Picture Synchronicity Architecture
What is a Signature? (aka “message digest”, examples include “md5” and “sha-1”) image from Eddie Kohler http://www.cs.ucla.edu/~kohler/
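To make the "message digest" idea on this slide concrete, here is a minimal sketch using Python's standard `hashlib` module. The helper name `digests` and the sample strings are illustrative assumptions, not part of the original slides. It shows that a one-character change in the input produces a different digest, which is the property that makes a digest useful for detecting exact copies but not for finding similar content.

```python
import hashlib


def digests(data: bytes) -> dict:
    """Return the hex MD5 and SHA-1 digests of the given bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha-1": hashlib.sha1(data).hexdigest(),
    }


if __name__ == "__main__":
    # Hypothetical sample inputs that differ in a single character.
    original = b"Can't Find Your 404s?"
    modified = b"Can't Find Your 404s!"

    for name, value in digests(original).items():
        print(name, value)

    # A tiny change in the input yields unrelated digests.
    print("identical digests:", digests(original) == digests(modified))
```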
What is a Lexical Signature? [Diagram labels: Resource (“Removal Policies in Network Caches for World-Wide Web Documents”), Abstract, LS (REMOVAL HIT RATE PROXY CACHE), Query Google]
LS as Proposed by Phelps and Wilensky
Lexical Signatures -- Examples
Rank/Results | URL | LS | Google query
1/1 | http://www.cs.berkeley.edu/~wilensky/NLP.html | texttiling wilensky disambiguation subtopic iago | http://www.google.com/search?q=texttiling+wilensky+disambiguation+subtopic+iago
na/10 | http://www.dli2.nsf.gov | nsdl multiagency imls testbeds extramural | http://www.google.com/search?q=nsdl+multiagency+imls+testbeds+extramural
1/221,000 (1/174,000 in 01/2008) | http://www.loc.gov | library collections congress thomas american | http://www.google.com/search?q=library+collections+congress+thomas+american
1/51 (2/77 in 01/2008) | http://www.jcdl2008.org | libraries jcdl digital conference pst | http://www.google.com/search?q=libraries+jcdl+digital+conference+pst
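The query URLs in the examples above follow a simple pattern: the LS terms are joined with `+` and passed as the `q` parameter. A minimal sketch of that construction, using the standard `urllib.parse.urlencode`, is shown below. The function name `ls_query_url` is a hypothetical helper; the base URL is the one that appears on the slide.

```python
from urllib.parse import urlencode


def ls_query_url(terms, base="http://www.google.com/search"):
    """Build a search URL whose q parameter is the space-joined LS terms.

    urlencode encodes spaces as '+', matching the URLs on the slide.
    """
    return base + "?" + urlencode({"q": " ".join(terms)})


if __name__ == "__main__":
    # Lexical signature taken from the first example row.
    ls = ["texttiling", "wilensky", "disambiguation", "subtopic", "iago"]
    print(ls_query_url(ls))
```

The rank of the original URL in the returned results (the Rank/Results column) would then be checked separately; that step is not sketched here.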
Generating LSs
Generating LSs
Evolution Over Time: 300 random URLs, winnowed to 98; 10,493 observations over 12 years
Evolution Over Time -- Example 10-term LSs generated for http://www.perfect10wines.com
Evolution Over Time (Rooted, Sliding)
Evolution Over Time (Rooted)
Evolution Over Time (Sliding)
Performance of LSs
Performance – Number of Terms
Conclusion & Future Work
Necronomicon
