SlideShare uma empresa Scribd logo
1 de 28
Music Video
 Redundancy and Half-Life
       in YouTube

Matthias Prellwitz and Michael L. Nelson
  matthias.prellwitz@googlemail.com
           mln@cs.odu.edu



                                             TPDL 2011
                                       Berlin, Germany
                                                9/26/11
Linking to a particular copy
“Rolling Stones - Satisfaction”




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson          2
Metadata lost
when YouTube video disappears

                                      video title        The Rolling Stones – Satisfaction
                                      url                http://www.youtube.com/watch?v=214szPQBUYc




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson                                                  3
Metadata hard to recover from
Search Engines




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson          4
But nearly 300 copies remain in YouTube




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson          5
Linking music-related URIs

‣ Transparent URI semantics
  ‣ http://www.last.fm/music/John+Lennon/_/Imagine
  ‣ http://www.ilike.com/artist/The+Cribs/track/I%27m+A+Realist
  ‣ http://www.last.fm/music/Johnny+Cash/_/Highwayman
‣ Opaque URI semantics
  ‣ http://vids.myspace.com/index.cfm?fuseaction=vids.individual&videoid=5168491
  ‣ http://www.youtube.com/watch?v=VST2KKIYn50




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson                               6
Popular Music
US T 40 Singles Charts of 9/25/10
    op




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson          7
Popular Music
Selected Music Blogs




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson          8
Popular Music
The 500 Greatest Songs of all Time




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson          9
Total Result Size Range
US T 40 Singles Charts of 9/25/10
    op

                                                              123,239
                                                                      Lady Gaga
                                                               83,298
                                                                      Alejandro
                                                               43,945




                                                                  66 Selena Gomez & The
                                                                  35 Scene
                                                                  26 A Year Without Rain




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson                                       10
Total Result Size Range
Selected Music Blogs

                                                              264,753
                                                                      Lady Gaga
                                                              256,205
                                                                      Bad Romance
                                                              232,936




                                                                     Mariah Carey featuring
                                                                   0
                                                                     Juelz Santana & Bone
                                                                   0
                                                                     Thugs-n-Harmony
                                                                   0
                                                                     Don't Forget About Us




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson                                          11
Total Result Size Range
The 500 Greatest Songs of all Time

                                                              174,088
                                                                      Michael Jackson
                                                              162,937
                                                                      Billie Jean
                                                              145,076




                                                                    0
                                                                      The Isley Brothers
                                                                    0
                                                                      That Lady (Part 1 and 2)
                                                                    0




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson                                             12
URI Unavailability
Rooted from a selected collection




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson          13
URI Unavailability
Expected Half-life




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson          14
URI Publication and Removal Rate




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson          15
Lifetimes of unavailable videos




             Years




            Month


TPDL 2011     Music Video Redundancy and Half-Life in YouTube
9/26/11       Matthias Prellwitz and Michael L. Nelson          16
Reasons for no unavailable videos




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson          17
When a YouTube video disappears

‣ video title The Rolling Stones - Satisfaction
‣ url         http://www.youtube.com/watch?v=214szPQBUYc
‣ Published 2009-06-13 13:44                 Removed 2010-04-09 (300 days online)




                                                              HTTP/1.1 404 Not FoundContent-Type:
                                                              text/html; charset=utf-8




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson                                                18
Metadata purged from YouTube Databases

‣ Video feed
     curl -I "http://gdata.youtube.com/feeds/api/videos/214szPQBUYc"

     HTTP/1.1 404 Not Found
     Content-Type: text/html; charset=UTF-8
     Private video


‣ Related videos
     curl -I
     "http://gdata.youtube.com/feeds/api/videos/214szPQBUYc/related"
     HTTP/1.1 404 Not Found
     Content-Type: text/html; charset=UTF-8
     Parent Video not found


‣ Video comments
     curl -I
     "http://gdata.youtube.com/feeds/api/videos/214szPQBUYc/comments"
     HTTP/1.1 200 OK Content-Type: application/atom+xml;
     charset=UTF-8
TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson               19
Metadata Normalization




                                                              Dereferencing ASIN via amazon.com Webservice:
                                                              Artist: Michael Jackson
                                                              Title:  Billie Jean (Single Version)




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson                                                          20
Availability of music-related metadata

‣ parsed out only at the first time a URI showed up in the result list for the first time
  ‣ YouTube crawling restrictions




‣ Remaining portion
  ‣ query video title against music related services via search engines
    ‣ Google/Yahoo! with site parameter www.last.fm/music


TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson                                        21
Retrieving and preserving
a video’s metadata

‣ Active preservation
     attempt once a video copy is available
     ‣ Parse HTML out for structured music-related metadata
       ‣ YouTube generated meta data
       ‣ AmazonMP3 affiliate link
       ‣ search engines with free-form video title against music-related websites

‣ Preserving metadata into the public web infrastructure




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson                                22
Preservation Prototype




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson          23
Metadata preservation
Example: twitter




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson          24
Pointing to a Resolver service

‣ http://ytresolve.cs.odu.edu/r/http://www.youtube.com/watch?
     v=214szPQBUYc/


‣ Author-side approach
  ‣ content creator points directly to a resolver service
‣ Server-side approach
  ‣ Plugin/Renderer class automatically rewrites
         YouTube video watch URIs
         to resolver service
‣    Client-side approach
     ‣ Web-Browser plugin intercepts click on
            Youtube video watch URIs
            and redirects to resolver service




TPDL 2011       Music Video Redundancy and Half-Life in YouTube
9/26/11         Matthias Prellwitz and Michael L. Nelson          25
YouTube Resolver service

      http://www.youtube.com/watch?v=214szPQBUYc
      http://www.youtube.com/v/214szPQBUYc
      h
      http://www.youtu.be/214szPQBUYc
      http://www.youtube.com/user/WEASELxLOVER#p/a/u/2/214szPQBUYc



                                                                  HTTP/1.1 200 OK
       HTTP/1.1 404 Not Found                                     HTTP/1.1 303 See Others *
                                                        HTTP        redirect
                                                       Status
                                                        Code
search for preserved metadata
‣ in list of designated accounts

exact best available granularity

  query YouTube API with those                           *)
                                                         http://www.youtube.com/verify_controversy...
      Provided (and evaluate)                            http://www.youtube.com/verify_age...
                                                         https://www.google.com/accounts/ServiceLogin...
         alternative copies
                                                         http://www.youtube.com/das_captcha..


TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson                                                 26
Future Work

‣ Evaluation of preservation and retrieval quality of chosen services
  ‣ exchange services
‣ additional automation of preservation process
  ‣ once YT URI was passed for resolving
‣ Evaluation of retrieved available copies
  ‣ redirect to best copy instead of returning a list to choose
‣ Consider international requesters
  ‣ taking requester’s location (country) into account




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson                    27
Summary

     ‣ Pointing to a specific YouTube video copy by its URI has a risk of disappearance
       ‣ alternative copies over time available
       ‣ YouTube URIs unlikely to be cached once gone
       ‣ YouTube metadata only reliable for available URIs
          ‣ active preservation attempt

     ‣ Introducing a level of indirection: Resolver service
       ‣ check URI status and location header
       ‣ search the public web for injected metadata
          ‣ query for alternative copies




TPDL 2011   Music Video Redundancy and Half-Life in YouTube
9/26/11     Matthias Prellwitz and Michael L. Nelson                                      28

Mais conteúdo relacionado

Destaque

(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web PagesMichael Nelson
 
Tools for A Preservation Ready Web
Tools for A Preservation Ready WebTools for A Preservation Ready Web
Tools for A Preservation Ready WebMichael Nelson
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web PagesMichael Nelson
 
Memento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMapsMemento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMapsMichael Nelson
 
Can’t Find Your 404s?
Can’t Find Your 404s?Can’t Find Your 404s?
Can’t Find Your 404s?Michael Nelson
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the WebMichael Nelson
 
Why Care About the Past?
Why Care About the Past?Why Care About the Past?
Why Care About the Past?Michael Nelson
 
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
OAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange ProjectOAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange Project
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange ProjectMichael Nelson
 

Destaque (8)

(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages
 
Tools for A Preservation Ready Web
Tools for A Preservation Ready WebTools for A Preservation Ready Web
Tools for A Preservation Ready Web
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages
 
Memento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMapsMemento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMaps
 
Can’t Find Your 404s?
Can’t Find Your 404s?Can’t Find Your 404s?
Can’t Find Your 404s?
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the Web
 
Why Care About the Past?
Why Care About the Past?Why Care About the Past?
Why Care About the Past?
 
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
OAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange ProjectOAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange Project
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
 

Mais de Michael Nelson

Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035Michael Nelson
 
Uncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pagesUncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pagesMichael Nelson
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsMichael Nelson
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsMichael Nelson
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesMichael Nelson
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesMichael Nelson
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael Nelson
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael Nelson
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael Nelson
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Michael Nelson
 
Summarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesSummarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesMichael Nelson
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingMichael Nelson
 
We Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesWe Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesMichael Nelson
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptMichael Nelson
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesMichael Nelson
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple ArchivesMichael Nelson
 
Combining Storytelling and Web Archives
Combining Storytelling and Web ArchivesCombining Storytelling and Web Archives
Combining Storytelling and Web ArchivesMichael Nelson
 
@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015Michael Nelson
 
Evaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesEvaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesMichael Nelson
 
When Should I Make Preservation Copies of Myself?
When Should I Make Preservation Copies of Myself?�When Should I Make Preservation Copies of Myself?�
When Should I Make Preservation Copies of Myself?Michael Nelson
 

Mais de Michael Nelson (20)

Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035
 
Uncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pagesUncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pages
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed Originals
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed Originals
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
 
Summarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesSummarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniques
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web Archiving
 
We Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesWe Need Multiple, Independent Web Archives
We Need Multiple, Independent Web Archives
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web Archives
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple Archives
 
Combining Storytelling and Web Archives
Combining Storytelling and Web ArchivesCombining Storytelling and Web Archives
Combining Storytelling and Web Archives
 
@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015
 
Evaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesEvaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived Pages
 
When Should I Make Preservation Copies of Myself?
When Should I Make Preservation Copies of Myself?�When Should I Make Preservation Copies of Myself?�
When Should I Make Preservation Copies of Myself?
 

Último

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 

Último (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 

Music Video Redundancy and Half-Life in YouTube

  • 1. Music Video Redundancy and Half-Life in YouTube Matthias Prellwitz and Michael L. Nelson matthias.prellwitz@googlemail.com mln@cs.odu.edu TPDL 2011 Berlin, Germany 9/26/11
  • 2. Linking to a particular copy “Rolling Stones - Satisfaction” TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 2
  • 3. Metadata lost when YouTube video disappears video title The Rolling Stones – Satisfaction url http://www.youtube.com/watch?v=214szPQBUYc TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 3
  • 4. Metadata hard to recover from Search Engines TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 4
  • 5. But nearly 300 copies remain in YouTube TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 5
  • 6. Linking music-related URIs ‣ Transparent URI semantics ‣ http://www.last.fm/music/John+Lennon/_/Imagine ‣ http://www.ilike.com/artist/The+Cribs/track/I%27m+A+Realist ‣ http://www.last.fm/music/Johnny+Cash/_/Highwayman ‣ Opaque URI semantics ‣ http://vids.myspace.com/index.cfm?fuseaction=vids.individual&videoid=5168491 ‣ http://www.youtube.com/watch?v=VST2KKIYn50 TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 6
  • 7. Popular Music US T 40 Singles Charts of 9/25/10 op TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 7
  • 8. Popular Music Selected Music Blogs TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 8
  • 9. Popular Music The 500 Greatest Songs of all Time TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 9
  • 10. Total Result Size Range US T 40 Singles Charts of 9/25/10 op 123,239 Lady Gaga 83,298 Alejandro 43,945 66 Selena Gomez & The 35 Scene 26 A Year Without Rain TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 10
  • 11. Total Result Size Range Selected Music Blogs 264,753 Lady Gaga 256,205 Bad Romance 232,936 Mariah Carey featuring 0 Juelz Santana & Bone 0 Thugs-n-Harmony 0 Don't Forget About Us TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 11
  • 12. Total Result Size Range The 500 Greatest Songs of all Time 174,088 Michael Jackson 162,937 Billie Jean 145,076 0 The Isley Brothers 0 That Lady (Part 1 and 2) 0 TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 12
  • 13. URI Unavailability Rooted from a selected collection TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 13
  • 14. URI Unavailability Expected Half-life TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 14
  • 15. URI Publication and Removal Rate TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 15
  • 16. Lifetimes of unavailable videos Years Month TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 16
  • 17. Reasons for no unavailable videos TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 17
  • 18. When a YouTube video disappears ‣ video title The Rolling Stones - Satisfaction ‣ url http://www.youtube.com/watch?v=214szPQBUYc ‣ Published 2009-06-13 13:44 Removed 2010-04-09 (300 days online) HTTP/1.1 404 Not FoundContent-Type: text/html; charset=utf-8 TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 18
  • 19. Metadata purged from YouTube Databases ‣ Video feed curl -I "http://gdata.youtube.com/feeds/api/videos/214szPQBUYc" HTTP/1.1 404 Not Found Content-Type: text/html; charset=UTF-8 Private video ‣ Related videos curl -I "http://gdata.youtube.com/feeds/api/videos/214szPQBUYc/related" HTTP/1.1 404 Not Found Content-Type: text/html; charset=UTF-8 Parent Video not found ‣ Video comments curl -I "http://gdata.youtube.com/feeds/api/videos/214szPQBUYc/comments" HTTP/1.1 200 OK Content-Type: application/atom+xml; charset=UTF-8 TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 19
  • 20. Metadata Normalization Dereferencing ASIN via amazon.com Webservice: Artist: Michael Jackson Title: Billie Jean (Single Version) TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 20
  • 21. Availability of music-related metadata ‣ parsed out only at the first time a URI showed up in the result list for the first time ‣ YouTube crawling restrictions ‣ Remaining portion ‣ query video title against music related services via search engines ‣ Google/Yahoo! with site parameter www.last.fm/music TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 21
  • 22. Retrieving and preserving a video’s metadata ‣ Active preservation attempt once a video copy is available ‣ Parse HTML out for structured music-related metadata ‣ YouTube generated meta data ‣ AmazonMP3 affiliate link ‣ search engines with free-form video title against music-related websites ‣ Preserving metadata into the public web infrastructure TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 22
  • 23. Preservation Prototype TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 23
  • 24. Metadata preservation Example: twitter TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 24
  • 25. Pointing to a Resolver service ‣ http://ytresolve.cs.odu.edu/r/http://www.youtube.com/watch? v=214szPQBUYc/ ‣ Author-side approach ‣ content creator points directly to a resolver service ‣ Server-side approach ‣ Plugin/Renderer class automatically rewrites YouTube video watch URIs to resolver service ‣ Client-side approach ‣ Web-Browser plugin intercepts click on Youtube video watch URIs and redirects to resolver service TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 25
  • 26. YouTube Resolver service http://www.youtube.com/watch?v=214szPQBUYc http://www.youtube.com/v/214szPQBUYc h http://www.youtu.be/214szPQBUYc http://www.youtube.com/user/WEASELxLOVER#p/a/u/2/214szPQBUYc HTTP/1.1 200 OK HTTP/1.1 404 Not Found HTTP/1.1 303 See Others * HTTP redirect Status Code search for preserved metadata ‣ in list of designated accounts exact best available granularity query YouTube API with those *) http://www.youtube.com/verify_controversy... Provided (and evaluate) http://www.youtube.com/verify_age... https://www.google.com/accounts/ServiceLogin... alternative copies http://www.youtube.com/das_captcha.. TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 26
  • 27. Future Work ‣ Evaluation of preservation and retrieval quality of chosen services ‣ exchange services ‣ additional automation of preservation process ‣ once YT URI was passed for resolving ‣ Evaluation of retrieved available copies ‣ redirect to best copy instead of returning a list to choose ‣ Consider international requesters ‣ taking requester’s location (country) into account TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 27
  • 28. Summary ‣ Pointing to a specific YouTube video copy by its URI has a risk of disappearance ‣ alternative copies over time available ‣ YouTube URIs unlikely to be cached once gone ‣ YouTube metadata only reliable for available URIs ‣ active preservation attempt ‣ Introducing a level of indirection: Resolver service ‣ check URI status and location header ‣ search the public web for injected metadata ‣ query for alternative copies TPDL 2011 Music Video Redundancy and Half-Life in YouTube 9/26/11 Matthias Prellwitz and Michael L. Nelson 28