Detecting, Modeling, & Predicting
    User Temporal Intention
         in Social Media
          Hany M. SalahEldeen
          Old Dominion University

        Advisor: Dr. Michael L. Nelson

       JCDL ‘12 Doctoral Consortium
Michael Jackson Dies




                   Snapshot on: June 25th 2009
http://web.archive.org/web/20090625232522/http://www.cnn.com/
Jeff tweets about it…




          Published on: June 25th 2009
https://twitter.com/mdnitehk/status/2333993907
Jenny is off the grid
Jeff’s friend Jenny was on a vacation in Hawaii
for a month…
Jenny starts catching up a month later




                                             Read on: July 26th 2009


When she came back she checked Jeff’s tweets and was
shocked!
          https://twitter.com/mdnitehk/status/2333993907
Jenny follows the link on July 26th




                     CNN page on: July 26th 2009
 http://web.archive.org/web/20090726234411/http://www.cnn.com/
Jenny is confused!
• Implication:
  – Jenny thought Jeff was making a joke about her
    favorite singer, and she got mad at him.


• Problem:
  – The tweet and the resource the tweet links to
    have become unsynchronized.
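
Both CNN snapshots above share one URL pattern: an archive prefix, a 14-digit timestamp, and the original URL. A minimal sketch that rebuilds the two links from their capture times; the helper name `wayback_url` is illustrative and is not part of any archive API.

```python
from datetime import datetime

# Prefix copied from the snapshot URLs shown on the slides.
WAYBACK_PREFIX = "http://web.archive.org/web/"


def wayback_url(original_url: str, when: datetime) -> str:
    """Build an archive URL for original_url at the given capture time."""
    timestamp = when.strftime("%Y%m%d%H%M%S")  # 14-digit YYYYMMDDhhmmss
    return f"{WAYBACK_PREFIX}{timestamp}/{original_url}"


# What Jeff saw when he tweeted, and what Jenny saw a month later.
tweet_day = wayback_url("http://www.cnn.com/", datetime(2009, 6, 25, 23, 25, 22))
read_day = wayback_url("http://www.cnn.com/", datetime(2009, 7, 26, 23, 44, 11))
print(tweet_day)
print(read_day)
```

The tweet itself carries only the live URL, so the timestamp that would keep tweet and page synchronized has to come from somewhere else, such as the tweet's publishing time.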
The Egyptian Revolution
Reading about it on Storify in
       March 2012….




     http://storify.com/maq4sure/egypts-revolution
I noticed some shared images are missing




       http://storify.com/maq4sure/egypts-revolution
Some tweets are still intact…




https://twitter.com/miss_amy_qb/status/32477898581483521
…and some lost their meaning with the
    disappearance of the images



       https://twitter.com/aishes/status/32485352102952960
                                                                Missing ?




    https://twitter.com/omar_chaaban/status/32203697597452289
The tweet remains but the shared
      image disappeared…




       http://yfrog.com/h5923xrvbqqvgzj
Cairo….we have a problem
• Implication:
  – The reader cannot understand what the author of
    the tweet meant because the image is not
    available.


• Problem:
  – The post is available but the linked resource
    (image) is completely missing.
The Anatomy of a Tweet
[Figure: annotated screenshot of a tweet. Labeled parts:]
  – Social post
  – Author’s username
  – Other user mention
  – Tweet body
  – Interaction options
  – Publishing timestamp
  – Shortened URL to resource
  – Hash tag
  – Shared resource
3 URIs = 3 Chances to fail
Explanation in MJ’s example
[Figure: timeline of events t1, t2, t3, t4, t5, t6, t7, t8, t9, …, tn]
User’s Temporal Intention
The Focus of our research                 Instrumented shortener



  Share time                  Implicit       Explicit

   Click time                 Implicit       Explicit
                                         Instrumented web client
      Out of our scope: purview of Facebook, Twitter, Google, etc.
      Engineering problem: solved by providing tools
Sometimes you want a
       previous version




                 The Correct Temporal
                      Intention

CNN.com at the closest time to the tweet: 25th June 2009 ~ 7pm
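
Picking "the closest time to the tweet" can be sketched as a nearest-neighbor search over the capture times of a page. This is a minimal illustration, assuming the capture times are already known as a list of datetimes; the function name `closest_memento` and the sample capture list are assumptions, not the tool used in this work.

```python
from datetime import datetime


def closest_memento(snapshot_times, target):
    """Return the capture time nearest to target (ties go to the earlier one)."""
    if not snapshot_times:
        raise ValueError("no snapshots available")
    # Sorting first makes min() prefer the earlier capture when two are equally close.
    return min(sorted(snapshot_times), key=lambda t: abs(t - target))


# Illustrative capture list; the last two timestamps are the ones on the slides.
captures = [
    datetime(2009, 6, 20, 12, 0, 0),
    datetime(2009, 6, 25, 23, 25, 22),
    datetime(2009, 7, 26, 23, 44, 11),
]
tweet_time = datetime(2009, 6, 25, 19, 0, 0)  # "25th June 2009 ~ 7pm"
best = closest_memento(captures, tweet_time)
print(best)
```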
Sometimes you want the
      current version




                The Correct Temporal
                     Intention


In this case the current state of the press releases page
Research Question

  Can we estimate the users’
intention at the time of posting
   and reading to predict and
maintain temporal consistency?
Research Goals
• Detect the temporal intention of:
    1.   The author at sharing time
    2.   The reader at dereferencing time
• Model this intention as a function of time, nature of the resource,
   and its context.
• Predict how resources change with time and the intention behind
   sharing them to minimize inconsistency.
• Implement the prediction model to automatically preserve
   vulnerable social content that is prone to change or loss.
• Create an environment implementing this framework that
   provides a smooth temporal navigation of the social web.
Related Work
•   User’s Web Search Intention
     –   A. Ashkan ECIR ’09
     –   C. Lee AINA ’05
     –   A. Loser IRSW ’08
     –   L. Azzopardi ECIR ’09
     –   R. Baeza-Yates SPIR ’06
     –   N. Dai HT ’11
•   Commercial Intention
     –   Q. Guo SIGIR ’10
     –   A. Benczur AIRWeb ’07
•   Sentiment Analysis
     –   G. Mishne AAAI ’06
     –   J. Bollen JCS ’11
•   Access to Archives
     –   H. Van de Sompel OR ’09
•   Persistence of shared resources
     –   M. Nelson D-Lib ’02
     –   R. Sanderson OR ’11
     –   F. McCown JCDL ’07
•   URL Shortening
     –   D. Antoniades WWW ’11
•   Tweeting, Micro-blogging and Popularity
     –   S. Wu WWW ’11
     –   A. Java SNA-KDD ’07
     –   H. Kwak WWW ’10
•   Social Networks Growth and Evolution
     –   B. Meeder WWW ’11
Dissertation Plan
  BEGIN
          Read Literature
          Collect Datasets
          Analyze Archives Coverage
          Analyze Shortened URIs
          Prototype Application
          Analyze Shared Resources Persistence and Coverage
          Analyze Contextual Intention   (Current State)

          Create Intention-based dataset
          Extract Intention Features
          Train a Parametric Model to predict intention
          Evaluate, test, cross-validate the model
          Create a mockup application
          Extend the model to induce preservation
          Finish Writing the Dissertation


PhD Defense
Estimating Web Archiving Coverage
• Goal: Estimate how much of the public web is present in the public archives
  and how many copies are available.
• Action:
   – Getting 4 different datasets from 4 different sources:
          •   Search Engines Indices
          •   Bit.ly
          •   DMOZ
          •   Delicious.
• Results:                                         *




• Publications:
     – How much of the web is archived? JCDL '11
* Table Courtesy of Ahmed AlSum JCDL 2011
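
The two quantities in the goal, how much is archived and how many copies exist, can be computed per dataset once each URI has a memento count. A minimal sketch with made-up counts (these are not results from the study); the function name `archive_coverage` is an assumption.

```python
def archive_coverage(memento_counts):
    """memento_counts maps each sampled URI to its number of archived copies.

    Returns (fraction of URIs with at least one copy,
             mean number of copies among the archived URIs).
    """
    total = len(memento_counts)
    if total == 0:
        return 0.0, 0.0
    archived = [n for n in memento_counts.values() if n > 0]
    fraction = len(archived) / total
    mean_copies = sum(archived) / len(archived) if archived else 0.0
    return fraction, mean_copies


# Hypothetical sample: two of four URIs have at least one archived copy.
sample = {"uri-a": 0, "uri-b": 3, "uri-c": 1, "uri-d": 0}
fraction, mean_copies = archive_coverage(sample)
print(fraction, mean_copies)
```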
Shortened URI analysis
•   Goal: Better understand URI shortening and resolution, the effect of time on this
    process, and the correlation between a page’s features and characteristics and
    its resolution.

•   Action:
     – Fresh Bit.lys
     – Get hourly clicklogs, rate of change, social networking spread, and other
       contextual information
     – Longitudinal study

•   Evaluation:
     – Compare results with the frequency-of-change analysis of Cho and
       Garcia-Molina.
     – Compare results with Antoniades et al. WWW 2011.
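
Resolving a shortened URI means following its redirect chain to the final resource. The sketch below works offline on a redirect table rather than issuing HTTP requests; the table contents, the short URIs, and the function name `resolve` are all hypothetical.

```python
def resolve(short_uri, redirects, max_hops=10):
    """Follow redirects (a dict: URI -> next URI) and return (final_uri, hops)."""
    seen = set()
    current = short_uri
    hops = 0
    while current in redirects:
        # Stop on cycles or suspiciously long chains instead of looping forever.
        if current in seen or hops >= max_hops:
            raise RuntimeError(f"redirect loop or too many hops at {current}")
        seen.add(current)
        current = redirects[current]
        hops += 1
    return current, hops


# Hypothetical chain: a shortener pointing at another shortener, then the page.
table = {
    "http://bit.ly/example-short": "http://t.example/abc",
    "http://t.example/abc": "http://www.cnn.com/",
}
final_uri, hops = resolve("http://bit.ly/example-short", table)
print(final_uri, hops)
```

In a longitudinal study the same resolution could be repeated at intervals and the outcomes compared over time.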
Estimating Loss of Shared Resources
               in Social Media
•   Goal: Estimate how much of the resources shared in social media is lost from the
    live web and how much is present in the public archives.
•   Action:
     – Sampling from 6 public events
     – Events spanning 3 years
     – Existence in the current web
     – Existence in the public archives
     – Find relation with time
•   Results:
     – After the first year, ~11% of shared resources are lost
     – After that, loss continues at about 0.02% per day
•   Publications:
     – A year after the Egyptian revolution, 10% of the social media documentation is gone.
       http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
     – Losing my revolution: How Many Resources Shared on Social Media Have Been Lost?
       TPDL '12
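
The two result figures above can be read as a simple linear estimate for resources older than one year. A minimal sketch using only the rounded numbers on this slide (~11% after the first year, then 0.02% per day); it is not the fitted model from the paper, and the function name is an assumption.

```python
def estimated_loss_percent(days_since_sharing: int) -> float:
    """Estimated percentage of shared resources lost after the given number of days."""
    if days_since_sharing < 365:
        # The slide gives no figure for the first year, so none is guessed here.
        raise ValueError("estimate is only defined from one year onward")
    return 11.0 + 0.02 * (days_since_sharing - 365)


one_year = estimated_loss_percent(365)   # 11.0
two_years = estimated_loss_percent(730)  # about 18.3 (11 + 0.02 * 365)
print(one_year, two_years)
```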
User Intention Analysis
•   Goal: Better understand user intention and the factors that affect it, and create
    a new training and testing set.

•   Action:
     –   Get a sample set of tweets selected at random
     –   Extract the URIs
     –   Get closest Memento
     –   Download the snapshot & current version
     –   Use Amazon’s Mechanical Turk to choose the best version

•   Evaluation:
     – Measure inter-rater agreement and confidence.
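
One simple way to summarize several raters' choices for a single tweet is a majority vote plus the share of raters who agree with it. The slide does not say which agreement measure is used, so this is only an illustrative sketch; the labels "archived" and "current" and the function name are assumptions.

```python
from collections import Counter


def majority_and_agreement(votes):
    """votes: labels chosen by different raters for one tweet.

    Returns (majority_label, fraction of raters who chose it).
    Ties are resolved in favor of the label that appears first in votes.
    """
    if not votes:
        raise ValueError("no votes")
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


label, agreement = majority_and_agreement(["archived", "archived", "current"])
print(label, agreement)
```

Items with low agreement could be dropped or sent to more raters before being used as training data.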
Proposed Work
•   Data Gathering
•   Feature Extraction
•   Modeling the intention engine
•   Evaluation
•   Application: Prediction and Preservation
Possible Solution for Jenny



       The resource has changed since the last time it was shared.
       Do you wish to see the version the author intended or
       the current version?

                      Current Version     Intended Version
Proposed Framework


[Diagram: Feature Extraction → Classifier → Archived Version or Current Version]

Example Features:
  - Tweet Content
  - Click Logs
  - Other Tweets
  - Shared Resource
  - Timemaps
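
The flow from feature extraction to a classifier that picks a version can be sketched end to end. Everything below is hypothetical: the `SharedLink` fields, the keyword list, the seven-day threshold, and the hand-written rule that stands in for a trained model. It covers only two of the example feature sources (tweet content and the shared resource).

```python
from dataclasses import dataclass


@dataclass
class SharedLink:
    tweet_text: str          # tweet content
    days_since_shared: int   # derived from the publishing timestamp
    resource_changed: bool   # does the current page differ from the archived one?


def extract_features(link: SharedLink) -> dict:
    """Turn one shared link into a small feature dictionary (illustrative features)."""
    text = link.tweet_text.lower()
    return {
        "mentions_event": any(word in text for word in ("breaking", "dies", "just now")),
        "is_old": link.days_since_shared > 7,
        "resource_changed": link.resource_changed,
    }


def choose_version(features: dict) -> str:
    """Toy rule standing in for a trained classifier: 'archived' or 'current'."""
    if features["resource_changed"] and (features["mentions_event"] or features["is_old"]):
        return "archived"
    return "current"


news = SharedLink("Michael Jackson dies http://bit.ly/x", 31, True)
page = SharedLink("Our press releases page", 1, False)
print(choose_version(extract_features(news)))
print(choose_version(extract_features(page)))
```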
Extra Slides
Archive Shortener Application
Estimating Shared Resources Loss in Social Media
My Publications
•   S. G. Ainsworth, A. Alsum, H. SalahEldeen, M. C. Weigle, and M. L. Nelson. How
    much of the web is archived? In Proceedings of the 11th annual international
    ACM/IEEE joint conference on Digital libraries, JCDL '11, pages 133-136, 2011.

•   H. SalahEldeen and M. L. Nelson. Losing my revolution: How much social media
    content has been lost? Accepted at TPDL 2012.

•   H. SalahEldeen and M. L. Nelson. Losing my revolution: A year after the Egyptian
    revolution, 10% of the social media documentation is gone.
    http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
References
•   D. Antoniades, I. Polakis, G. Kontaxis, E. Athanasopoulos, S. Ioannidis, E. P. Markatos, and T. Karagiannis. we.b: the web of short
    urls. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 715-724, New York, NY, USA,
    2011. ACM.
•   A. Ashkan, C. L. Clarke, E. Agichtein, and Q. Guo. Classifying and characterizing query intent. In Proceedings of the 31st
    European Conference on IR Research on Advances in Information Retrieval, ECIR '09, pages 578-586, Berlin, Heidelberg, 2009.
    Springer-Verlag.
•   L. Azzopardi and M. de Rijke. Query intention acquisition: A case study on automatically inferring structured queries. In
    Proceedings DIR-2006, 2006.
•   R. Baeza-Yates, L. Calderon-Benavides, and C. Gonzalez-Caro. The intention behind web queries. In F. Crestani, P. Ferragina, and
    M. Sanderson, editors, String Processing and Information Retrieval, volume 4209 of Lecture Notes in Computer Science, pages
    98-109. Springer Berlin / Heidelberg, 2006. 10.1007/11880561_9.
•   A. Benczur, I. Bro, K. Csalogany, and T. Sarlos. Web spam detection via commercial intent analysis. In Proceedings of the 3rd
    international workshop on Adversarial information retrieval on the web, AIRWeb '07, pages 89-92, New York, NY, USA, 2007.
    ACM.
•   J. Bollen, H. Mao, and X.-J. Zeng. Twitter mood predicts the stock market. CoRR, abs/1010.3003, 2010.
•   N. Dai, X. Qi, and B. D. Davison. Bridging link and query intent to enhance web search. In Proceedings of the 22nd ACM
    conference on Hypertext and hypermedia, HT '11, pages 17-26, New York, NY, USA, 2011. ACM.
•   N. Dai, X. Qi, and B. D. Davison. Enhancing web search with entity intent. In Proceedings of the 20th international conference
    companion on World wide web, WWW '11, pages 29-30, New York, NY, USA, 2011. ACM.
•   K. Durant and M. Smith. Predicting the political sentiment of web log posts using supervised machine learning techniques
    coupled with feature selection. In O. Nasraoui, M. Spiliopoulou, J. Srivastava, B. Mobasher, and B. Masand, editors, Advances in
    Web Mining and Web Usage Analysis, volume 4811 of Lecture Notes in Computer Science, pages 187-206. Springer Berlin /
    Heidelberg, 2007. 10.1007/978-3-540-77485-3_11.
•   Q. Guo and E. Agichtein. Ready to buy or just browsing?: detecting web searcher goals from interaction data. In Proceedings of the 33rd
    international ACM SIGIR conference on Research and development in information retrieval, SIGIR '10, pages 130-137, New York, NY, USA,
    2010. ACM.
•   A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th
    WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, WebKDD/SNA-KDD '07, pages 56-65, New York, NY,
    USA, 2007. ACM.
•   H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international
    conference on World wide web, WWW '10, pages 591-600, New York, NY, USA, 2010. ACM.
•   C.-H. L. Lee and A. Liu. Modeling the query intention with goals. In Proceedings of the 19th International Conference on Advanced
    Information Networking and Applications - Volume 2, AINA '05, pages 535-540, Washington, DC, USA, 2005. IEEE Computer Society.
•   A. Loser, W. M. Barczynski, and F. Brauer. What's the intention behind your query? a few observations from a large developer community.
    In IRSW, 2008.
•   F. McCown, N. Diawara, and M. L. Nelson. Factors affecting website reconstruction from the web infrastructure. In JCDL '07: Proceedings of
    the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 39-48, 2007.
•   B. Meeder, B. Karrer, A. Sayedi, R. Ravi, C. Borgs, and J. Chayes. We know who you followed last summer: inferring social link creation times
    in twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 517-526, New York, NY, USA, 2011.
    ACM.
•   G. Mishne. Predicting movie sales from blogger sentiment. In AAAI 2006 Spring Symposium on Computational Approaches to Analysing
    Weblogs (AAAI-CAAW), 2006.
•   M. L. Nelson and B. D. Allen. Object persistence and availability in digital libraries. D-Lib Magazine, 8(1), 2002.
•   R. Sanderson, M. Phillips, and H. Van de Sompel. Analyzing the persistence of referenced web resources with memento. CoRR,
    abs/1105.3459, 2011.
•   H. Van de Sompel, M. L. Nelson, R. Sanderson, L. Balakireva, S. Ainsworth, and H. Shankar. Memento: Time travel for the web. CoRR,
    abs/0911.1112, 2009.
•   S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts. Who says what to whom on twitter. In Proceedings of the 20th international conference
    on World wide web, WWW '11, pages 705-714, New York, NY, USA, 2011. ACM.

Judging the Relevance and worth of ideas part 2.pptx
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 

Estimating User Temporal Intention in Social Media

  • 1. Detecting, Modeling, & Predicting User Temporal Intention in Social Media Hany M. SalahEldeen Old Dominion University Advisor: Dr. Michael L. Nelson JCDL ‘12 Doctoral Consortium
  • 2. Michael Jackson Dies Snapshot on: June 25th 2009 http://web.archive.org/web/20090625232522/http://www.cnn.com/
  • 3. Jeff tweets about it… Published on: June 25th 2009 https://twitter.com/mdnitehk/status/2333993907
  • 4. Jenny is off the grid Jeff’s friend Jenny was on a vacation in Hawaii for a month…
  • 5. Jenny starts catching up a month later Read on: July 26th 2009 When she came back, she checked Jeff’s tweets and was shocked! https://twitter.com/mdnitehk/status/2333993907
  • 6. Jenny follows the link on July 26th CNN page on: July 26th 2009 http://web.archive.org/web/20090726234411/http://www.cnn.com/
  • 7. Jenny is confused! • Implication: – Jenny thought Jeff was making a joke about her favorite singer, and she got mad at him. • Problem: – The tweet and the resource the tweet links to have become unsynchronized.
  • 9. Reading about it on Storify in March 2012…. http://storify.com/maq4sure/egypts-revolution
  • 10. I noticed some shared images are missing http://storify.com/maq4sure/egypts-revolution
  • 11. Some tweets are still intact… https://twitter.com/miss_amy_qb/status/32477898581483521
  • 12. …and some lost their meaning with the disappearance of the images https://twitter.com/aishes/status/32485352102952960 Missing ? https://twitter.com/omar_chaaban/status/32203697597452289
  • 13. The tweet remains but the shared image disappeared… http://yfrog.com/h5923xrvbqqvgzj
  • 14. Cairo….we have a problem • Implication: – The reader cannot understand what the author of the tweet meant because the image is not available. • Problem: – The post is available but the linked resource (image) is completely missing.
  • 15. The Anatomy of a Tweet
  • 16. The Anatomy of a Tweet Labels: Social Post; Author’s username; Other user mention; Tweet Body; Hash Tag; Shortened URL to resource; Publishing timestamp; Interaction options; Shared Resource
  • 17. 3 URIs = 3 Chances to fail
  • 19. [Figure: timeline with points t1, t2, t3, … tn]
  • 20. User’s Temporal Intention • Share time: Implicit or Explicit (instrumented shortener) • Click time: Implicit or Explicit (instrumented web client) • Implicit intention: the focus of our research • Explicit intention: out of our scope; an engineering problem solved by providing tools; purview of Facebook, Twitter, Google, etc.
  • 21. Sometimes you want a previous version The Correct Temporal Intention CNN.com at the closest time to the tweet: 25th June 2009 ~ 7pm
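The archive links on the earlier slides follow the pattern `http://web.archive.org/web/<14-digit timestamp>/<original URL>`, and the Wayback Machine typically resolves such a URL to the capture closest to the requested time. A minimal sketch of building that link from a tweet's publishing time is shown below; the function name `wayback_url` is hypothetical, and treating naive datetimes as UTC is an assumption, not something the slides state.

```python
from datetime import datetime, timezone

# Prefix taken from the archive URLs shown on the slides.
WAYBACK_PREFIX = "http://web.archive.org/web"


def wayback_url(original_url: str, shared_at: datetime) -> str:
    """Build a Wayback Machine URL for the capture nearest to `shared_at`.

    Hypothetical helper: naive datetimes are assumed to already be in UTC.
    """
    if shared_at.tzinfo is not None:
        # Normalize timezone-aware datetimes to UTC before formatting.
        shared_at = shared_at.astimezone(timezone.utc)
    stamp = shared_at.strftime("%Y%m%d%H%M%S")  # 14 digits: YYYYMMDDhhmmss
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


# The CNN front page at the time of Jeff's tweet (timestamp from slide 2).
print(wayback_url("http://www.cnn.com/", datetime(2009, 6, 25, 23, 25, 22)))
```

Passing the tweet's own timestamp, rather than the time of reading, is what would keep Jenny's view of the page synchronized with what Jeff saw.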
  • 22. Sometimes you want the current version The Correct Temporal Intention In this case the current state of the press releases page
  • 23. Research Question Can we estimate the users’ intention at the time of posting and reading to predict and maintain temporal consistency?
  • 24. Research Goals • Detect the temporal intention of: 1. the author at sharing time 2. the reader at dereferencing time • Model this intention as a function of time, the nature of the resource, and its context. • Predict how resources change with time and the intention behind sharing them to minimize inconsistency. • Implement the prediction model to automatically preserve vulnerable social content that is prone to change or loss. • Create an environment implementing this framework that provides smooth temporal navigation of the social web.
  • 25. Related Work • User’s Web Search Intention – A. Ashkan ECIR ’09 – C. Lee AINA ’05 – A. Loser IRSW ’08 – L. Azzopardi ECIR ’09 – R. Baeza-Yates SPIR ’06 – N. Dai HT ’11 • Commercial Intention – Q. Guo SIGIR ’10 – A. Benczur AIRWeb ’07 • Sentiment Analysis – G. Mishne AAAI ’06 – J. Bollen JCS ’11 • Social Networks Growth and Evolution – B. Meeder WWW ’11 • Persistence of shared resources – M. Nelson D-Lib ’02 – R. Sanderson OR ’11 – F. McCown JCDL ’07 • URL Shortening – D. Antoniades WWW ’11 • Tweeting, Micro-blogging and Popularity – S. Wu WWW ’11 – A. Java SNA-KDD ’07 – H. Kwak WWW ’10 • Access to Archives – H. Van de Sompel OR ’09
  • 26. Dissertation Plan • BEGIN • Read Literature • Collect Datasets • Analyze Archives Coverage • Analyze Shortened URIs • Prototype Application • Analyze Shared Resources Persistence and Coverage • Analyze Contextual Intention (Current State) • Create Intention-based dataset • Extract Intention Features • Train a Parametric Model to predict intention • Evaluate, test, cross-validate the model • Create a mockup application • Extend the model to induce preservation • Finish Writing the Dissertation • PhD Defense
  • 28. Estimating Web Archiving Coverage • Goal: Estimate how much of the public web is present in the public archives and how many copies are available. • Action: – Collect 4 datasets from 4 different sources: • Search engine indices • Bit.ly • DMOZ • Delicious • Results: * • Publications: – How much of the web is archived? JCDL '11 * Table courtesy of Ahmed AlSum, JCDL 2011
  • 30. Shortened URI analysis • Goal: Better understand URI shortening and resolving, the effect of time on this process, and the correlation between a page’s features and characteristics and its resolution. • Action: – Fresh Bit.lys – Get hourly click logs, rate of change, social networking spread, and other contextual information – Longitudinal study • Evaluation: – Compare results with the frequency-of-change analysis of Cho and Garcia-Molina. – Compare results with Antoniades et al., WWW 2011.
  • 32. Estimating Loss of Shared Resources in Social Media • Goal: Estimate how much of the public web is present in the public archives and how many copies are available. • Action: – Sample from 6 public events – Events span 3 years – Check existence on the current web – Check existence in the public archives – Find the relation with time • Results: – After the first year, ~11% is lost – After that, loss continues at ~0.02% per day • Publications: – A year after the Egyptian revolution, 10% of the social media documentation is gone. http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html – Losing my revolution: How Many Resources Shared on Social Media Have Been Lost? TPDL '12
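The two result figures on this slide can be read as a simple piecewise estimate: roughly 11% lost at the one-year mark, plus about 0.02 percentage points for every day after that. The sketch below encodes only that reading. The function name is hypothetical, the slide gives no figure for resources younger than one year (so the function refuses those), and capping at 100% is an added assumption.

```python
FIRST_YEAR_DAYS = 365
FIRST_YEAR_LOSS_PCT = 11.0   # "~11%" after the first year (from the slide)
DAILY_LOSS_PCT = 0.02        # additional loss per day after the first year


def estimated_loss_pct(days_since_shared: int) -> float:
    """Estimate the percentage of shared resources lost after a given age.

    Hypothetical helper: a linear reading of the slide's two numbers,
    not the fitted model from the underlying study.
    """
    if days_since_shared < FIRST_YEAR_DAYS:
        raise ValueError("no figure is given for ages under one year")
    extra_days = days_since_shared - FIRST_YEAR_DAYS
    # Cap at 100%: a linear trend cannot lose more than everything.
    return min(100.0, FIRST_YEAR_LOSS_PCT + DAILY_LOSS_PCT * extra_days)


for years in (1, 2, 3):
    print(years, "year(s):", round(estimated_loss_pct(years * 365), 2), "% lost")
```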
  • 34. User Intention Analysis • Goal: Better understand user intention and the factors that affect it, and create a new training and testing set. • Action: – Get a sample set of tweets selected at random – Extract the URIs – Get the closest Memento – Download the snapshot and the current version – Use Amazon’s Mechanical Turk to choose the best version • Evaluation: – Measure inter-rater agreement and confidence.
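The slide does not name an agreement statistic. One common choice when several workers label each item is Fleiss' kappa, sketched below as an assumption rather than the author's method. The labels `"past"` and `"current"` are hypothetical stand-ins for the two versions a Mechanical Turk worker could pick, and every item is assumed to have the same number of raters.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of rater labels.

    Assumes every item was labeled by the same number of raters (>= 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: fraction of agreeing rater pairs, per item.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each label.
    shares = [sum(c[cat] for c in counts) / (n_items * n_raters)
              for cat in categories]
    p_chance = sum(s * s for s in shares)

    if p_chance == 1.0:
        return 1.0  # only one label was ever used, so agreement is trivial
    return (p_observed - p_chance) / (1.0 - p_chance)


# Three hypothetical workers judging which version each of four tweets intended.
votes = [
    ["past", "past", "past"],
    ["current", "current", "past"],
    ["past", "past", "current"],
    ["current", "current", "current"],
]
print(round(fleiss_kappa(votes), 3))
```

Values near 1 indicate that workers agree well beyond chance; values near 0 suggest the "best version" question is ambiguous for the sampled tweets.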
  • 35. Proposed Work • Data Gathering • Feature Extraction • Modeling the intention engine • Evaluation • Application: Prediction and Preservation
  • 37. Possible Solution for Jenny Prompt: “The resource has changed since the last time it was shared. Do you wish to see the version the author intended or the current version?” Options: Current Version; Intended Version
  • 38. Proposed Framework Feature Extraction, then Classifier, then output: Archived Version or Current Version • Example Features: – Tweet Content – Click Logs – Other Tweets – Shared Resource – Timemaps
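To make the pipeline on this slide concrete, here is a toy sketch of the feature-extraction-then-classifier flow. Everything in it is hypothetical: the four feature fields are invented illustrations of the feature families listed above, and the weights are hand-picked placeholders, not learned values and not the model proposed in the dissertation.

```python
import math
from dataclasses import dataclass


@dataclass
class SharedLinkFeatures:
    """Hypothetical features, one per family named on the slide."""
    tweet_has_time_words: bool     # tweet content, e.g. "today", "just now"
    first_day_click_ratio: float   # click logs: share of clicks in first 24h (0..1)
    resource_changed: bool         # shared resource differs from archived copy
    mementos_near_share_time: int  # timemap: archived copies close to share time


# Placeholder weights for illustration only; a real model would learn these.
BIAS = -2.0
W_TIME_WORDS = 1.5
W_CLICK_RATIO = 2.0
W_CHANGED = 1.0


def archived_version_score(f: SharedLinkFeatures) -> float:
    """Logistic score in (0, 1): higher means the archived version fits better."""
    z = (BIAS
         + W_TIME_WORDS * f.tweet_has_time_words
         + W_CLICK_RATIO * f.first_day_click_ratio
         + W_CHANGED * f.resource_changed)
    return 1.0 / (1.0 + math.exp(-z))


def choose_version(f: SharedLinkFeatures, threshold: float = 0.5) -> str:
    """Return "archived" or "current" for the reader to be shown."""
    if f.mementos_near_share_time == 0:
        return "current"  # nothing suitable is archived, so nothing to offer
    return "archived" if archived_version_score(f) >= threshold else "current"


breaking_news = SharedLinkFeatures(True, 0.9, True, 3)
press_page = SharedLinkFeatures(False, 0.1, False, 3)
print(choose_version(breaking_news), choose_version(press_page))
```

The two examples mirror the earlier slides: a breaking-news link where the reader probably wants the page as it was at share time, and a press releases page where the current state is the intended one.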
  • 42-43. Estimating Shared Resources Loss in Social Media
  • 44. My Publications • S. G. Ainsworth, A. AlSum, H. SalahEldeen, M. C. Weigle, and M. L. Nelson. How much of the web is archived? In Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries, JCDL '11, pages 133-136, 2011. • H. SalahEldeen and M. L. Nelson. Losing my revolution: How much social media content has been lost? Accepted in TPDL 2012. • H. SalahEldeen and M. L. Nelson. Losing my revolution: A year after the Egyptian revolution, 10% of the social media documentation is gone. http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html.
  • 45. References • D. Antoniades, I. Polakis, G. Kontaxis, E. Athanasopoulos, S. Ioannidis, E. P. Markatos, and T. Karagiannis. we.b: the web of short urls. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 715-724, New York, NY, USA, 2011. ACM. • A. Ashkan, C. L. Clarke, E. Agichtein, and Q. Guo. Classifying and characterizing query intent. In Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, ECIR '09, pages 578-586, Berlin, Heidelberg, 2009. Springer-Verlag. • L. Azzopardi and M. de Rijke. Query intention acquisition: A case study on automatically inferring structured queries. In Proceedings DIR-2006, 2006. • R. Baeza-Yates, L. Calderon-Benavides, and C. Gonzalez-Caro. The intention behind web queries. In F. Crestani, P. Ferragina, and M. Sanderson, editors, String Processing and Information Retrieval, volume 4209 of Lecture Notes in Computer Science, pages 98-109. Springer Berlin / Heidelberg, 2006. 10.1007/11880561_9. • A. Benczur, I. Biro, K. Csalogany, and T. Sarlos. Web spam detection via commercial intent analysis. In Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, AIRWeb '07, pages 89-92, New York, NY, USA, 2007. ACM. • J. Bollen, H. Mao, and X.-J. Zeng. Twitter mood predicts the stock market. CoRR, abs/1010.3003, 2010. • N. Dai, X. Qi, and B. D. Davison. Bridging link and query intent to enhance web search. In Proceedings of the 22nd ACM conference on Hypertext and hypermedia, HT '11, pages 17-26, New York, NY, USA, 2011. ACM. • N. Dai, X. Qi, and B. D. Davison. Enhancing web search with entity intent. In Proceedings of the 20th international conference companion on World wide web, WWW '11, pages 29-30, New York, NY, USA, 2011. ACM. • K. Durant and M. Smith. Predicting the political sentiment of web log posts using supervised machine learning techniques coupled with feature selection. In O. Nasraoui, M. Spiliopoulou, J. Srivastava, B. Mobasher, and B. Masand, editors, Advances in Web Mining and Web Usage Analysis, volume 4811 of Lecture Notes in Computer Science, pages 187-206. Springer Berlin / Heidelberg, 2007. 10.1007/978-3-540-77485-3_11.
  • 46. References • Q. Guo and E. Agichtein. Ready to buy or just browsing?: detecting web searcher goals from interaction data. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, SIGIR '10, pages 130-137, New York, NY, USA, 2010. ACM. • A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, WebKDD/SNA-KDD '07, pages 56-65, New York, NY, USA, 2007. ACM. • H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web, WWW '10, pages 591-600, New York, NY, USA, 2010. ACM. • C.-H. L. Lee and A. Liu. Modeling the query intention with goals. In Proceedings of the 19th International Conference on Advanced Information Networking and Applications - Volume 2, AINA '05, pages 535-540, Washington, DC, USA, 2005. IEEE Computer Society. • A. Loser, W. M. Barczynski, and F. Brauer. What's the intention behind your query? a few observations from a large developer community. In IRSW, 2008. • F. McCown, N. Diawara, and M. L. Nelson. Factors affecting website reconstruction from the web infrastructure. In JCDL '07: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 39-48, 2007. • B. Meeder, B. Karrer, A. Sayedi, R. Ravi, C. Borgs, and J. Chayes. We know who you followed last summer: inferring social link creation times in twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 517-526, New York, NY, USA, 2011. ACM. • G. Mishne. Predicting movie sales from blogger sentiment. In AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), 2006. • M. L. Nelson and B. D. Allen. Object persistence and availability in digital libraries. D-Lib Magazine, 8(1), 2002. • R. Sanderson, M. Phillips, and H. Van de Sompel. Analyzing the persistence of referenced web resources with memento. CoRR, abs/1105.3459, 2011. • H. Van de Sompel, M. L. Nelson, R. Sanderson, L. Balakireva, S. Ainsworth, and H. Shankar. Memento: Time travel for the web. CoRR, abs/0911.1112, 2009. • S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts. Who says what to whom on twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 705-714, New York, NY, USA, 2011. ACM.