Boost PC performance: How more available memory can improve productivity
What makes a tweet relevant for a topic?
1. What makes a tweet relevant for a topic?
#MSM2012 Lyon
April 16th, 2012
Ke Tao, Fabian Abel, Claudia Hauff, Geert-Jan Houben
Web Information Systems, TU Delft
What makes a tweet relevant for a topic? 1
2. Get information from Twitter
How do people use Twitter as a source of information?
• Twitter is more like a news media.
• How do people search Twitter?
• Search on Twitter (Teevan et al.)
Web Search Twitter Search
Query length (chars) 18.80 12.00
Query length (words) 3.08 1.64
Is a celebrity name 3.11% 15.22%
What makes a tweet relevant for a topic? 2
3. Research Questions
What are the challenges we are facing?
1. Given a topic, can we identify the relevant tweets
based on the characteristics of the tweets?
2. Are semantics meaningful for determining the tweets’
relevance for a topic?
What makes a tweet relevant for a topic? 3
4. Search on Twitter
Traditional solution
• Twitter search interface
• Ordered by time
• Keyword-based match
What makes a tweet relevant for a topic? 4
5. Syntactical feature : Hashtag
Is a tweet more relevant if it contains a #hashtag?
Hypothesis 1: tweets that contain hashtags
are more likely to be relevant than tweets
that do not contain hashtags.
What makes a tweet relevant for a topic? 5
6. Syntactical feature : hasURL
Is a tweet that contains a URL more relevant?
Hypothesis 2: tweets that contain a URL are
more likely to be relevant than tweets that
do not contain a URL.
What makes a tweet relevant for a topic? 6
7. Syntactical feature : isReply
Is a tweet which is a reply to @somebody more relevant?
Hypothesis 3: tweets that are formulated as
a reply to another tweet are less likely to be
relevant than other tweets.
What makes a tweet relevant for a topic? 7
8. Syntactical feature : length
Does the length of a tweet influence its relevance for a topic?
Hypothesis 4: the longer a tweet, the more
likely it is to be relevant and interesting.
What makes a tweet relevant for a topic? 8
9. Overview of features
Short summary
Topic sensitive Topic insensitive
Keyword-based
Syntactical features
relevance
Are there further features that
allow for estimating the relevance?
What makes a tweet relevant for a topic? 9
10. Semantic features
Find semantics in a tweet to estimate the relevance
dbp:Tim_Berners-Lee dbp:World_Wide_Web
dbp:France
dbp:Lyon
dbp:International_World_Wide_Web_Conference
What makes a tweet relevant for a topic? 10
11. Semantic features : #entity
Is a tweet with more entities more interesting?
•5 entities extracted.
Hypothesis 5: the more entities a tweet
mentions, the more likely it is to be relevant
and interesting.
What makes a tweet relevant for a topic? 11
12. Semantic features : #entity(type)
Do the types of semantics influence the relevance?
•Person : 1
•Artifact : 1
•Event : 1
•Location : 2
Hypothesis 6: different types of entities are
of different importance for estimating the
relevance of a tweet.
What makes a tweet relevant for a topic? 12
13. Semantic features : diversity
How many types are there in the entities?
•4 types of entities
Hypothesis 7: the greater the diversity of
concepts mentioned in a tweet, the more
likely it is to be interesting and relevant.
What makes a tweet relevant for a topic? 13
14. Semantic features : sentiment
Was the author of the tweet happy or not?
•Sentiment : Neutral
Hypothesis 8: the likelihood of a tweet’s
relevance is influenced by its sentiment
polarity.
What makes a tweet relevant for a topic? 14
15. Overview of features
What do we have now?
Topic sensitive Topic insensitive
Keyword-based Syntactical
? Semantics
Semantics in the given topic?
What makes a tweet relevant for a topic? 15
16. Semantic-based relevance
Expand the queries to match more tweets.
dbp:Lyon
dbp:Tim_Berners-Lee
Reformulated query is expected to get a
more accurate retrieval score.
What makes a tweet relevant for a topic? 16
17. Semantic-based relatedness
Is there a semantic overlap between the query and the tweet?
dbp:Lyon
dbp:Tim_Berners-Lee
What makes a tweet relevant for a topic? 17
18. Overview of features
By now, we have 4 types of features.
Topic sensitive Topic insensitive
Keyword-based Syntactical
Semantic-based Semantics
Can we utilize the contextual
information of tweets?
What makes a tweet relevant for a topic? 18
19. Topic insensitive contextual features
Does the number of posts influence the relatedness?
Hypothesis 9: the higher the number of
tweets that have been published by the
creator of a tweet, the more likely it is that
the tweet is relevant.
What makes a tweet relevant for a topic? 19
20. Topic-sensitive contextual features
Is this tweet too old?
Hypothesis 10: the lower the temporal
distance between the query time and the
creation time of a tweet, the more likely is
the tweet relevant to the topic.
T Query
March 31 April 16
What makes a tweet relevant for a topic? 20
21. Summary of Features
The features
Topic sensitive Topic insensitive
Keyword-based Syntactical
Semantic-based Semantics
Contextual Contextual
What makes a tweet relevant for a topic? 21
22. Analysis
• Research Questions:
1. Which features are more influential on predicting the
relatedness of a tweet to a certain topic?
2. Which types of features are more important? Are
semantics meaningful?
3. What’s the performance that we can achieve by utilizing
these features?
• Experimental Setup
• Consider the search problem as a classification task
• Classification algorithm = Logistic Regression
What makes a tweet relevant for a topic? 22
23. Dataset
From TREC 2011 Microblog Track
• Twitter corpus
• 16 million tweets (Jan. 24th, 2011 – Feb. 8th)
• 4,766,901 tweets classified as English
• 6.2 million entity-extractions (140k distinct entities)
• Relevance judgments
• 49 topics
• 40,855 (topic, tweet) pairs
• 60.31 relevant tweets per topic (on average)
What makes a tweet relevant for a topic? 23
24. Results
Which type of features matters?
Features Precision Recall F-measure
keyword relevance 0.3040 0.2924 0.2981
semantic 0.3053 0.2931 0.2991
relevance
topic-sensitive 0.3017 0.3419 0.3206
topic-insensitive 0.1294 0.0170 0.0300
without semantics 0.3363 0.4828 0.3965
all features 0.3674 0.4736 0.4138
Overall, we can achieve the precision and
recall of over 35% and 45% respectively by
applying all the features.
What makes a tweet relevant for a topic? 24
25. Weights of features Syntactical
Which feature matters? 2
1
Keyword-based 0
2 hasHashtag hasURL isReply length
-1
1
0 Semantics
-1 Keyword-based relevance 2
1
Semantic-based 0
2 #entities diversity sentiment
-1
1
Contextual
0 2
-1 Relevance Relatedness 1
0
social context temporal context
-1
What makes a tweet relevant for a topic? 25
26. Conclusions
1. We constructed 17 features, including keyword-based
relevance, semantics-based relevance, syntactical features,
semantic features and contextual features.
2. Semantic features and topic-sensitive features are
meaningful.
3. The contextual features have little impact on the
prediction.
What makes a tweet relevant for a topic? 26
27. Future work
• We plan to leverage the implementation of search engine for
Twitter based on the work done in this paper.
• Twinder (Tweets Finder)
• The progress on this work on be found at:
http://wis.ewi.tudelft.nl/twinder/
What makes a tweet relevant for a topic? 27
28. QUESTIONS?
April 16th, 2012 Slides : http://goo.gl/fQLaQ
k.tao@tudelft.nl http://ktao.nl/
THANK YOU!
What makes a tweet relevant for a topic? 28
Editor's Notes
Greetings…Self Introduction: I’m 2nd year PhD student from Web Information Systems Group in TU Delft. Today, I will report our work titled “What makes a tweet relevant for a topic?”The general purpose : we want to supply a better experience for users who are seeking information on Twitter.
In 2010, there was a paper which suggested that Twitter is more like a news media rather than a social network. Because 85% of the tweets are related to the news. Here is a picture which shows how much tweets were posted per second when several big events occurred, such as Japanese Earthquake, U.K. Royal wedding, Raid on Osama bin Laden, and UEFA champions league. valuable source for information, particularly for news The twitter is a news media, a source for getting information…As large amount of information has been being generated on Twitter, how can we retrieve the interesting ones or the relevant one for our needs? The straightforward solution is to search, just what you do several times per day on Google. But Teevan et al. tell us that the search behavior on Twitter is quite different. For example, (i) length are shorter (ii) special expression, such as the usage of hashtag (iii) mention celebrity or name (how this differs)? check the paperColor of the table, black and white, green to highlight, list only the important rowsTherefore, we are curious how we can improve finding relevant or interesting tweets on Twitter.
RQ:Topic -> determine whether the tweets are relevant based on the characteristics in them.If we consider semantics in the tweets, will the relevance estimation be better?numberingChallenge : amount of data.. Make them explicitly
Traditional Twitter SearchHighlight what does keyword matching means, the keywords, in search query and tweets.
Title -> syntactical featuresBox in the tweetGreen boxes for the hypothesesFlow from keyword-based relevance to … Slide 5-8, flow
Subtitle -> question?
Introduction to the usage of @, including mentions, and reply. Reply tweets frequently occur in private conversations. Therefore particularly, make a hypothesis about reply tweet.
The 21st International World Wide Web Conference #www2012 will take place in Lyon, France April 16-20 2012 @www2012Lyon www2012.wwwconference.orgSubtitle questionOne short, one longcomparison
Fade in the question later.
Fade in the entities one by one.
Mention the name of the features.
Consider comparison between this and another tweet with only one type of entities.Title, subtitle
SentimentRaise a discussion -> relevant, interestingAnother example tweet could be…
First summarize. Fade in the question later.How can we extract the semantic features that are topic-sensitive.
Color of the box (query)
Not highlight www, lyon, france"How can you infer that WWW in the query refers to the Conference and not to the World Wide Web?"
18Can we utilize the contextual features.
Titles,
Timeline
Number of features.----- Meeting Notes (11/4/12 16:19) -----The temporal context -> senstiveWSDM <- Named Entities Recognition.
Numbering research questionsRefers to the paper, section 5.Explain the setup given a topic…
What is the relevance judgment?Number of topics.
ComparisonFade in the pairsHighlightTextbox -> Conclusion, (precision)
----- Meeting Notes (11/4/12 16:19) -----The indri score <- relevance, not enoughhardly visible <- point outTitle <- feature weights (influence)...
Too much textNumber of featuresLast -> future work, link to Twinder
Too much textNumber of featuresLast -> future work, link to Twinder
----- Meeting Notes (11/4/12 16:13) -----The example for explaining what do I mean by relevant or interesting. Unified the style of animation