This document describes research mapping tweets to conference talks as a way to gain semantic insights. It presents an approach for automatically aligning tweets with conference talks based on extracting features from tweets and talks and then inducing a labeling function to perform the alignment. The researchers constructed a gold standard dataset by having human raters label tweets from the ESWC 2010 conference and achieved an interrater agreement of 0.820. They evaluated their automatic alignment approaches based on precision, recall, and F-measure. The document challenges readers to improve upon their tweet to talk mapping approaches and find other potential uses using the provided data and gold standard.
8. Potential Benefits
• Digital memory
• Conference feedback
– number of tweets for a talk
– conversational aspects
– sentiment analysis
• User profiling and expert finding
• Trending topics
9. Rich Activity Twitter Event Data
• We take Twitter archives from
TwapperKeeper
• We enrich Tweets with relevant DBPedia
concepts using Zemanta
• We rely on existing Linked Data about talks to
perform the mappings.
10. ESWC Dataset
• Collected during the Extended Semantic Web
Conference 2010
– Any tweets tagged with “eswc”
• 1082 tweets
• 213 tweets enriched with concepts
11. Aligning Tweets with Talks
• Goal: Label tweets with talks
• Method:
– Induce a labelling function $f : X \to Y$ to perform alignment
– Labelled data = events from Web of Data: $\{(x_i, y_i)\}_{i=1}^{L}$
– Unlabelled data = tweets: $\{x_i\}_{i=1}^{U}$
12. Aligning Tweets with Talks
1. Feature Extraction:
# Namespace declarations (rdf: and foaf: are used below, so they are declared too)
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix swrc: <http://swrc.ontoware.org/ontology#> .
@prefix swc: <http://data.semanticweb.org/ns/swc/ontology#> .
@prefix dog: <http://data.semanticweb.org> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

# The talk (paper) resource
<http://data.semanticweb.org/conference/eswc/2010/paper/phd_symposium/23> rdf:type swrc:InProceedings ;
    dc:subject "Knowledge Acquisition" ;
    dc:subject "Semantic Analysis" ;
    dc:subject "Social Web" ;
    dc:subject "Microblogs" ;
    dc:title "Exploring the Wisdom of the Tweets: Knowledge Acquisition from Social Awareness Streams" ;
    swrc:abstract "Although one might argue that little wisdom can be conveyed in messages of 140 ..." ;
    swrc:author <http://data.semanticweb.org/person/claudia-wagner> .

# The author resource linked from the talk
<http://data.semanticweb.org/person/claudia-wagner> rdf:type foaf:Person ;
    foaf:name "Claudia Wagner" ;
    swrc:affiliation <http://data.semanticweb.org/organization/joanneum-research> ;
    foaf:based_near <http://dbpedia.org/resource/Austria> .
13. Aligning Tweets with Talks
1. Feature Extraction: F1 - Immediate Resource
Leaves
Knowledge Acquisition
Semantic Analysis
Social Web
Microblogs
Exploring the Wisdom of the Tweets: Knowledge Acquisition from Social Awareness Streams
Although one might argue that little wisdom can be conveyed in messages of 140 ...
http://data.semanticweb.org/person/claudia-wagner
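The F1 leaves listed on this slide can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the talk's RDF has already been parsed into plain (subject, predicate, object) tuples, and it assumes that `rdf:type` objects are skipped, since the slide's output does not include `swrc:InProceedings`. The names `TRIPLES` and `immediate_leaves` are hypothetical.

```python
# Hypothetical in-memory representation: (subject, predicate, object) tuples.
TALK = "http://data.semanticweb.org/conference/eswc/2010/paper/phd_symposium/23"
AUTHOR = "http://data.semanticweb.org/person/claudia-wagner"

TRIPLES = [
    (TALK, "rdf:type", "swrc:InProceedings"),
    (TALK, "dc:subject", "Knowledge Acquisition"),
    (TALK, "dc:subject", "Semantic Analysis"),
    (TALK, "dc:subject", "Social Web"),
    (TALK, "dc:subject", "Microblogs"),
    (TALK, "dc:title",
     "Exploring the Wisdom of the Tweets: "
     "Knowledge Acquisition from Social Awareness Streams"),
    (TALK, "swrc:author", AUTHOR),
    (AUTHOR, "foaf:name", "Claudia Wagner"),
]


def immediate_leaves(resource, triples):
    """Return the objects of all triples whose subject is `resource`.

    rdf:type objects are skipped (an assumption based on the slide's output).
    """
    return [o for s, p, o in triples if s == resource and p != "rdf:type"]


f1 = immediate_leaves(TALK, TRIPLES)
```

Only values attached directly to the talk are collected here, so the author's name is not part of F1 even though the author URI is.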
14. Aligning Tweets with Talks
1. Feature Extraction: F2 – 1-step Resource
Leaves
http://data.semanticweb.org/person/claudia-wagner
Claudia Wagner
http://data.semanticweb.org/organization/joanneum-research
http://dbpedia.org/resource/Austria
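A possible reading of F2 is: follow each link from the talk to another described resource, then collect that resource and its own leaves. The sketch below follows that reading. It assumes the RDF is available as (subject, predicate, object) tuples and that `rdf:type` objects are ignored; `TRIPLES` and `one_step_leaves` are hypothetical names.

```python
# Hypothetical in-memory representation: (subject, predicate, object) tuples.
TALK = "http://data.semanticweb.org/conference/eswc/2010/paper/phd_symposium/23"
AUTHOR = "http://data.semanticweb.org/person/claudia-wagner"
ORG = "http://data.semanticweb.org/organization/joanneum-research"
AUSTRIA = "http://dbpedia.org/resource/Austria"

TRIPLES = [
    (TALK, "rdf:type", "swrc:InProceedings"),
    (TALK, "dc:subject", "Microblogs"),
    (TALK, "swrc:author", AUTHOR),
    (AUTHOR, "rdf:type", "foaf:Person"),
    (AUTHOR, "foaf:name", "Claudia Wagner"),
    (AUTHOR, "swrc:affiliation", ORG),
    (AUTHOR, "foaf:based_near", AUSTRIA),
]


def one_step_leaves(resource, triples):
    """Collect linked resources one step away from `resource`, plus their leaves."""
    described = {s for s, _, _ in triples}  # resources that have their own triples
    leaves = []
    for s, _, o in triples:
        if s == resource and o in described:
            leaves.append(o)  # the linked resource itself
            leaves.extend(
                o2 for s2, p2, o2 in triples if s2 == o and p2 != "rdf:type"
            )
    return leaves


f2 = one_step_leaves(TALK, TRIPLES)
```

Literals on the talk itself, such as "Microblogs", are left out because they belong to F1.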
15. Aligning Tweets with Talks
1. Feature Extraction: F3 – DBPedia Concepts
http://dbpedia.org/resource/Twitter
http://dbpedia.org/resource/Social_Web
http://dbpedia.org/resource/Microblogs
16. Aligning Tweets with Talks
2. Feature Vector Composition
Knowledge Acquisition
Semantic Analysis
Social Web
Microblogs
Exploring the Wisdom of the Tweets: Knowledge Acquisition from Social Awareness Streams
Although one might argue that little wisdom can be conveyed in messages of 140 ...
http://data.semanticweb.org/person/claudia-wagner
knowledge
acquisition
semantic
analysis
social
web
microblogs
exploring
wisdom
tweets
knowledge
acquisition
social
awareness
streams
wisdom
messages
Indexer
knowledge 2
acquisition 2
semantic 1
analysis 1
social 2
web 1
microblogs 1
exploring 1
wisdom 2
tweets 1
awareness 1
streams 1
messages 1
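The indexing step (tokenise, drop stop words, count terms) can be sketched with the Python standard library. The slides do not give the tokeniser or the stop-word list, so both are assumptions here: tokens are runs of ASCII letters, and `STOPWORDS` is a small illustrative set. The name `to_feature_vector` is hypothetical.

```python
import re
from collections import Counter

# Illustrative stop-word list; the list actually used is not given in the slides.
STOPWORDS = {"the", "of", "from", "that", "can", "be", "in",
             "one", "might", "although"}


def to_feature_vector(text):
    """Lower-case the text, keep alphabetic tokens, and count non-stop-words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


leaves = (
    "Knowledge Acquisition Semantic Analysis Social Web Microblogs "
    "Exploring the Wisdom of the Tweets: "
    "Knowledge Acquisition from Social Awareness Streams "
    "Although one might argue that little wisdom can be conveyed "
    "in messages of 140"
)
vector = to_feature_vector(leaves)
```

With this stop-word list a few extra terms such as "argue" and "conveyed" survive, so the result is close to, but not identical with, the term counts on the slide.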
17. Aligning Tweets with Talks
3. Inducing the Labelling Function
– Both tweets and events are provided as feature
vectors
– Induce a labelling function $f : X \to Y$:
Choose the most likely event (y) given the tweet (x)
18. Aligning Tweets with Talks
3. Inducing the Labelling Function: Proximity-
based Clustering
– Build a centroid vector for each event
• From event feature vectors
– Compare each tweet vector with each centroid
• Choose event (y) which is closest
$$\hat{y} = \arg\min_{y \in Y} d(x, \mu_y)$$
$$\mathrm{manhat}(x, \mu) = \sum_{i=1}^{n} |x_i - \mu_i|$$
$$\mathrm{eucl}(x, \mu) = \sqrt{\sum_{i=1}^{n} (x_i - \mu_i)^2}$$
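The centroid comparison can be sketched as follows. This assumes feature vectors are sparse dictionaries mapping terms to counts, with missing terms treated as 0; the event names and numbers are made up for illustration, and `centroid` and `closest_event` are hypothetical helper names.

```python
import math


def manhattan(x, mu):
    """Sum of absolute coordinate differences; missing terms count as 0."""
    keys = set(x) | set(mu)
    return sum(abs(x.get(k, 0) - mu.get(k, 0)) for k in keys)


def euclidean(x, mu):
    """Square root of the summed squared coordinate differences."""
    keys = set(x) | set(mu)
    return math.sqrt(sum((x.get(k, 0) - mu.get(k, 0)) ** 2 for k in keys))


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    keys = set().union(*vectors)
    return {k: sum(v.get(k, 0) for v in vectors) / len(vectors) for k in keys}


def closest_event(tweet, centroids, distance=euclidean):
    """Return the event label whose centroid is nearest to the tweet vector."""
    return min(centroids, key=lambda y: distance(tweet, centroids[y]))


# Made-up event feature vectors, for illustration only.
events = {
    "talk_a": [{"semantic": 2, "web": 1}, {"semantic": 1, "linked": 1}],
    "talk_b": [{"twitter": 2, "microblogs": 1}],
}
centroids = {y: centroid(vs) for y, vs in events.items()}
tweet = {"twitter": 1, "microblogs": 1}
label = closest_event(tweet, centroids)
```

Passing `distance=manhattan` switches the metric without touching the rest of the code.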
19. Aligning Tweets with Talks
3. Inducing the Labelling Function: Naive Bayes
Classification
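– Assigns the most probable event label given the tweet features:
$$\hat{y} = \arg\max_{y \in Y} P(y \mid x_1, x_2, \ldots, x_n)$$
– Using Bayes' theorem, and assuming the features are conditionally independent given the event, we write this as:
$$\begin{aligned}
\hat{y} &= \arg\max_{y \in Y} \frac{P(x_1, x_2, \ldots, x_n \mid y)\, P(y)}{P(x_1, x_2, \ldots, x_n)} \\
&= \arg\max_{y \in Y} P(x_1, x_2, \ldots, x_n \mid y)\, P(y) \\
&= \arg\max_{y \in Y} P(y) \prod_{i} P(x_i \mid y)
\end{aligned}$$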
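The final argmax can be sketched as a small multinomial Naive Bayes classifier. The slides do not say how $P(x_i \mid y)$ is estimated, so this sketch assumes term-frequency estimates with add-one (Laplace) smoothing, and it works in log space to avoid underflow. The names `train` and `classify` and the toy data are hypothetical.

```python
import math
from collections import Counter


def train(labelled):
    """Build a model from a list of (feature Counter, event label) pairs."""
    priors = Counter(y for _, y in labelled)  # number of vectors per event
    term_counts = {}
    for x, y in labelled:
        term_counts.setdefault(y, Counter()).update(x)
    vocab = {t for x, _ in labelled for t in x}
    return priors, term_counts, vocab


def classify(x, model):
    """Return argmax_y of log P(y) + sum_i n_i * log P(x_i | y)."""
    priors, term_counts, vocab = model
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for y in priors:
        # Add-one smoothing: an assumption, not stated in the slides.
        denom = sum(term_counts[y].values()) + len(vocab)
        score = math.log(priors[y] / total)
        for term, n in x.items():
            if term in vocab:  # terms never seen in training are ignored
                score += n * math.log((term_counts[y][term] + 1) / denom)
        if score > best_score:
            best, best_score = y, score
    return best


# Toy labelled data (events) and an unlabelled tweet, for illustration only.
labelled = [
    (Counter(semantic=2, web=1), "talk_a"),
    (Counter(twitter=2, microblogs=1), "talk_b"),
]
model = train(labelled)
label = classify(Counter(twitter=1, microblogs=1), model)
```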
20. Experiments
• Dataset
– Corpus of Tweets collected during ESWC 2010
• Gold Standard Construction
– Used 3 raters to label a portion of tweet corpus
• 200 tweets labelled
– Measured interrater agreement
• Using Kappa statistic
– Initial Agreement was too low: 0.328
– Utilised Delphi method to improve agreement
– Second round of labelling produced: 0.820
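The slides do not say which Kappa variant was used for the three raters. As one illustration, the sketch below computes Cohen's kappa for a single pair of raters; with three raters, averaging the pairwise values or using Fleiss' kappa are common options. The function name `cohen_kappa` and the labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equally long label sequences.

    Undefined (division by zero) when chance agreement is exactly 1,
    i.e. when both raters use one and the same single label throughout.
    """
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's own label distribution.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up labels for four tweets, for illustration only.
rater_1 = ["talk_a", "talk_a", "talk_b", "talk_b"]
rater_2 = ["talk_a", "talk_a", "talk_b", "talk_a"]
kappa = cohen_kappa(rater_1, rater_2)
```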
21. Experiments
• Evaluation Measures
– Precision: proportion of event tweets correctly
labelled
– Recall: proportion of tweets successfully
returned for an event
– F-measure: Harmonic mean of precision and
recall
• Placed emphasis on precision over recall
$$f\text{-measure} = \frac{(1 + \beta^2) \times P \times R}{\beta^2 \times P + R}$$
$$\beta \in \{0.2, 0.5, 1\}$$
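A short sketch shows how the choice of β shifts the score. The precision and recall values below are made up for illustration, and the function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall, beta):
    """Weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily; beta = 1 weights both equally.
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing was correct
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical scores: high precision, low recall.
P, R = 0.8, 0.4
scores = {beta: f_measure(P, R, beta) for beta in (0.2, 0.5, 1)}
```

With precision higher than recall, the score rises as β shrinks, which is the effect of emphasising precision.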
27. We Challenge You!
• Beat us in mappings!
• We provide the human-generated gold
standard mappings
• Can you find a more precise way to do tweet-
talk mappings?
• Can you find other uses? Let us know!
28. We Challenge You!
• You can find the gold standard data here:
http://research.hypios.com/?page_id=131
• You can find all the data (and automated
mappings) here:
http://data.hypios.com/tweets/sparql