5. Various Conversations
Main points of discussion:
RM is bad / old / Australian / has power over politicians / owns newspapers
RM does / doesn’t understand the internet
Free content is good / bad
The joke belongs to Tim Vine or Stewart Francis
Wider context discussion – PIPA / SOPA, Leveson Inquiry, phone hacking, TVShack
6. The Problem
Can we somehow structure this data so we can read it
and add to it at the most relevant point?
8. Argumentation
A participant makes a claim that represents their position
The participant backs up that claim with evidence
A counter claim challenges the position
The composer of the original claim may evaluate their position.
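One way to picture the four moves above is as a small tree in which each counter-claim hangs off the point it disputes. This is only a minimal sketch: the `Contribution` class, its field names, the `outline` helper and the `example.org/report` evidence string are illustrative assumptions, not part of the original slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contribution:
    """One move in an argument: a claim or a counter-claim."""
    author: str
    text: str
    kind: str                       # "claim" or "counter-claim"
    evidence: Optional[str] = None  # supporting link or quote, if any
    replies: List["Contribution"] = field(default_factory=list)

    def challenge(self, author: str, text: str,
                  evidence: Optional[str] = None) -> "Contribution":
        """Attach a counter-claim directly to the point it disputes."""
        counter = Contribution(author, text, "counter-claim", evidence)
        self.replies.append(counter)
        return counter


def outline(node: Contribution, depth: int = 0) -> List[str]:
    """Flatten the tree into indented lines for reading."""
    marker = " (with evidence)" if node.evidence else ""
    lines = ["  " * depth + f"{node.kind}{marker}: {node.text}"]
    for reply in node.replies:
        lines.extend(outline(reply, depth + 1))
    return lines


# Hypothetical exchange built from one of the discussion points.
claim = Contribution("user_a", "Free content is good", "claim")
claim.challenge("user_b", "Free content is bad", evidence="example.org/report")
print("\n".join(outline(claim)))
```

Storing replies on the contribution they answer is what would let a new comment be added "at the most relevant point" rather than at the end of a flat list.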
11. Methodology*
* Adapted from Bal & Saint-Dizier (2009) and Mochales & Moens (2009, 2011)
1. Identify discussions on different topics
2. Identify spans of text that represent the core points in the discussion
3. Classify into a structure so as to define the relationships between spans of text
4. Present this information to users
12. Data Sets
Hand-annotated corpus of tweets from the London riots (7,729 tweets)
www.analysingsocialmedia.org
Comments from the Guardian newspaper (partially hand-annotated for topic)
Tweets with the #OR2012 hashtag (5,416 tweets)
13. 1. Topic Identification
• Extract individual discussions
• Unsupervised clustering – very objective
• Selection of algorithm
Unigram / Bigram Frequency
Incremental Clustering
K-means
Topic modelling
Possible tools
NLTK (nltk.org)
Weka (www.cs.waikato.ac.nz/ml/weka/)
Mallet (mallet.cs.umass.edu)
Twitter Workbench (www.analysingsocialmedia.org/projects)
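As an illustration of the incremental clustering option, the sketch below groups short texts in a single pass using unigram counts and cosine similarity. It is a standard-library stand-in, not the tools listed on the slide; the function names, the `0.3` threshold and the sample tweets are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import List


def unigrams(text: str) -> Counter:
    """Lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def incremental_cluster(texts: List[str], threshold: float = 0.3) -> List[List[int]]:
    """Single pass: join the most similar cluster, or start a new one."""
    centroids: List[Counter] = []
    clusters: List[List[int]] = []
    for i, text in enumerate(texts):
        vec = unigrams(text)
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            clusters[best].append(i)
            centroids[best] += vec   # centroid is the summed counts
        else:
            clusters.append([i])
            centroids.append(vec)
    return clusters


# Made-up tweets on two of the discussion topics.
tweets = [
    "free content is good for readers",
    "free content is bad for newspapers",
    "that joke belongs to tim vine",
    "the joke is a tim vine joke",
]
print(incremental_cluster(tweets))  # [[0, 1], [2, 3]]
```

The threshold controls how readily a new discussion is opened; in practice it would need tuning against hand-annotated topics.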
15. Evaluation
Are you doing what a human would do?
Results for comments data:
16. 2. Text Span Identification
Define a set of rules that allows the extraction of macro level argumentation
With annotated text, you can use machine learning
With non-annotated text, you can define rules: is there something specific in the
language that indicates a claim / counter-claim?
Claim
Counter Claim
17. Rules production
Method:
Rules are a generalisation from a large amount of data (14000 quotes)
Use Words / POS / Negation / Symbols
Use the rules to find these patterns where they are not explicitly mentioned in the text
Examples:
– Before:
• @USERNAME:
– After:
• i don't
• i think you
• PRP VBP RB (Personal Pronoun, Verb non-3rd-person singular present, Adverb)
– Both
• START X i 'm not
Tools:
LT-TTT2 www.ltg.ed.ac.uk/software/
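The word-level examples above can be approximated with regular expressions. This is a rough sketch rather than the rule set from the slides: the rule names, the `normalise` step and the decision to apply every pattern to the whole message are assumptions, and the POS pattern (PRP VBP RB) is left out because it needs a tagger.

```python
import re
from typing import List

# Replace concrete handles with the placeholder used in the example rule.
HANDLE = re.compile(r"@\w+")

# Word-level patterns copied from the examples; the rule names are
# hypothetical labels introduced for this sketch.
RULES = {
    "addressed_reply": re.compile(r"^@USERNAME:"),
    "i_dont": re.compile(r"\bi don't\b"),
    "i_think_you": re.compile(r"\bi think you\b"),
    # START X i 'm not: any single first token, then "i 'm not".
    "start_x_i_am_not": re.compile(r"^\S+ i 'm not\b"),
}


def normalise(text: str) -> str:
    """Lower-case, mask handles, and split "i'm" into "i 'm"."""
    text = text.lower().replace("\u2019", "'")
    text = HANDLE.sub("@USERNAME", text)
    return re.sub(r"\b(i)'m\b", r"\1 'm", text)


def matched_rules(text: str) -> List[str]:
    """Names of all rules whose pattern occurs in the message."""
    norm = normalise(text)
    return [name for name, pattern in RULES.items() if pattern.search(norm)]


print(matched_rules("@Alice: I don't agree"))
print(matched_rules("Well I'm not convinced"))
print(matched_rules("Free content is good"))
```

A message that triggers no rule is not necessarily a claim; the rules only flag candidates for the classification step.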
18. 3. Classify into a structure
Method
Based on Rosé et al. (2008)
Use supervised machine learning to classify tweets into an argument structure
Using TagHelper tool kit (based on Weka)
– www.cs.cmu.edu/~cprose/TagHelper.html
– LightSide lightsidelabs.com
– Decide on a machine learning algorithm
– Define feature sets
– Train and test
19. Data Set Tweets
Coded with the classification system:
1. Claim without evidence
2. Claim with evidence
3. Counter-claim without evidence
4. Counter-claim with evidence
5. Implicit request for verification
6. Explicit request for verification
7. Comment
8. Other
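For downstream code it can help to give the eight codes names. The enumeration below mirrors the list above; the tab-separated "code, then tweet" line format, the helper names and the sample tweet are hypothetical, since the slides do not say how the coded data is stored.

```python
from enum import IntEnum
from typing import Tuple


class ArgumentCode(IntEnum):
    """The eight annotation codes, numbered as on the slide."""
    CLAIM_WITHOUT_EVIDENCE = 1
    CLAIM_WITH_EVIDENCE = 2
    COUNTER_CLAIM_WITHOUT_EVIDENCE = 3
    COUNTER_CLAIM_WITH_EVIDENCE = 4
    IMPLICIT_REQUEST_FOR_VERIFICATION = 5
    EXPLICIT_REQUEST_FOR_VERIFICATION = 6
    COMMENT = 7
    OTHER = 8


def parse_coded_line(line: str) -> Tuple[ArgumentCode, str]:
    """Split a hypothetical 'code<TAB>tweet' line into (code, text)."""
    code, text = line.rstrip("\n").split("\t", 1)
    return ArgumentCode(int(code)), text


def has_evidence(code: ArgumentCode) -> bool:
    """True for the two codes that include supporting evidence."""
    return code in (ArgumentCode.CLAIM_WITH_EVIDENCE,
                    ArgumentCode.COUNTER_CLAIM_WITH_EVIDENCE)


# Made-up coded tweet.
code, text = parse_coded_line("4\t@USERNAME: that is not true, see example.org")
print(code.name, has_evidence(code))
```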
20. Classification – Feature Selection
Features
Unigrams
+ line length
+ POS Bigrams
+ bigrams
+ punctuation
+ stemming
+ no stemming
+ rare words
+ line length, punctuation and rare words
+ no stop list
Algorithms
Support Vector Machine
Decision Tree
Naive Bayes