Clare llewellyn Lasiuk July 5th 2013


Clare Llewellyn

Using argument analysis to structure user generated content.


Clare Llewellyn
University of Edinburgh
Argumentation on the web - always vulgar and often convincing?

Various Conversations
Main points of discussion:

RM is bad / old / Australian / has power over politicians / owns newspapers

RM does / doesn’t understand the internet

Free content is good / bad

The joke belongs to Tim Vine or Stewart Francis

Wider context discussion – PIPA / SOPA, Leveson Inquiry, phone hacking, TVShack

The Problem
Can we somehow structure this data so we can read it and add to it at the most relevant point?

Argumentation
A participant makes a claim that represents their position
The participant backs up that claim with evidence
A counter claim challenges the position
The composer of the original claim may evaluate their position.

Claim
Counter Claim
Evidence
Counter Evidence
Evaluation
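
The five elements above can be held in a small nested structure. This is a minimal sketch, not something shown in the slides: the class name `Claim`, its fields, and the helper `count_moves` are all hypothetical, and the evidence and evaluation strings are invented for illustration (only "Free content is good / bad" comes from the discussion points listed earlier).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Claim:
    """A position taken by a participant, with optional supporting evidence."""
    author: str
    text: str
    evidence: List[str] = field(default_factory=list)
    # Counter claims are themselves claims that challenge this one,
    # so counter evidence is simply the evidence of a counter claim.
    counter_claims: List["Claim"] = field(default_factory=list)
    # Set if the original author later evaluates their position.
    evaluation: Optional[str] = None


def count_moves(claim: Claim) -> int:
    """Count every claim, evidence item, and evaluation in the thread."""
    total = 1 + len(claim.evidence) + (1 if claim.evaluation else 0)
    return total + sum(count_moves(c) for c in claim.counter_claims)


# Invented example thread (hypothetical authors and wording).
root = Claim(author="a", text="Free content is good")
root.evidence.append("It widens access")
reply = Claim(author="b", text="Free content is bad",
              evidence=["Creators are not paid"])
root.counter_claims.append(reply)
root.evaluation = "Perhaps some content should be paid for"
```

Nesting counter claims inside the claim they challenge gives a new contribution an obvious place to attach, which is the kind of "most relevant point" the problem statement asks for.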

Macro / Micro Argumentation
Micro-level:
Simple claim
Qualified claim
Grounded claim
Grounded and qualified claim
Non-argumentative moves
Macro-level:
Argument
Counter argument
Integration (reply)
Non-argumentative moves
Weinberger and Fischer (2006)

Methodology*
* Adapted from Bal & Saint-Dizier (2009) and Mochales & Moens (2009, 2011)
1. Identify discussions on different topics
2. Identify spans of text that represent the core points in the discussion
3. Classify into a structure so as to define the relationships between spans of text
4. Present this information to users

Data Sets
Hand-annotated corpus of tweets from the London riots (7,729)
www.analysingsocialmedia.org
Comments from the Guardian newspaper (partially hand-annotated for topic)
Tweets with the hashtag #OR2012 (5,416)

1. Topic Identification
• Extract individual discussion
• Unsupervised clustering – very objective
• Selection of algorithm
– Unigram / Bigram Frequency
– Incremental Clustering
– K-means
– Topic modelling
Possible tools
– NLTK (nltk.org)
– Weka (www.cs.waikato.ac.nz/ml/weka/)
– Mallet (mallet.cs.umass.edu)
– Twitter Workbench (www.analysingsocialmedia.org/projects)
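
One of the listed options, incremental clustering over unigram frequencies, can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the configuration used in the talk: the cosine similarity measure, the threshold of 0.3, the function names, and the four example tweets are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def unigrams(text: str) -> Counter:
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9#@']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def incremental_cluster(texts: List[str], threshold: float = 0.3) -> List[List[int]]:
    """Single pass: each text joins its most similar cluster if the
    similarity reaches the threshold, otherwise it starts a new one."""
    centroids: List[Counter] = []   # summed unigram counts per cluster
    clusters: List[List[int]] = []  # indices of member texts
    for i, text in enumerate(texts):
        vec = unigrams(text)
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            clusters[best].append(i)
            centroids[best].update(vec)
        else:
            clusters.append([i])
            centroids.append(vec)
    return clusters


# Invented tweets, loosely themed on the two tweet data sets.
tweets = [
    "police on the streets of london tonight",
    "more police needed on london streets",
    "repository metadata talk at #or2012",
    "great talk on repository metadata #or2012",
]
clusters = incremental_cluster(tweets)
```

With these inputs the first two tweets end up in one cluster and the last two in another. The threshold controls how readily a new topic is opened, so it would need tuning on real data.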

Example Clusters
[Figure: example clusters from topic modelling and from incremental clustering]

Evaluation
Are you doing what a human would do?
Results for comments data: [chart]
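
One way to ask whether the clustering matches what a human would do is to compare the machine clusters with hand-annotated topic labels. The slides do not say which measure was used; cluster purity is assumed here purely for illustration, and the function name, labels, and data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def purity(clusters: Sequence[Sequence[int]],
           human_labels: Sequence[Hashable]) -> float:
    """Fraction of items that carry the majority human label of their cluster."""
    total = sum(len(members) for members in clusters)
    if total == 0:
        return 0.0
    agree = 0
    for members in clusters:
        if members:
            counts = Counter(human_labels[i] for i in members)
            agree += counts.most_common(1)[0][1]
    return agree / total


# Invented example: five comments, two machine clusters, human topic labels.
machine = [[0, 1, 2], [3, 4]]
human = ["riots", "riots", "police", "police", "police"]
score = purity(machine, human)
```

Here four of the five comments sit in a cluster whose majority label matches their own, giving a purity of 0.8. Purity rewards many small clusters, so it is best read alongside the number of clusters produced.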

2. Text Span Identification
Define a set of rules that allows the extraction of macro-level argumentation
With annotated text you can use machine learning
With non-annotated text you can define rules – is there something specific in the language that indicates a claim / counter claim?
Claim
Counter Claim

Rules production
Method:
Rules are a generalisation from a large amount of data (14,000 quotes)
Use Words / POS / Negation / Symbols
Use the rules to find these patterns where not explicitly mentioned in text
Examples:
– Before:
• @USERNAME:
– After:
• i don't
• i think you
• PRP VBP RB (Personal Pronoun, Verb non-3rd-person singular present, Adverb)
– Both
• START X i 'm not
Tools:
LT-TTT2 www.ltg.ed.ac.uk/software/
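
The example cues above can be tried out with plain regular expressions. This is a rough sketch that does not use LT-TTT2: the names `BEFORE`, `AFTER`, `POS_AFTER`, `word_cues`, and `has_pos_cue` are hypothetical, the POS tags are supplied by hand rather than by a tagger, and the sample tweet is invented.

```python
import re
from typing import List, Sequence, Tuple

# Cue seen before quoted text (from the slide: "@USERNAME:").
BEFORE = re.compile(r"@\w+:")
# Cues seen after quoted text (from the slide: "i don't", "i think you").
AFTER = re.compile(r"\bi (?:don't|think you)\b", re.IGNORECASE)
# POS cue from the slide: personal pronoun, present-tense verb, adverb.
POS_AFTER = ("PRP", "VBP", "RB")


def word_cues(text: str) -> List[str]:
    """Return the names of the word-level cues found in the text."""
    found = []
    if BEFORE.search(text):
        found.append("before")
    if AFTER.search(text):
        found.append("after")
    return found


def has_pos_cue(tagged: Sequence[Tuple[str, str]]) -> bool:
    """True if the tag sequence PRP VBP RB occurs in (token, tag) pairs."""
    tags = [tag for _, tag in tagged]
    n = len(POS_AFTER)
    return any(tuple(tags[i:i + n]) == POS_AFTER
               for i in range(len(tags) - n + 1))


# Invented tweet and hand-assigned Penn Treebank tags.
tweet = "@USERNAME: free content is bad i don't agree"
cues = word_cues(tweet)
tagged = [("i", "PRP"), ("do", "VBP"), ("n't", "RB"), ("agree", "VB")]
pos_hit = has_pos_cue(tagged)
```

In practice the tags would come from a POS tagger, and how the before and after cues are combined into a decision is a design choice the slides leave open.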

3. Classify into a structure
Method
Based on Rosé et al. (2008)
Use supervised machine learning to classify tweets into an argument structure
Using the TagHelper toolkit (based on Weka)
– www.cs.cmu.edu/~cprose/TagHelper.html
– LightSide lightsidelabs.com
– Decide on a machine learning algorithm
– Define feature sets
– Train and test

Data Set Tweets
Coded with the classification system:
1. Claim without evidence
2. Claim with evidence
3. Counter-claim without evidence
4. Counter-claim with evidence
5. Implicit request for verification
6. Explicit request for verification
7. Comment
8. Other

Classification – Feature Selection
Features
Unigrams
+ line length
+ POS Bigrams
+ bigrams
+ punctuation
+ stemming
+ no stemming
+ rare words
+ line length, punctuation and rare words
+ no stop list
Algorithms
Support Vector Machine
Decision Tree
Naive Bayes
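
To make the feature sets concrete, the sketch below combines unigrams, punctuation, and a line-length feature and feeds them to a small Naive Bayes classifier. It is a standard-library stand-in, not the TagHelper or Weka setup from the talk: the class, the function names, the 40-character length cut-off, and the training tweets are all hypothetical, and only the two category labels come from the coding scheme.

```python
import math
import re
from collections import Counter, defaultdict
from typing import List, Tuple


def features(text: str) -> List[str]:
    """Unigrams, punctuation marks, and a coarse line-length bucket."""
    feats = re.findall(r"[a-z0-9@#']+", text.lower())
    feats += ["PUNCT_" + p for p in re.findall(r"[?!.,:;]", text)]
    feats.append("LEN_SHORT" if len(text) < 40 else "LEN_LONG")
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, data: List[Tuple[str, str]]) -> "NaiveBayes":
        self.class_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in data:
            self.class_counts[label] += 1
            for f in features(text):
                self.feat_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for f in features(text):
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((self.feat_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training tweets, labelled with two of the eight categories.
train = [
    ("shops are being looted in croydon", "Claim without evidence"),
    ("buses are on fire in hackney", "Claim without evidence"),
    ("is this true? any source?", "Explicit request for verification"),
    ("can anyone confirm this? source?", "Explicit request for verification"),
]
model = NaiveBayes().fit(train)
prediction = model.predict("any source for this?")
```

Swapping feature sets in and out, as the list above suggests, only requires changing `features`; the classifier itself stays the same.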

QUESTIONS?
Clare Llewellyn
University of Edinburgh
c.a.llewellyn@sms.ed.ac.uk
