2. Acknowledgement
• Joint work on subgroup detection with Dragomir Radev, Amjad Abu Jbara
• My students: Muhammad AbdulMageed, Pradeep Dasigi, Weiwei Guo
• Collaborative work with Owen Rambow and Kathy Mckeown, and their respective groups
• Collaborative sociolinguistic observations with Mustafa Mughazy
• Work funded by IARPA SCIL program
• Several slides adapted from presentations of the papers published on this work
3. Our Overarching Research Interest
• Goal: Attempt to mine social media text for clues and cues toward building an understanding of human interactions
• How: Identify interesting sociolinguistic behaviors and correlate them with linguistic usage that is quantifiable and explicitly characterizable as a diagnostic device
• Compare these devices cross-linguistically
4. Text and Social Relations
We can use linguistic analysis techniques to understand the implicit relations that develop in on-line communities
Image source: clair.si.umich.edu
5. Many Different Forms of Social Media
• Communication
• Collaboration
• Multimedia
• Reviews & opinions
6. Social Media Explosion
source: www.internetworldstats.com
1.73 billion Internet users worldwide. 75% of them used “Social Media”
7. Text in Social Media
Some social media applications are all about text
8. Text in Social Media
Even the ones based on photos, videos, etc. have a lot of discussions
9. Text in Social Media
Huge amount of text exchanged in discussions
A significant treasure trove
11. Approach to processing social construct phenomena
• Like any good scientist (or imperialist): divide and conquer
– Identify language uses (LU) pertinent to the different social constructs (SC)
– Correlate and map these LUs with Linguistic Constructions/Constituents (LC)
13. Discover relevant LUs
• Attempt to persuade
• Agreement/disagreement
• Negative/positive attitude
• Who is talking about whom
• Dialog patterns
• Signed network
Do not depend on linguistic analysis
Rely on linguistic analysis
14. Discover relevant LUs
• Attempt to persuade
• Agreement/disagreement
• Negative/positive attitude
• Who is talking about whom
• Dialog patterns
• Signed network
Do not depend on linguistic modeling
Rely on linguistic modeling
15. LU: Attempt to Persuade
• An expression of opinion (a claim) followed by explicit justification of the claim (an argumentation)
– Persuade to believe, not persuade to act
– Claim: grounding in experience, commonly respected sources
– Argumentation: evidence and support from other discussants
CLAIM: There seems to be a much better list at the National Cancer Institute than the one we’ve got.
ARGUMENTATION: It ties much better to the actual publication (the same 11 sections, in the same order).
16. LU: Agreement and Disagreement
• Examine pairs of phrases to model others’ acceptance of the participant’s ideas
P1 by Arcadian: There seems to be a much better list at the National Cancer Institute than the one we’ve got. It ties much better to the actual publication (the same 11 sections, in the same order). I’d like to replace that section in this article. Any objections?
P2 by JFW: Not a problem. Perhaps we can also insert the relative incidence as published in this month’s wiki Blood journal
Example of Agreement
• Shared opinion (explicit expression), shared perspective (implicit attitude)
• Using word similarity and overlap
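A minimal sketch of the "word similarity and overlap" idea, assuming a plain Jaccard overlap over lowercased word sets. The slides do not say which similarity measure was used, so the measure and the helper names `tokens` and `jaccard_overlap` are illustrative assumptions.

```python
import re

def tokens(text):
    """Lowercase a post and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard_overlap(post_a, post_b):
    """Shared words divided by all distinct words in the two posts."""
    a, b = tokens(post_a), tokens(post_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two hypothetical posts; a higher score means more shared vocabulary.
p1 = "I'd like to replace that section in this article. Any objections?"
p2 = "Not a problem. Perhaps we can also insert the relative incidence."
print(jaccard_overlap(p1, p2))
```

Surface overlap alone is a weak signal, which is why it would only be one feature among several.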
17. LU: -ve/+ve Attitude
• The attitude of a discussant/participant in a
conversation toward another participant or topic or
entity mentioned in the thread
• Characterize –ve and +ve sentences
• Positive: praise, express liking, etc.
• You are great
• Simply elegant and beautiful
• Negative: insult, dislike, disagreement, sarcasm, etc.
• You're a liar.
• You know, you're a pretty absurd individual even by Usenet standards.
• You're just pathetic.
18. LU: Attitude towards another person (2)
PER2: No it hasn't that's a bold faced lie. A definate majority of Americans support the public option. The only people who are against it are the insurance companies and moron social conservatives like you who don't even understand what socialism is.
Using negative and insulting language. Sentiment and word polarity are the devices used
20. LU: Who is talking about whom
How often a person refers to, or is referred to by, other
discourse participants
Use of mentions and their frequencies
IsMyNameUsedByOthers
HaveIUsedOthersName
%OfUsersReferencedByMe
%OfUsersReferencedMe
%OfReferencesByMe
%OfReferencesToMe
ReferencesByMeToWordsRatio
number of references to other users made by me / total number of words I wrote
ReferencesToMeToWordsRatio
number of references to me / total number of words written by others
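The mention features above can be sketched as follows. This is an illustration, not the original system: it covers a subset of the listed features and assumes that usernames are single tokens appearing verbatim in the text. The function name `mention_features` is hypothetical.

```python
import re

def mention_features(posts, me):
    """posts: list of (author, text) pairs; me: the user to profile.

    Assumption: a reference is an exact, case-insensitive username token.
    """
    others = {author for author, _ in posts} - {me}
    refs_by_me = refs_to_me = words_by_me = words_by_others = 0
    referenced_by_me, referencing_me = set(), set()
    for author, text in posts:
        words = re.findall(r"\w+", text.lower())
        if author == me:
            words_by_me += len(words)
            for user in others:
                hits = words.count(user.lower())
                refs_by_me += hits
                if hits:
                    referenced_by_me.add(user)
        else:
            words_by_others += len(words)
            hits = words.count(me.lower())
            refs_to_me += hits
            if hits:
                referencing_me.add(author)

    def ratio(a, b):
        return a / b if b else 0.0

    return {
        "IsMyNameUsedByOthers": refs_to_me > 0,
        "HaveIUsedOthersName": refs_by_me > 0,
        "%OfUsersReferencedByMe": ratio(len(referenced_by_me), len(others)),
        "%OfUsersReferencedMe": ratio(len(referencing_me), len(others)),
        "ReferencesByMeToWordsRatio": ratio(refs_by_me, words_by_me),
        "ReferencesToMeToWordsRatio": ratio(refs_to_me, words_by_others),
    }
```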
21. LU: Signed Network
Partition the social medium network into positive and negative links based on polarity of words used
What is the public opinion on the health care reform?
2841 posts
More than 300K words
23. LU: Signed Network
Figure legend: Participants; Negative Interaction; Positive Interaction
Very Hot Topic (high percentage of negative links)
24. LU: Signed Network
Against Reform (55%)
Pro Reform (45%)
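A small sketch of how a signed network could be assembled, assuming each interaction already carries a numeric polarity score derived from the words used. The scoring step, the summation rule, and the names `build_signed_network` and `negative_link_share` are assumptions for illustration.

```python
from collections import defaultdict

def build_signed_network(interactions):
    """interactions: iterable of (source, target, polarity_score).

    Scores are summed per unordered pair of participants; the sign of the
    total labels the link as positive ("+") or negative ("-").
    """
    totals = defaultdict(float)
    for src, dst, score in interactions:
        totals[frozenset((src, dst))] += score
    return {pair: ("+" if total > 0 else "-")
            for pair, total in totals.items() if total != 0}

def negative_link_share(network):
    """Fraction of links that are negative (a rough 'hot topic' indicator)."""
    if not network:
        return 0.0
    return sum(1 for sign in network.values() if sign == "-") / len(network)

edges = [("Mary", "Peter", -1), ("John", "Peter", 1), ("Alexander", "Peter", -2)]
net = build_signed_network(edges)
print(negative_link_share(net))
```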
25. LU: Dialog Patterns
• Dialog Patterns are based on metadata (e.g., the thread structure), not the text
– Initiative: who started the thread
– Investment: share of participation
– Irrelevance: how often ignored by others
– Interjection: at what point joined conversation
– Incitation: how long are branches started
– Inquisitiveness: the number of question marks
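Several of the dialog patterns above can be sketched from thread metadata alone. The post format (dicts with `id`, `author`, `parent`, `text`) and the exact definitions below are assumptions; Incitation is left out because its measurement is not specified on the slide.

```python
def dialog_patterns(posts, author):
    """posts: list of dicts with 'id', 'author', 'parent' (None for the
    thread starter) and 'text', given in posting order."""
    mine = [p for p in posts if p["author"] == author]
    replied_to = {p["parent"] for p in posts if p["parent"] is not None}
    first_index = next(
        (i for i, p in enumerate(posts) if p["author"] == author), None)
    return {
        # Initiative: did this author start the thread?
        "initiative": any(p["parent"] is None for p in mine),
        # Investment: share of all posts written by this author.
        "investment": len(mine) / len(posts) if posts else 0.0,
        # Irrelevance: share of the author's posts nobody replied to.
        "irrelevance": (sum(1 for p in mine if p["id"] not in replied_to)
                        / len(mine)) if mine else 0.0,
        # Interjection: relative position of the author's first post.
        "interjection": (first_index / len(posts))
                        if first_index is not None else None,
        # Inquisitiveness: number of question marks used.
        "inquisitiveness": sum(p["text"].count("?") for p in mine),
    }
```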
27. Who is an Influencer?
• Someone whose opinions/ideas profoundly affect the conversation
• An influencer may have the following characteristics (Katz and Lazarsfeld, 1955)
– alter the opinions of their audience
– resolve disagreements where no one else can
– be recognized by others as one who makes important contributions
– often continue to influence a group even when not present
– have other conversational participants adopt their ideas and even the words they use to express their ideas
• More formally, an influencer:
– Has credibility in the group
– Persists in attempting to convince others, even if some disagreement occurs
– Introduces topics/ideas that others pick up on or support
29. What is Pursuit of Power?
• Individual makes repeated efforts to gain power within the
group.
• The individual attempts to control the actions or goals of the
group.
• Individual’s behavior causes tension within the group
30. Social Construct: Pursuit of Power (PoP)
• Language Uses
– Attempt to Persuade
– Agreement/disagreement
– Negative/positive attitude
– Who is talking about whom
– Dialog patterns (non linguistic)
32. Social Construct: Subgroup (Sub)
• Language Uses
– Agreement/disagreement
– Negative/positive attitude
– Signed Network (non linguistic)
Multiple Viewpoints (Subgroups)
33. Cross Linguistic Comparison
• The SCs in both languages use the same LUs
• But do Arabic and English social media use different linguistic constituents to show language use?
• A qualitative view:
34. Attempt to persuade
• Claims
– A lot more grounding using religious references
– Religion plays a significant role in Arabic discourse structure and is therefore used to establish credibility and, accordingly, influence and power differentials
• Easily detected using simple devices such as explicit diacritization
– Less subjective language (less usage of “I”, more of “we”, or expletives such as “there, it”)
ﺗﺘﻔﻬﻢ أن ﺣﺎول –ﻧﺤﺎول أﻧﻨﺎإﺷﻜﺎﻟﻴﺔ ﺛﺎﻧﻴﺎ .. ﻣﻌﺎﺻﺮة ﺑﻠﻐﺔ ﻣﻮﺳﻮﻋﺔ ﺑﻨﺎء ﻫﻨﺎ
صوﻗﺎ ﺑﻦ ﻋﻠﻘﻤﺔ ﺣﻴﺎة ﻓﻲ ﳑﻴﺰ ﺣﺪث ﻋﻦ ﺗﺨﺒﺮﻧﻲ أن ﳝﻜﻨﻚ ﻫﻞ ... اﳌﻠﺤﻮﻇﻴﺔ
35. Agreement/Disagreement
• Sharing the same opinion regarding a topic
– Explicit agreement
• “I agree with you about …”
ﻫﺬا ﲟﺜﻞ ﺻﻴﺎﻏﺘﻬﺎ ﻋﻠﻰ أﺷﻜﺮك ،ًﺎﲤﺎﻣ ﻓﻴﻬﺎ أواﻓﻘﻚ ﻟﻠﻐﺎﻳﺔ ﻫﺎﻣﺔ ﻧﻘﻄﺔ ﻫﺬه
حاﻟﻮﺿﻮ
أﻧﺎأواﻓﻘﻚ
ءاﻟﺒﻨﺎ ﻃﻮر ﻓﻲ ﻣﻮﺳﻮﻋﺔ أﻧﻨﺎ
– Implicit similar attitude toward a topic
• Challenge
• Pervasive sarcasm
• Pervasive use of MWE and references to cultural knowledge
36. Detecting (dis)agreements/attitudes?
• The role of idiom/metaphor/sarcasm in Arabic seems to be more pervasive
– Tongue twisters, Witty language, Puns
ﺲﻠhا ﻓﻲ واﻻدﻗﻦ ﺷﻌﺮ ﺣﻤﺰاوي •
• MP Hamzawy, being liberal, has long hair compared to the MB candidates who have beards, so the bet is on whether he will grow his hair longer or grow a beard
ﺔﺑﻄﻴﺨ اﷲ ﺷﺎء ﻣﺎ اﻟﺮاﺟﻞ ﻗﻠﺐ وﻟﻜﻦ واﺣﺪة ﺑﺬرة ﻳﺴﺎع ﻣﺎﳒﻪ اﳌﺮأة ﻗﻠﺐ •
• The heart of a woman is like a mango, it can only hold one seed, but a man’s heart is, “God Bless”, a melon
– Sarcasm
!ﺑﺴﻴﻄﻪ ﻳﺎﻻ •
• no problem, it is easy! (We are screwed regardless!)
37. Negative/positive Attitude
• Very flowery language compared to English
• Strong condescending language to show negative attitude
• Code switching into dialectal Arabic expressions to show support
– Manipulate different registers for code switching depending on context: CA with MSA/DA code switching to reflect influence
• Ben Ali, Tunisian President vs. Mubarak, Egyptian president in ouster speech
• Mubarak, ex-Egyptian President: on visit to factories/ouster from position in last revolution
• Mubarak vs. Nasser vs. Sadat
– Balance between familiarity and distance
38. Negative/positive Attitude
• Plural first person pronouns allow the speaker to reduce his/her power to establish rapport and show positive attitude,
– e.g., إﺣﻨﺎ ﺟﺎﻟﻨﺎ اﻟﺸﺮف vs. أﻧﺎ ﺟﺎﻟﻲ اﻟﺸﺮف
– We are honored vs. I am the honored one
• English plural pronouns in such contexts sound patronizing (the textbook “we”), whereas the “royal we” is disused.
39. Negative/Positive attitude
• Humor is commonly used in Arabic as a strategy that levels power relations, but that would be inappropriate in English.
• Slightly offensive expressions are used in Arabic to maintain power balance and solidarity, e.g.,
• اﺳﻜﺖ ،ﻣﺶ ﻣﺤﻤﺪ ﳒﺢ (literally “Be quiet, didn’t Mohamed pass?”)
• واﻟﻨﺒﻲ ﻧﻘﻄﻨﺎ ﺑﺴﻜﺎﺗﻚ (literally “By the Prophet, grace us with your silence”)
• Only very few such expressions are acceptable in English, and only in very close contexts, e.g., shut up and get out of here.
40. Talking about whom and to whom
• More manipulation of power differential
– MSA terms of address add formality, and therefore power to the speaker, whereas colloquial terms of address establish informal/equal levels of power.
• Compare ﻳﺎ ﺳﻴﺪي اﻟﻌﺰﻳﺰ (“my dear sir”) to ﻳﺎ ﺧﻮﻳﺎ (“my brother”).
• English does not have such a rich continuum of formality/informality expressions.
• Usage of expressions such as
– Mona: Mona could not dare refuse a request from Ali
– Considered a strange self reference in English, but it is used as a means of showing modesty and familiarity
41. Focus of this talk
Influencers
Pursuit of Power
Disputed Topics
Multiple Viewpoints (Subgroups)
42. Focus of this talk
Peter: The new immigration law is good. Illegal immigration is bad.
Mary: I totally disagree with you. This law is blatant racism.
John: Have you read all what Peter wrote? He is correct. Illegal immigration is bad and must be stopped.
Alexander: You are clueless, Peter. Stop supporting racism.
Support the new law: Peter, John
Against the new law: Mary, Alexander
46. 1 - Thread Parsing
P1 (D1, Peter): The new immigration law is good. Illegal immigration is bad.
P2 (D2, Mary): I totally disagree with you. This law is blatant racism.
P3 (D3, John): Have you read all what Peter wrote? He is correct. Illegal immigration is bad and must be stopped.
P4 (D4, Alexander): You are clueless, Peter. Stop supporting racism.
Identify Posts, Discussants, and the reply structure of the discussion thread
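A minimal sketch of the thread-parsing output, assuming the raw thread is already available as (author, parent index, text) triples. Real forum scraping is out of scope here, and the reply structure used in the example (who answers whom) is an assumption, as are the names `Post` and `parse_thread`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Post:
    post_id: str
    discussant_id: str
    parent_id: Optional[str]
    text: str

def parse_thread(raw_posts):
    """raw_posts: list of (author, parent_index or None, text) in posting
    order, where parent_index is the 1-based index of the post replied to."""
    discussants = {}   # author name -> discussant id (D1, D2, ...)
    posts = []
    for i, (author, parent_index, text) in enumerate(raw_posts, start=1):
        d_id = discussants.setdefault(author, f"D{len(discussants) + 1}")
        parent_id = f"P{parent_index}" if parent_index is not None else None
        posts.append(Post(f"P{i}", d_id, parent_id, text))
    by_id = {p.post_id: p for p in posts}
    # Reply structure as (replying discussant, discussant replied to) pairs.
    replies = [(p.discussant_id, by_id[p.parent_id].discussant_id)
               for p in posts if p.parent_id is not None]
    return posts, discussants, replies

raw = [
    ("Peter", None, "The new immigration law is good."),
    ("Mary", 1, "I totally disagree with you."),
    ("John", 2, "Have you read all what Peter wrote?"),
    ("Alexander", 1, "You are clueless, Peter."),
]
posts, discussants, replies = parse_thread(raw)
print(replies)
```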
48. 2 - Identify Opinion Words*
P1 (D1, Peter): The new immigration law is good+. Illegal immigration is bad-.
P2 (D2, Mary): I totally disagree- with you. This law is blatant- racism-.
P3 (D3, John): Have you read all what Peter wrote? He is correct+. Illegal immigration is bad- and must be stopped.
P4 (D4, Alexander): You are clueless-, Peter. Stop supporting racism.
*Identifying opinion words using Opinion Finder with an extended lexicon (implemented using random walks; Hassan & Radev, 2011)
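The +/- marking can be illustrated with a toy lexicon lookup. This is not OpinionFinder and not the random-walk lexicon extension; `TOY_LEXICON` and `tag_opinion_words` are hypothetical stand-ins that only show the shape of the output.

```python
import re

# Hypothetical hand-made lexicon; a real system would use a large
# polarity lexicon instead.
TOY_LEXICON = {
    "good": "+", "correct": "+",
    "bad": "-", "disagree": "-", "blatant": "-",
    "racism": "-", "clueless": "-",
}

def tag_opinion_words(text, lexicon=TOY_LEXICON):
    """Append '+' or '-' to every word found in the polarity lexicon."""
    def mark(match):
        word = match.group(0)
        return word + lexicon.get(word.lower(), "")
    return re.sub(r"[A-Za-z']+", mark, text)

print(tag_opinion_words("The new immigration law is good."))
print(tag_opinion_words("I totally disagree with you."))
```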
50. 3 - Identify Candidate Targets of Opinion
Target
• Discussant (e.g. you, Peter)
• Topic/Entity (e.g. The new immigration Law, Illegal Immigration)
51. Candidate Targets
3 - Identify Candidate Targets of Opinion
(Figure: the example thread with posts P1 to P4 and discussants D1 to D4)
All discussants are candidate targets
52. Candidate Targets
3 - Identify Candidate Targets of Opinion
(Figure: the example thread with discussant mentions labeled D1 and D2)
Identify discussant mentions (2pp or name) in the discussion
53. Candidate Targets
3 - Identify Candidate Targets of Opinion
(Figure: the example thread with discussant mentions labeled D1, D2, and Peter)
Identify anaphoric mentions of discussants
54. Candidate Targets
3 - Identify Candidate Targets of Opinion
(Figure: the example thread with discussant mentions labeled D1 and D2 and topic mentions labeled Topic 1 and Topic 2)
55. 3 - Identify Candidate Targets of Opinion
• Techniques used to identify topical targets:
– Named Entity Recognition
– Noun phrase chunking
59. Candidate Targets
4 - Opinion-Target Pairing
(Figure: the example thread with opinion words, discussant mentions labeled D1 and Peter, and topic mentions labeled Topic 1 and Topic 2)
60. 4 - Opinion-Target Pairing
• Language Uses (LUs) present in this step:
– Targeted sentiment toward other discussants (2nd person)
– Targeted sentiment toward topic mentions (3rd person)
I totally disagree- with you. This law is blatant- racism-.
61. 4 - Opinion-Target Pairing
• LU details
– Rule-based detection of sentiment targets (we’ve also been experimenting with supervised target detection methods)
– Discussant targets are identified by 2nd person pronouns (you, your, yourself, etc.) and by username mentions (casper3912, etc.)
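The rule for discussant targets can be sketched as below, under two assumptions that are not stated on the slide: a second person pronoun points at the author of the post being replied to, and a username counts as a mention when it appears as an exact token. `SECOND_PERSON` and `discussant_targets` are illustrative names.

```python
import re

SECOND_PERSON = {"you", "your", "yours", "yourself", "yourselves"}

def discussant_targets(sentence, reply_to, usernames):
    """Return the set of discussants a sentence is directed at.

    sentence:  one sentence from a post
    reply_to:  author of the parent post, or None for a thread starter
    usernames: known discussant names in the thread
    """
    words = re.findall(r"[\w']+", sentence.lower())
    targets = set()
    # Rule 1: a 2nd person pronoun targets the discussant replied to.
    if reply_to is not None and any(w in SECOND_PERSON for w in words):
        targets.add(reply_to)
    # Rule 2: an explicit username mention targets that discussant.
    lowered = {u.lower(): u for u in usernames}
    targets.update(lowered[w] for w in words if w in lowered)
    return targets

print(discussant_targets("You are clueless, Peter.", "Peter", ["Peter", "Mary"]))
```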
68. Data
• 117 Discussions: short threads; short posts; human annotation; more formal
• 12 Polls + Discussions: long threads; long and short posts; data self-labeled; less formal
• 30 debates: long threads; long and short posts; data self-labeled; less formal
71. Evaluation Metrics
1. Purity
2. Entropy
3. F-Measure
where P(i, j) is the probability of finding an element from the category i in the cluster j, nj is the number of items in cluster j, and n is the total number of items in the distribution.
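The metric formulas did not survive on the slide, so the following is a sketch of the standard clustering definitions that match the symbols above: purity as the size-weighted share of the majority category per cluster, and entropy as the size-weighted sum over clusters j of -Σ_i P(i, j) log2 P(i, j), with weights nj/n. Whether the original entropy was normalized, or used a different log base, is not known.

```python
import math
from collections import Counter

def purity(clusters):
    """clusters: list of lists of gold category labels, one list per cluster."""
    n = sum(len(c) for c in clusters)
    return sum(max(Counter(c).values()) for c in clusters if c) / n

def entropy(clusters):
    """Cluster entropy weighted by cluster size (lower is better)."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        if not c:
            continue
        n_j = len(c)
        # P(i, j) is estimated as the share of category i within cluster j.
        h = -sum((k / n_j) * math.log2(k / n_j) for k in Counter(c).values())
        total += (n_j / n) * h
    return total

perfect = [["pro", "pro"], ["con", "con"]]
mixed = [["pro", "con"], ["pro", "con"]]
print(purity(perfect), entropy(perfect))
print(purity(mixed), entropy(mixed))
```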
72. English Results
             Wikipedia   Political Forum   Create debate
Purity       0.66        0.61              0.64
Entropy      0.55        0.80              0.68
F-measure    0.61        0.56              0.60
73. Baselines
• Interaction Graph Clustering (GC)
– Nodes: Participants
– Edges: interactions (connect two participants if they exchange posts)
• Text Classification (TC)
– Build TF-IDF vectors for each participant (using all his/her posts)
– Cluster the vector space
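A rough sketch of the TF-IDF step of the TC baseline, assuming raw term counts for TF and log(N/df) for IDF with one document per participant. The slides do not specify the weighting variant, so these choices and the name `tfidf_vectors` are assumptions; the resulting sparse vectors would then be passed to a clustering algorithm.

```python
import math
import re
from collections import Counter

def tfidf_vectors(posts_by_participant):
    """posts_by_participant: dict mapping participant -> list of post texts.

    Each participant's posts are concatenated into a single document.
    Returns a dict mapping participant -> {term: tf-idf weight}.
    """
    docs = {
        who: Counter(re.findall(r"\w+", " ".join(posts).lower()))
        for who, posts in posts_by_participant.items()
    }
    n_docs = len(docs)
    # Document frequency: in how many participants' documents a term occurs.
    df = Counter(term for counts in docs.values() for term in counts)
    return {
        who: {t: tf * math.log(n_docs / df[t]) for t, tf in counts.items()}
        for who, counts in docs.items()
    }

vectors = tfidf_vectors({"A": ["law good"], "B": ["law bad"]})
print(vectors["A"])
```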
75. Choice of Clustering Algorithm
• K-means
• Expectation Maximization (EM)
• Farthest First (FF)
77. Component Evaluation
Chart conditions: Our System; No Topical Targets; No Discussant Targets; No Sentiment; No Interaction; No Anaphora Resolution; No Named Entity Recog.; No NP Chunking
78. Component Evaluation
Chart conditions: Our System; No Topical Targets; No Discussant Targets; No Sentiment; No Interaction; No Anaphora Resolution; No Named Entity Recog.; No NP Chunking
Not really a linguistic feature
79. Component Evaluation
Chart conditions: Our System; No Topical Targets; No Discussant Targets; No Sentiment; No Interaction; No Anaphora Resolution; No Named Entity Recog.; No NP Chunking
More of a linguistic feature!
80. Deeper look at Agreement/Disagreement and Attitude
• So far we employed shared/divergent opinion in the form of explicit polarity indicators
– Sentiment polarity towards other discussants
• A: So, no matter how much faith you have, one of you MUST be wrong! (negative)
• B: You are a scientist?! May I ask in which field? (negative)
– Sentiment polarity towards an entity
• A: Here is an excellent verse from the Bible.. (positive)
• B: The Bible rightly says that... (positive)
81. Implicit Opinion/Perspective
• Observation: People sharing similar beliefs/perspective tend to use the same evidence to support their point
– Believers: faith, peace, love, citing verses from the Bible...
– Atheists: reason, science, attack on perceived logical flaws in Bible...
• However it is not always explicit (using similar words and similar attitudes)
• Peter: God is the creator of mankind
• Mary: The belief in an ultimate divine being has sustained me over the years
– Not necessarily positive/negative
– High dimensional similarity (looking at the surface words) between both sentences is low!
– BUT we know Mary and Peter share the same perspective and will tend to be in agreement with each other
82. Modeling of implicit agreement/disagreement
• Implicit agreement or disagreement (perspective): using text similarity to help identify subgroups
• Perspective modeling is used to complement explicit attitude
• Perspective granularity has to be collected on the level of a thread rather than a single post
– Hence we summarize all the posts in the thread
83. Our Model
• Explicit high dimensional attitude toward other discussants
• Explicit high dimensional attitude toward named entities
• Model shared perspective among discussants over threads using textual similarity on the post level in the latent space
84. Extracting explicit attitude toward other discussants
• Identify polarity of each sentence
• Use the thread structure of the discussion to identify the target discussant
• If the sentence has second person pronouns (Hassan et al., 2010), then the polarity is assumed to be towards the target of the sentence
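The three steps above can be sketched as follows. The polarity function here is a toy lexicon count standing in for a real sentence-level sentiment classifier, and the target is taken to be the author of the parent post; both are assumptions, as are the names `TOY_POLARITY`, `sentence_polarity`, and `explicit_attitudes`.

```python
import re
from collections import defaultdict

SECOND_PERSON = {"you", "your", "yours", "yourself"}
# Hypothetical toy lexicon; a real system would use a trained classifier.
TOY_POLARITY = {"disagree": -1, "clueless": -1, "correct": 1, "great": 1}

def sentence_polarity(sentence):
    """Return '+', '-' or None from the sum of toy lexicon scores."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(TOY_POLARITY.get(w, 0) for w in words)
    return "+" if score > 0 else "-" if score < 0 else None

def explicit_attitudes(posts):
    """posts: list of (author, parent_author or None, text).

    Counts polar sentences containing a 2nd person pronoun, keyed by
    (author, target), where the target is the parent post's author.
    """
    counts = defaultdict(lambda: {"+": 0, "-": 0})
    for author, parent_author, text in posts:
        if parent_author is None:
            continue
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            words = set(re.findall(r"[a-z']+", sentence.lower()))
            polarity = sentence_polarity(sentence)
            if polarity and words & SECOND_PERSON:
                counts[(author, parent_author)][polarity] += 1
    return dict(counts)
```

For example, a reply by Mary to Peter reading "I totally disagree with you. This law is blatant racism." yields one negative count for (Mary, Peter); the second sentence has no second person pronoun and is left for the entity-directed step.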
85. Extracting explicit attitude toward named entities
• Identify polarity of each sentence
• Run Stanford Named Entity Tagger on sentences
• If the sentence has Named Entities, then the polarity is assumed to be towards those entities
86. Extracting implicit perspective
• Run Latent Dirichlet Allocation (LDA) on the thread
• Extract the topic distribution of each post
• Aggregate the distributions of all posts between each pair of discussants
87. Feature Representation: Attitude Profiles
• Vector Representation
• Explicit attitude toward other discussants
Columns: A, B, C
Row A: 0 1 1 1 1 2 0 0 0
Row B: …
Row C: --
88. Feature Representation: Attitude Profiles
• Vector Representation
• Explicit attitude toward Entities
Columns: A, B, C, E1, E2
Row A: 0 1 1 1 1 2 0 0 0 1 1 2 1 0 1
Row B: …
Row C: --
89. Feature Representation: Attitude Profiles
• Vector Representation
• Implicit attitude toward other discussants
Columns: A, B, C, E1, E2, A, B, C
Row A: 0 1 1 1 1 2 0 0 0 1 1 2 1 0 1 1 1 1 1 0 0.5 0.5 0 0
Row B: …
Row C: -- 1 1 1
90. Data
• Create Debate (CD)
– www.createdebate.com
– Debating on a certain topic
– Sides are explicitly indicated by discussants in a poll
– Informal language
• Wikipedia Discussion Forum (WIKI)
– en.wikipedia.org
– Group labels are manually annotated
– Formal language, not much negative polarity
91. Experimental Conditions
• Clustering algorithm
– S-Link, with the number of clusters set by rule of thumb = √n/2
• Evaluation Metrics
– Purity, Entropy, F-measure
• Baseline
– RAND-BASE: Assign discussants to clusters randomly
– SWD-BASE: Calculate surface word distribution, as a simpler form of perspective
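A small sketch of the cluster-count rule and the random baseline. Reading "√n/2" as sqrt(n/2), the common rule of thumb for choosing the number of clusters, is an assumption, as are the rounding, the lower bound of one cluster, and the names `rule_of_thumb_k` and `random_baseline`.

```python
import math
import random

def rule_of_thumb_k(n):
    """Number of clusters for n discussants, assuming k = sqrt(n / 2)."""
    return max(1, round(math.sqrt(n / 2)))

def random_baseline(discussants, k, seed=0):
    """RAND-BASE style assignment: each discussant gets a random cluster."""
    rng = random.Random(seed)
    return {d: rng.randrange(k) for d in discussants}

names = ["Peter", "Mary", "John", "Alexander", "Ann", "Bob", "Eve", "Dan"]
k = rule_of_thumb_k(len(names))
print(k, random_baseline(names, k))
```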
93. Observations
Condition     Wiki                          CD
              Purity  Entropy  F-measure    Purity  Entropy  F-measure
RAND-BASE     0.675   0.563    0.652        0.399   0.966    0.41
SWD-BASE      0.772   0.475    0.646        0.452   0.932    0.432
SD            0.834   0.360    0.667        0.824   0.394    0.596
SE            0.827   0.383    0.655        0.793   0.422    0.582
SD+SE         0.835   0.362    0.665        0.82    0.385    0.604
PERS          0.853   0.321    0.699        0.787   0.399    0.589
SD+PERS       0.853   0.320    0.698        0.849   0.333    0.615
SE+PERS       0.853   0.321    0.702        0.789   0.399    0.591
SD+SE+PERS    0.857   0.310    0.703        0.861   0.315    0.625
Best performance is when we combine explicit attitude (SD: Sentiment toward other discussants, SE: Sentiment toward Entities) with implicit perspective (PERS), regardless of genre
94. Observations
WIKI seems to gain more from implicit perspective compared to CD
Explicit attitude is a better feature for CD: people express their sentiments openly, while in WIKI people are more constrained and subtle in their expressions
95. Observations
Better results than the previous results on the same data sets: WIKI (P 0.66, E 0.55), CD (P 0.64, E 0.68)
97. The LUs used in Final System
• Attempt to persuade (Inf)
• Agreement/disagreement (Inf, Sub)
• -ve/+ve attitude without perspective (Sub)
• Who is talking about whom (PoP)
• Dialog patterns (PoP)
• Signed network (Sub)
Do not depend on linguistic analysis
Rely on linguistic analysis
98. LUs and SCs
LU/SC columns: Influencer, Pursuit of Power, Subgroup
Attempt to Persuade: ✔
Agreement/Disagreement: ✔ ✔
-ve/+ve attitude: ✔ ✔
Who is talking about whom: ✔
Dialogue Patterns: ✔
Signed Networks: ✔
99. Challenges with processing Arabic Social media
• Genre
– Wikipedia
• MSA with dialectal style and multiword expressions/lexical items
– Blogs from BOLT: mostly dialectal with pervasive code switching and semantic faux amis
• Implications for preprocessing
– Our tools are trained on formal MSA genres
• Hence degradation in basic NLP processing: for example, POS tagging accuracy is 97% on MSA, while on Blog data we are at 94% (on a good day!)
101. Formal Gov. Evaluation (nDCG%) 09/2012
                                 En-WIKI  En-Fora  Ar-WIKI  Ar-Fora
Subgroup (without perspective)   48.2     50.6     57.4     37.5
Influencer                       82.8     78.3     85.1     84.9
Pursuit of Power                 87.8     77.7     91.6     74.6
In general, Subgroup is the hardest
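For readers unfamiliar with the metric, here is a sketch of nDCG in its common textbook form: gains discounted by log2(rank + 1) and normalized by the ideal ordering. The exact variant and relevance grading used in the formal evaluation are not given on the slides, so this is an assumption; `dcg` and `ndcg` are illustrative names.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG divided by the DCG of the ideal (sorted) ranking, in [0, 1]."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A ranking that puts the most relevant item first scores 1.0;
# multiply by 100 to express it as nDCG%.
print(100 * ndcg([3, 2, 1]))
print(100 * ndcg([1, 3, 2]))
```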
102. Formal Gov. Evaluation (nDCG%) 09/2012
Pursuit of power relies mostly on shallow linguistic features (mentions) and dialog structure
103. Formal Gov. Evaluation (nDCG%) 09/2012
Fora are harder to deal with than WIKI genre
104. Formal Gov. Evaluation (nDCG%) 09/2012
Arabic WIKI did better than English WIKI
105. Formal Gov. Evaluation (nDCG%) 09/2012
Arabic Influencer significantly impacted by simple diacritization detection for claims (grounding)
106. Conclusions
• We can successfully computationally model sociopragmatic phenomena
– There is significant room for improvement
• Still discovering how to model the phenomena in a more language specific manner
– We are just scratching the surface of understanding the sociopragmatic linguistic features
• NOW more than ever collaborations are necessary