Positive words carry less
information than negative
words
D. Garcia, A. Garas and F. Schweitzer
EPJ Data Science, 2012
JClub 2014.4.23
by Kazutoshi Sasahara
Introduction
• Is human language biased towards positive emotion, or is it neutral?
• Statistical properties of word frequency and length
  • Word freq. ∝ (word rank)^-1 (Zipf 1949)
  • Word freq. predicts word length as a result of a principle of least effort
  • Word length increases with information content for efficient communication (Piantadosi et al. 2011).
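A minimal sketch, not taken from the slides, of how the rank-frequency relation attributed to Zipf could be checked: sort word counts by rank and fit a straight line to log(frequency) against log(rank). The function name `zipf_exponent` and the synthetic counts are assumptions made for illustration; a slope near -1 corresponds to freq. ∝ (rank)^-1.

```python
import math


def zipf_exponent(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Frequencies must be positive; ranks are assigned after sorting
    in descending order (rank 1 = most frequent word).
    """
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical counts that follow frequency = 1000 / rank exactly.
toy_counts = [1000.0 / rank for rank in range(1, 51)]
slope = zipf_exponent(toy_counts)
```

Real corpora only follow this law approximately, so a fitted slope is a rough diagnostic rather than a test.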
Introduction (cont.)
• Pollyanna hypothesis (Boucher & Osgood 1969)
  A universal human tendency to use evaluatively positive words more frequently than evaluatively negative words in communicating.
• Previous research reported an emotional bias, but without adequate controls
  • Problems in the use of Amazon Mechanical Turk
• Possible biases
  • Acquiescence bias
  • Social desirability bias
  • Framing effects
Data Analysis
• This paper examined emotional bias in three major languages on the Internet.
  • English (56.6%), German (6.5%), Spanish (4.6%)
• Data
  • Established lexica of affective word usage: English: 1,034; German: 2,902; Spanish: 1,034
  • Google N-gram dataset: 10^12 tokens
• Valence (v)
  The degree of pleasure induced by the affective word usage, rescaled between -1 and 1.
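As an illustration of the rescaling step, the sketch below maps a rating linearly onto [-1, 1]. The slides do not state the original rating scales of the lexica, so the 1 to 9 default used here is purely an assumption, as is the function name `rescale_valence`.

```python
def rescale_valence(rating, scale_min=1.0, scale_max=9.0):
    """Map a rating on [scale_min, scale_max] linearly onto [-1, 1].

    The 1-9 default is an assumed scale, not one confirmed by the slides.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    midpoint = (scale_min + scale_max) / 2.0
    half_range = (scale_max - scale_min) / 2.0
    return (rating - midpoint) / half_range
```

With this mapping the scale midpoint becomes v = 0, so the sign of v separates negative from positive words.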
Garcia et al. EPJ Data Science 2012, 1:3
www.epjdatascience.com/content/1/1/3
Figure 1 Emotion word clouds with frequencies calculated from Google’s crawl. In each word cloud for
English (left), German (middle), and Spanish (right), the size of a word is proportional to its frequency of
appearance in the trillion-token Google N-gram dataset [26]. Word colors are chosen from red (negative) to
green (positive) in the valence range from psychology studies [7–9]. For the three languages, positive words
predominate on the Internet.
Results: Frequency of emotional words
[Figure 1 word clouds. Panels: English, German, Spanish. Color: v = -1 red, v = +1 green. Slide annotation: "Exception".]
• Positive words predominate on the Internet.
Results: Distribution of emotional words
• The median shifts significantly towards positive values (up to about 0.3).
• 95% confidence intervals (Wilcoxon tests):
  • English: 0.257 ± 0.032
  • German: 0.167 ± 0.017
  • Spanish: 0.287 ± 0.035
• Empirical evidence of positive bias
[Figure panels: No control, Control]
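The slide cites Wilcoxon tests for the shift of the median. As an illustrative sketch only (the paper's exact procedure, including how the confidence intervals were obtained, is not shown here), a one-sample Wilcoxon signed-rank test of "median = 0" can be written with a normal approximation. The function name and the simplifications (zeros dropped, no tie or continuity correction in the variance) are assumptions of this sketch.

```python
import math


def wilcoxon_signed_rank(values):
    """One-sample Wilcoxon signed-rank test of 'median = 0'.

    Returns (w_plus, z): the sum of ranks of the positive values and its
    normal-approximation z-score. Zeros are dropped, tied absolute values
    get average ranks, and the variance is not corrected for ties.
    """
    nonzero = [v for v in values if v != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("need at least one non-zero value")
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block while the absolute values are tied.
        while (end + 1 < n
               and abs(nonzero[order[end + 1]]) == abs(nonzero[order[start]])):
            end += 1
        average_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    w_plus = sum(rank for rank, v in zip(ranks, nonzero) if v > 0)
    mean = n * (n + 1) / 4.0
    variance = n * (n + 1) * (2 * n + 1) / 24.0
    z = (w_plus - mean) / math.sqrt(variance)
    return w_plus, z


# Hypothetical valence values, mostly positive.
w_plus, z = wilcoxon_signed_rank([0.5, 0.3, -0.1, 0.7, 0.2])
```

A large positive z indicates that positive valences dominate; for small samples an exact test would be preferable to the normal approximation.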
Table 1 Correlations between word valence and information measurements.
            English     German      Spanish
ρ(v,f)       0.222**     0.144**     0.236**
ρ(v,I)      -0.368**    -0.325**    -0.402**
ρ(v,I′)     -0.294**    -0.222**    -0.311**
ρ(v,I2)     -0.332**    -0.301**    -0.359**
ρ(v,I3)     -0.313**    -0.201**    -0.340**
ρ(v,I4)     -0.254**    -0.049*     -0.162**
Correlation coefficients of the valence (v), frequency f, self-information I, and information content measured for 2-grams I2, 3-grams I3, and 4-grams I4, and with self-information I′ measured from the frequencies reported in [42–44]. Significance levels: *p < 0.01, **p < 0.001.
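For readers who want to see what a coefficient such as ρ(v,I) measures, here is a small self-contained sketch of Pearson's correlation. The valence and self-information values are invented for illustration and are not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical words: valence v in [-1, 1] and self-information I in bits.
toy_valence = [-0.8, -0.2, 0.3, 0.9]
toy_information = [14.0, 12.0, 11.0, 9.0]
rho_v_i = pearson(toy_valence, toy_information)
```

In this toy data the more positive words have lower self-information, so the coefficient is negative, the same sign as the ρ(v,I) row of Table 1.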
[...] ones, but this nonlinear mapping between frequency and self-information makes the latter more closely related to word valence than the former. The first two lines of Table 1 show the Pearson's correlation coefficient of word valence and frequency, ρ(v,f), followed by the correlation coefficient between word valence and self-information, ρ(v,I). For all three languages, the absolute value of the correlation coefficient with I is larger than with f, showing that self-information provides more knowledge about word valence than plain [...]
Results: Relation between information and valence (1)
• Information content is measured by self-information I(w), which provides more knowledge about valence than frequency does.
• Negative correlation between v and I:
  • Positive words carry less information than negative words.
• The magnitude of the correlation coefficient becomes smaller for larger context sizes (N).
$I(w) = -\log_2 P(w)$
← Control analysis
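The definition I(w) = -log2 P(w) can be made concrete with a short sketch that estimates P(w) from raw counts. The word counts below are invented so that the probabilities are powers of two; `self_information` is a hypothetical helper name.

```python
import math


def self_information(counts):
    """Return -log2 P(w) in bits for every word, with P(w) = count / total.

    All counts must be positive.
    """
    total = sum(counts.values())
    return {word: -math.log2(count / total) for word, count in counts.items()}


# Hypothetical counts: 16 tokens in total.
toy_counts = {"good": 8, "bad": 4, "awful": 2, "fine": 2}
info = self_information(toy_counts)
```

Because the mapping from frequency to self-information is logarithmic, halving a word's frequency always adds exactly one bit.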
Results: Relation between information and valence (2)
• For all languages and context sizes, valence decreases with information content.
[Figure. Left panel: color = valence (v), size = self-information (I). Right panel: average self-information, given by]
$-\frac{1}{N} \sum_{i=1}^{N} \log_2 P(W = w \mid C = c_i)$
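The context-dependent measure averages -log2 P(W = w | C = c_i) over the N occurrences of a word. The sketch below is an assumption-laden illustration: it uses only the preceding word as context (the 2-gram case), estimates the conditional probabilities from a tiny invented token list, and is not the pipeline used on the Google N-gram data.

```python
import math
from collections import Counter, defaultdict


def information_content(tokens):
    """Average of -log2 P(w | previous word) over every occurrence of w.

    Only occurrences that have a preceding token are counted.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_totals = Counter()
    for (context, _word), count in bigrams.items():
        context_totals[context] += count
    log_sums = defaultdict(float)
    occurrences = Counter()
    for (context, word), count in bigrams.items():
        p = count / context_totals[context]
        # Each of the `count` occurrences contributes log2 P(word | context).
        log_sums[word] += count * math.log2(p)
        occurrences[word] += count
    return {word: -log_sums[word] / occurrences[word] for word in occurrences}


# Hypothetical token stream.
toy_tokens = ["not", "good", "not", "bad", "not", "good"]
content = information_content(toy_tokens)
```

Here "good" follows "not" in two of three cases and "bad" in one of three, so "bad" is less predictable in context and receives the higher value.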
Results: Additional analysis of valence, length, and self-info (1)
• The sign of valence matters
• Word length ∝ I(w)
• Valence ? (word length)^-1
• Combined influence of valence and length on I(w)
  • Additional dimension in the communication process related to emotional content (v) rather than communication efficiency (l)
[...] The dependence between valence and self-information disappears if we ignore the sign of the valence, which means, indeed, that the usage frequency of a word is not just related to the overall emotional intensity, but to the positive or negative emotion expressed by the word. Subsequently, we found that the correlation coefficient between word length and self-information (ρ(l,I)) is positive, showing that word length increases with self-information. These values of ρ(l,I) are consistent with previous results. Pearson's and Spearman's [...]
Table 2 Additional correlations between valence, self-information and length.
               English      German       Spanish
ρ(abs(v),I)     0.032◦       0.109***     0.135***
ρ(l,I)          0.378***     0.143***     0.361***
ρ(v,l)         -0.044◦      -0.071***    -0.112***
ρ(v,I|l)       -0.379***    -0.319***    -0.399***
ρ(l,I|v)        0.389***     0.126***     0.357***
Correlation coefficients of the valence (v), absolute value of the valence (abs(v)), and word length (l) versus self-information (I). Partial correlations are calculated for both variables (ρ(v,I|l), ρ(l,I|v)), and correlation between valence and length (ρ(v,l)). Significance levels: ◦p < 0.3, *p < 0.1, **p < 0.01, ***p < 0.001.
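The partial correlations in Table 2 can be related to the pairwise coefficients through the standard first-order partial correlation formula. Whether the paper computed them exactly this way is an assumption; the sketch simply plugs in the rounded English values from Tables 1 and 2.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denominator


# English values: rho(v,I) = -0.368, rho(v,l) = -0.044, rho(l,I) = 0.378.
rho_v_i_given_l = partial_correlation(-0.368, -0.044, 0.378)
rho_l_i_given_v = partial_correlation(0.378, -0.044, -0.368)
```

With these rounded inputs the results come out at roughly -0.38 and 0.39, close to the tabulated -0.379 and 0.389, which illustrates that controlling for length barely changes the valence correlation because ρ(v,l) is so small.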
Results: Additional analysis of valence, length, and self-info (2)
Table 3 Partial correlation coefficients between valence and information content.
              English      German       Spanish
ρ(v,I2|I)    -0.034◦      -0.100***    -0.058*
ρ(v,I3|I)    -0.101**     -0.070***    -0.149***
ρ(v,I4|I)    -0.134***    -0.020*      -0.084**
Correlation coefficients of the valence (v) and information content measured on different context sizes (I2, I3, I4) controlling for self-information (I). Significance levels: ◦p < 0.3, *p < 0.1, **p < 0.01, ***p < 0.001.
[...] correlation coefficients between valence and length ρ(v,l) are very low or not significant. In order to test the combined influence of valence and length to self-information, we calculated the partial correlation coefficients ρ(v,I|l) and ρ(l,I|v). The results are shown in Table 2, and are within the confidence intervals of the original correlation coefficients ρ(v,I) and ρ(l,I). This provides support for the existence of an additional dimension in the communication process closely related to emotional content rather than communication efficiency. This is consistent with the known result that word lengths adapt to information [...]
• Most of the correlations remain significant and negative, except I2 for English.
• Knowing the possible contexts of a word (N = 2, 3, 4) provides further information about word valence beyond self-information alone.
Summary
• Empirical evidence for a positive bias in language
  • Positive words are more frequently used.
    • Pollyanna hypothesis
    • Facilitation of social links
  • Negative words convey more information content than positive words.
• Word frequency is determined by
  • Not only word length and information content
  • But also emotional content

Mais conteúdo relacionado

Destaque

Performance variability enables adaptive plasticity ‘crystallized’ adult song
Performance variability enables adaptive plasticity ‘crystallized’ adult songPerformance variability enables adaptive plasticity ‘crystallized’ adult song
Performance variability enables adaptive plasticity ‘crystallized’ adult songTokyo Tech
 
Decksetがよかった話
Decksetがよかった話Decksetがよかった話
Decksetがよかった話Kohki Miki
 
鳥の複雑なさえずりと進化的デザイン
鳥の複雑なさえずりと進化的デザイン鳥の複雑なさえずりと進化的デザイン
鳥の複雑なさえずりと進化的デザインTokyo Tech
 
How sleep affects the developmental learning of bird song
How sleep affects the developmental learning of bird songHow sleep affects the developmental learning of bird song
How sleep affects the developmental learning of bird songTokyo Tech
 
スライド、作ってみませんか? #osc16tk
スライド、作ってみませんか? #osc16tk スライド、作ってみませんか? #osc16tk
スライド、作ってみませんか? #osc16tk whywaita
 
Marp使ってみた
Marp使ってみたMarp使ってみた
Marp使ってみたclash m45
 
Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた:
Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた: Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた:
Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた: Yutaka Kachi
 
ギークスマホ「Fx0」入手と運用
ギークスマホ「Fx0」入手と運用ギークスマホ「Fx0」入手と運用
ギークスマホ「Fx0」入手と運用Yutaka Kachi
 
機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステム
機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステム機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステム
機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステムJunichi Noda
 
WordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せ
WordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せWordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せ
WordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せSeto Takahiro
 

Destaque (16)

Performance variability enables adaptive plasticity ‘crystallized’ adult song
Performance variability enables adaptive plasticity ‘crystallized’ adult songPerformance variability enables adaptive plasticity ‘crystallized’ adult song
Performance variability enables adaptive plasticity ‘crystallized’ adult song
 
Beamertemplete
BeamertempleteBeamertemplete
Beamertemplete
 
Decksetがよかった話
Decksetがよかった話Decksetがよかった話
Decksetがよかった話
 
鳥の複雑なさえずりと進化的デザイン
鳥の複雑なさえずりと進化的デザイン鳥の複雑なさえずりと進化的デザイン
鳥の複雑なさえずりと進化的デザイン
 
Ets induction
Ets inductionEts induction
Ets induction
 
Erlangen Beamer
Erlangen BeamerErlangen Beamer
Erlangen Beamer
 
Gadget2
Gadget2Gadget2
Gadget2
 
How sleep affects the developmental learning of bird song
How sleep affects the developmental learning of bird songHow sleep affects the developmental learning of bird song
How sleep affects the developmental learning of bird song
 
スライド、作ってみませんか? #osc16tk
スライド、作ってみませんか? #osc16tk スライド、作ってみませんか? #osc16tk
スライド、作ってみませんか? #osc16tk
 
Marp colors
Marp colorsMarp colors
Marp colors
 
Marp Tips
Marp TipsMarp Tips
Marp Tips
 
Marp使ってみた
Marp使ってみたMarp使ってみた
Marp使ってみた
 
Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた:
Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた: Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた:
Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた:
 
ギークスマホ「Fx0」入手と運用
ギークスマホ「Fx0」入手と運用ギークスマホ「Fx0」入手と運用
ギークスマホ「Fx0」入手と運用
 
機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステム
機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステム機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステム
機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステム
 
WordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せ
WordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せWordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せ
WordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せ
 

Último

Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptJoemSTuliba
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptxGood agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptxSimeonChristian
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfWildaNurAmalia2
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 

Último (20)

Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptxGood agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 

Positive Words Carry Less Information Than Negative Words

  • 1. Positive words carry less information than negative words D. Garcia, A. Garas and F. Schweitzer EPJ Data Science, 2012 JClub 2014.4.23 by Kazutoshi Sasahara
  • 2. Introduction n  Is human language biased towards positive emotion or neutral? n  Statistical properties of word freq. and length n  Word freq. (word rank)-1 (Zipf 1949) n  Word freq. predicts word length as a result of a principle of least effort n  Word length increases with information content for efficient communication (Piantadosi et al. 2011).
  • 3. Introduction (cont.) n  Pollyanna hypothesis (Boucher & Osgood 1969) A universal human tendency to use evaluatively positive words more frequently and than evaluatively negative words in communicating. n  Previous researches reported emotional bias but with the lack of control n  Problems in the use of Amazon Mechanical Turk n  Possible biases n  Acquiescent bias n  Social desirability bias n  Framing effects
  • 4. Data Analysis n  This paper examined emotional bias in three major languages on the Internet. n  English (56.6%), German (6.5%), Spanish (4.6%) n  Data n  Established lexica of affective word usage: English:1,034, German: 2,902, Spanish: 1,034 n  Google N-gram dataset: 1012 tokens n  Valence (v) The degree of pleasure induced by the affective word usage, rescaled between -1 and 1.
  • 5. Data Science 2012, 1:3 atascience.com/content/1/1/3 Figure 1 Emotion word clouds with frequencies calculated from Google’s crawl. In each word cloud for English (left), German (middle), and Spanish (right), the size of a word is proportional to its frequency of appearance in the trillion-token Google N-gram dataset [26]. Word colors are chosen from red (negative) to green (positive) in the valence range from psychology studies [7–9]. For the three languages, positive words predominate on the Internet. Results: Frequency of emotional words v=-1: red, v=+1: green Exception n  Positive words predominate on the Internet. English German Spanish
  • 6. Results: Distribution of emotional wordsa Science 2012, 1:3 Page 5 of 12 science.com/content/1/1/3 n  The median shifts significantly towards positive values ( 0.3). n  95% confidence intervals (Wilcoxon tests): n  English: 0.257 0.032 n  German: 0.167 0.017 n  Spanish: 0.287 0.035 n  Empirical evidence of positive bias No control Control
  • 7. al. EPJ Data Science 2012, 1:3 Pag ww.epjdatascience.com/content/1/1/3 Table 1 Correlations between word valence and information measurements. English German Spanish ρ(v,f) 0.222** 0.144** 0.236** ρ(v,I) –0.368** –0.325** –0.402** ρ(v,I′) –0.294** –0.222** –0.311** ρ(v,I2) –0.332** –0.301** –0.359** ρ(v,I3) –0.313** –0.201** –0.340** ρ(v,I4) –0.254** –0.049* –0.162** Correlation coefficients of the valence (v), frequency f, self-information I, and information content measured for 2-grams I2, 3- grams I3, and 4-grams I4, and with self-information I′ measured from the frequencies reported in [42–44]. Significance levels: *p < 0.01, **p < 0.001. ones, but this nonlinear mapping between frequency and self-information makes the latter more closely related to word valence than the former. The first two lines of Table  show the Pearson’s correlation coefficient of word valence and frequency ρ(v,f ), followed by the correlation coefficient between word valence and self-information, ρ(v,I). For all three languages, the absolute value of the correlation coefficient with I is larger than with f , showing that self-information provides more knowledge about word valence than plain Results: Relation between information and valence (1) n  Information content is measured by self-information I(w), which provides more knowledge on the valence than the frequency. n  Negative correlation between v and I: n  Positive words carry less information than negative words. n  Correlation coefficient becomes smaller for the larger context (N). I(w) = −log2 P(w) ←Control analysis
  • 8. Data Science 2012, 1:3 Page 7 of 12 jdatascience.com/content/1/1/3 Results: Relation between information and valence (2) − 1 N log2 i=1 N ∑ P(W = w |C = ci )( ) n  For all languages and context sizes, valence decreases with information content. (Left) Color: Valence (v) Size: Self-information (I) (Right) Average self-information:
  • 9. Results: Additional analysis of valence, length, and self-info (1)
    n  The sign of valence matters
    n  Word length increases with I(w)
    n  Valence ? (word length)^-1
    n  Combined influence of valence and length on I(w)
    n  Additional dimension in the communication process related to emotional content (v) rather than communication efficiency (l)

    Paper excerpts shown on the slide (Garcia et al., EPJ Data Science 2012, 1:3):

    [...] but this trend is not so clear for German. These trends are properly quantified by Pearson’s correlation coefficients between valence and information content for [...] size (Table [...]). Each correlation coefficient becomes smaller for larger sizes of [...] as the information content estimation includes a larger context but becomes [...]

    Additional analysis of valence, length and self-information
    In order to provide additional support for our results, we tested different hypotheses impacting the relation between word usage and valence. First, we calculated Pearson’s and Spearman’s correlation coefficients between the absolute value of the valence and the self-information of a word, ρ(abs(v),I) (see Table 2). We found both correlation coefficients to be around 0.1 for German and Spanish, while they are not significant for English. The dependence between valence and self-information disappears if we ignore the sign of the valence, which means, indeed, that the usage frequency of a word is not just related to the overall emotional intensity, but to the positive or negative emotion expressed by the word.
    Subsequently, we found that the correlation coefficient between word length and self-information (ρ(l,I)) is positive, showing that word length increases with self-information. These values of ρ(l,I) are consistent with previous results [, ]. Pearson’s and Spearman’s correlation coefficients between valence and length ρ(v,l) are very low or not significant.
    In order to test the combined influence of valence and length to self-information, we calculated the partial correlation coefficients ρ(v,I|l) and ρ(l,I|v). The results are shown in Table 2, and are within the % confidence intervals of the original correlation coefficients ρ(v,I) and ρ(l,I). This provides support for the existence of an additional dimension in the communication process closely related to emotional content rather than communication efficiency. This is consistent with the known result that word lengths adapt to information [...] independent semantic feature of valence. Valence is also [...] but not to the symbolic representation of the word through [...] influence of context by controlling for word frequency. In Table [...] correlation coefficients of valence with information content for [...] controlling for self-information. We find that most of the [...] of negative sign, with the exception of I2 for English. The [...] is probably related to two word constructions such [...]

    Table 2 Additional correlations between valence, self-information and length.
                   English      German       Spanish
    ρ(abs(v),I)    0.032 ◦      0.109***     0.135***
    ρ(l,I)         0.378***     0.143***     0.361***
    ρ(v,l)         –0.044 ◦     –0.071***    –0.112***
    ρ(v,I|l)       –0.379***    –0.319***    –0.399***
    ρ(l,I|v)       0.389***     0.126***     0.357***
    Correlation coefficients of the valence (v), absolute value of the valence (abs(v)), and word length (l) versus self-information (I). Partial correlations are calculated for both variables (ρ(v,I|l), ρ(l,I|v)), and correlation between valence and length (ρ(v,l)). Significance levels: ◦ p < 0.3, *p < 0.1, **p < 0.01, ***p < 0.001.
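A partial correlation such as ρ(v,I|l) measures how two variables relate once a third is controlled for. The sketch below uses the standard first-order formula built from pairwise Pearson coefficients; whether the paper computed its values this way is an assumption, and the function names and toy numbers are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def partial_corr(x, y, z):
    """First-order partial correlation rho(x, y | z)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical toy data: x and y are correlated only through z.
z = [1, 1, -1, -1]
x = [2, 0, 0, -2]    # z plus a component unrelated to y
y = [2, 0, -2, 0]    # z plus a component unrelated to x
raw = pearson(x, y)               # correlation before controlling for z
controlled = partial_corr(x, y, z)  # correlation after controlling for z
```

With v, I and l as per-word lists, ρ(v,I|l) would correspond to `partial_corr(v, I, l)`. In Table 2 the partial correlations stay close to the raw ones, which is the opposite of the toy case above, where controlling for z removes the correlation entirely.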
  • 10. Results: Additional analysis of valence, length, and self-info (2)

    Table 3 Partial correlation coefficients between valence and information content.
                 English      German       Spanish
    ρ(v,I2|I)    –0.034 ◦     –0.100***    –0.058*
    ρ(v,I3|I)    –0.101**     –0.070***    –0.149***
    ρ(v,I4|I)    –0.134***    –0.020*      –0.084**
    Correlation coefficients of the valence (v) and information content measured on different context sizes (I2, I3, I4) controlling for self-information (I). Significance levels: ◦ p < 0.3, *p < 0.1, **p < 0.01, ***p < 0.001.

    Pearson’s and Spearman’s correlation coefficients between valence and length ρ(v,l) are very low or not significant. In order to test the combined influence of valence and length to self-information, we calculated the partial correlation coefficients ρ(v,I|l) and ρ(l,I|v). The results are shown in Table 2, and are within the % confidence intervals of the original correlation coefficients ρ(v,I) and ρ(l,I). This provides support for the existence of an additional dimension in the communication process closely related to emotional content rather than communication efficiency. This is consistent with the known result that word lengths adapt to information [...]

    n  Most of the correlations remain significant and negative, except I2 for English.
    n  Knowing the possible contexts of a word (N=2,3,4) provides more information about word valence than self-information alone.
  • 11. Summary
    n  Empirical evidence for a positive bias in language
    n  Positive words are more frequently used.
    n  Pollyanna hypothesis
    n  Facilitation of social links
    n  Negative words convey more information content than positive words.
    n  Word frequency is determined by
    n  Not only word length and information content
    n  But also emotional content