Positive words carry less information than negative words
D. Garcia, A. Garas and F. Schweitzer
EPJ Data Science, 2012
JClub 2014.4.23
by Kazutoshi Sasahara
Introduction
• Is human language biased towards positive emotion, or is it neutral?
• Statistical properties of word frequency and length
  • Word frequency ∝ (word rank)^-1 (Zipf 1949)
  • Word frequency predicts word length, as a result of a principle of least effort
  • Word length increases with information content for efficient communication (Piantadosi et al. 2011).
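As a rough illustration of the rank-frequency relation attributed to Zipf above, the following sketch estimates the exponent of a power law from a list of word frequencies by least squares on log-log axes. The function name and the toy frequencies are assumptions made for this example; they are not taken from the paper or its data.

```python
import math

def fit_power_law_exponent(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Expects at least two strictly positive frequencies.
    """
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Toy frequencies that follow Zipf's law exactly: f(rank) = 1000 / rank
toy = [1000.0 / rank for rank in range(1, 51)]
exponent = fit_power_law_exponent(toy)
```

For data that follow the law exactly, the fitted exponent is -1 up to floating-point error; real word counts would only approximate it.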
Introduction (cont.)
• Pollyanna hypothesis (Boucher & Osgood 1969)
  A universal human tendency to use evaluatively positive words more frequently than evaluatively negative words in communicating.
• Previous research reported an emotional bias, but without adequate controls
  • Problems in the use of Amazon Mechanical Turk
  • Possible biases
    • Acquiescent bias
    • Social desirability bias
    • Framing effects
Data Analysis
• This paper examined emotional bias in three major languages on the Internet.
  • English (56.6%), German (6.5%), Spanish (4.6%)
• Data
  • Established lexica of affective word usage: English: 1,034, German: 2,902, Spanish: 1,034
  • Google N-gram dataset: 10^12 tokens
• Valence (v)
  The degree of pleasure induced by the affective word usage, rescaled between -1 and 1.
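The rescaling of valence to the interval from -1 to 1 can be sketched as a linear map. The slides do not state the original rating scale of each lexicon, so the 1 to 9 range used as a default below is purely an assumption for illustration, as is the function name.

```python
def rescale_valence(rating, scale_min=1.0, scale_max=9.0):
    """Map a rating on [scale_min, scale_max] linearly onto [-1, 1].

    The default 1-9 scale is a hypothetical choice, not taken from the slides.
    """
    if not scale_min <= rating <= scale_max:
        raise ValueError("rating outside the declared scale")
    midpoint = (scale_min + scale_max) / 2.0
    half_range = (scale_max - scale_min) / 2.0
    return (rating - midpoint) / half_range
```

Under this assumed scale, the lowest rating maps to -1, the midpoint to 0, and the highest rating to +1.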
Garcia et al. EPJ Data Science 2012, 1:3
www.epjdatascience.com/content/1/1/3
Figure 1 Emotion word clouds with frequencies calculated from Google’s crawl. In each word cloud for
English (left), German (middle), and Spanish (right), the size of a word is proportional to its frequency of
appearance in the trillion-token Google N-gram dataset [26]. Word colors are chosen from red (negative) to
green (positive) in the valence range from psychology studies [7–9]. For the three languages, positive words
predominate on the Internet.
Results: Frequency of emotional words
[Figure 1: word clouds for English, German, and Spanish. Color scale: v=-1 red, v=+1 green. Slide annotation: "Exception"]
• Positive words predominate on the Internet.
Results: Distribution of emotional words
• The median shifts significantly towards positive values (up to about 0.3).
• 95% confidence intervals (Wilcoxon tests):
  • English: 0.257 ± 0.032
  • German: 0.167 ± 0.017
  • Spanish: 0.287 ± 0.035
• Empirical evidence of positive bias
[Figure panels: No control, Control]
Table 1 Correlations between word valence and information measurements.
            English     German      Spanish
ρ(v,f)      0.222**     0.144**     0.236**
ρ(v,I)      –0.368**    –0.325**    –0.402**
ρ(v,I′)     –0.294**    –0.222**    –0.311**
ρ(v,I2)     –0.332**    –0.301**    –0.359**
ρ(v,I3)     –0.313**    –0.201**    –0.340**
ρ(v,I4)     –0.254**    –0.049*     –0.162**
Correlation coefficients of the valence (v), frequency f, self-information I, and information content measured for 2-grams I2, 3-grams I3, and 4-grams I4, and with self-information I′ measured from the frequencies reported in [42–44]. Significance levels: *p < 0.01, **p < 0.001.
... ones, but this nonlinear mapping between frequency and self-information makes the latter more closely related to word valence than the former. The first two lines of Table 1 show the Pearson's correlation coefficient of word valence and frequency ρ(v,f), followed by the correlation coefficient between word valence and self-information, ρ(v,I). For all three languages, the absolute value of the correlation coefficient with I is larger than with f, showing that self-information provides more knowledge about word valence than plain ...
Results: Relation between information and valence (1)
• Information content is measured by self-information I(w), which provides more knowledge on the valence than the frequency.
  I(w) = −log2 P(w)
• Negative correlation between v and I:
  • Positive words carry less information than negative words.
• The correlation coefficient becomes smaller for larger context sizes (N).
←Control analysis
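The self-information I(w) = −log2 P(w) from this slide can be sketched in a few lines, with P(w) estimated as a word's count divided by the total number of tokens. The word counts below are hypothetical toy values chosen to give round numbers; they are not taken from the Google N-gram dataset.

```python
import math

def self_information(count, total):
    """I(w) = -log2 P(w), with P(w) estimated as count / total."""
    if count <= 0 or total <= 0 or count > total:
        raise ValueError("need 0 < count <= total")
    return -math.log2(count / total)

# Hypothetical toy counts in a corpus of 1024 tokens (other words omitted).
toy_counts = {"good": 512, "bad": 128, "awful": 8}
total_tokens = 1024
info_bits = {w: self_information(c, total_tokens) for w, c in toy_counts.items()}
```

In this toy corpus the frequent word carries 1 bit and the rarest carries 7 bits, which shows the nonlinear mapping between frequency and self-information.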
Results: Relation between information and valence (2)
• For all languages and context sizes, valence decreases with information content.
(Left) Color: Valence (v); Size: Self-information (I)
(Right) Average self-information:
$$-\frac{1}{N} \sum_{i=1}^{N} \log_2 P(W = w \mid C = c_i)$$
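The context-averaged quantity on this slide can be sketched as follows. The sketch assumes that N in the formula counts the contexts c_i in which the word w is observed, and that the conditional probabilities P(W = w | C = c_i) have already been estimated; both the function name and the toy probabilities are illustrative assumptions, not the paper's data.

```python
import math

def information_content(context_probs):
    """-(1/N) * sum_i log2 P(W = w | C = c_i) over the N observed contexts."""
    if not context_probs:
        raise ValueError("need at least one context")
    if any(p <= 0.0 or p > 1.0 for p in context_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    n = len(context_probs)
    return -sum(math.log2(p) for p in context_probs) / n

# Hypothetical conditional probabilities P(W = w | C = c_i) for one word
# seen in four contexts (toy values, not from the paper's data).
probs = [0.5, 0.25, 0.25, 0.125]
avg_bits = information_content(probs)
```

With these toy values the per-context costs are 1, 2, 2, and 3 bits, so the average is 2 bits. A word that is highly predictable from its contexts gets a low value.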
Results: Additional analysis of valence, length, and self-info (1)
• The sign of valence matters
• Word length ∝ I(w)
• Valence ? (word length)^-1
• Combined influence of valence and length on I(w)
• Additional dimension in the communication process, related to emotional content (v) rather than communication efficiency (l)
... but this trend is not so clear for German. These trends are properly quantified by Pearson's correlation coefficients between valence and information content for each context size (Table 1). Each correlation coefficient becomes smaller for larger sizes of the context, as the information content estimation includes a larger context but becomes ...

Additional analysis of valence, length and self-information
In order to provide additional support for our results, we tested different hypotheses impacting the relation between word usage and valence. First, we calculated Pearson's and Spearman's correlation coefficients between the absolute value of the valence and the self-information of a word, ρ(abs(v),I) (see Table 2). We found both correlation coefficients to be small for German and Spanish (Table 2: 0.109 and 0.135), while they are not significant for English. The dependence between valence and self-information disappears if we ignore the sign of the valence, which means, indeed, that the usage frequency of a word is not just related to the overall emotional intensity, but to the positive or negative emotion expressed by the word.
Subsequently, we found that the correlation coefficient between word length and self-information (ρ(l,I)) is positive, showing that word length increases with self-information. These values of ρ(l,I) are consistent with previous results.

Table 2 Additional correlations between valence, self-information and length.
              English     German      Spanish
ρ(abs(v),I)   0.032◦      0.109***    0.135***
ρ(l,I)        0.378***    0.143***    0.361***
ρ(v,l)        –0.044◦     –0.071***   –0.112***
ρ(v,I|l)      –0.379***   –0.319***   –0.399***
ρ(l,I|v)      0.389***    0.126***    0.357***
Correlation coefficients of the valence (v), absolute value of the valence (abs(v)), and word length (l) versus self-information (I). Partial correlations are calculated for both variables (ρ(v,I|l), ρ(l,I|v)), and correlation between valence and length (ρ(v,l)). Significance levels: ◦p < 0.3, *p < 0.1, **p < 0.01, ***p < 0.001.
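The partial correlations ρ(v,I|l) and ρ(l,I|v) in Table 2 can be related to the pairwise coefficients through the standard first-order partial correlation formula. The sketch below applies that textbook formula to the rounded English values shown on the slides; whether the authors computed their partial correlations this way or from the raw data is not stated, so treat this as an illustration under that assumption.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom

# English values as reported on the slides: rho(v,I) from Table 1,
# rho(v,l) and rho(l,I) from Table 2.
r_vI, r_vl, r_lI = -0.368, -0.044, 0.378

rho_vI_given_l = partial_correlation(r_vI, r_vl, r_lI)  # valence vs I, controlling for length
rho_lI_given_v = partial_correlation(r_lI, r_vl, r_vI)  # length vs I, controlling for valence
```

With these rounded inputs the formula gives values close to the –0.379 and 0.389 reported for English in Table 2. Because ρ(v,l) is so small, controlling for length barely changes the correlation between valence and self-information.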
Results: Additional analysis of valence, length, and self-info (2)
Table 3 Partial correlation coefficients between valence and information content.
              English     German      Spanish
ρ(v,I2|I)     –0.034◦     –0.100***   –0.058*
ρ(v,I3|I)     –0.101**    –0.070***   –0.149***
ρ(v,I4|I)     –0.134***   –0.020*     –0.084**
Correlation coefficients of the valence (v) and information content measured on different context sizes (I2, I3, I4) controlling for self-information (I). Significance levels: ◦p < 0.3, *p < 0.1, **p < 0.01, ***p < 0.001.
Pearson's and Spearman's correlation coefficients between valence and length ρ(v,l) are very low or not significant. In order to test the combined influence of valence and length on self-information, we calculated the partial correlation coefficients ρ(v,I|l) and ρ(l,I|v). The results are shown in Table 2, and are within the confidence intervals of the original correlation coefficients ρ(v,I) and ρ(l,I). This provides support for the existence of an additional dimension in the communication process closely related to emotional content rather than communication efficiency. This is consistent with the known result that word lengths adapt to information ...
• Most of the correlations remain significant and negative, except I2 for English.
• Knowing the possible contexts of a word (N=2,3,4) provides more information about word valence than self-information alone.
Summary
• Empirical evidence for a positive bias in language
  • Positive words are more frequently used.
    • Pollyanna hypothesis
    • Facilitation of social links
  • Negative words convey more information content than positive words.
• Word frequency is determined by
  • Not only word length and information content
  • But also emotional content

Mais conteúdo relacionado

Destaque

Performance variability enables adaptive plasticity ‘crystallized’ adult song
Performance variability enables adaptive plasticity ‘crystallized’ adult songPerformance variability enables adaptive plasticity ‘crystallized’ adult song
Performance variability enables adaptive plasticity ‘crystallized’ adult songTokyo Tech
 
Decksetがよかった話
Decksetがよかった話Decksetがよかった話
Decksetがよかった話Kohki Miki
 
鳥の複雑なさえずりと進化的デザイン
鳥の複雑なさえずりと進化的デザイン鳥の複雑なさえずりと進化的デザイン
鳥の複雑なさえずりと進化的デザインTokyo Tech
 
How sleep affects the developmental learning of bird song
How sleep affects the developmental learning of bird songHow sleep affects the developmental learning of bird song
How sleep affects the developmental learning of bird songTokyo Tech
 
スライド、作ってみませんか? #osc16tk
スライド、作ってみませんか? #osc16tk スライド、作ってみませんか? #osc16tk
スライド、作ってみませんか? #osc16tk whywaita
 
Marp使ってみた
Marp使ってみたMarp使ってみた
Marp使ってみたclash m45
 
Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた:
Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた: Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた:
Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた: Yutaka Kachi
 
ギークスマホ「Fx0」入手と運用
ギークスマホ「Fx0」入手と運用ギークスマホ「Fx0」入手と運用
ギークスマホ「Fx0」入手と運用Yutaka Kachi
 
機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステム
機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステム機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステム
機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステムJunichi Noda
 
WordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せ
WordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せWordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せ
WordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せSeto Takahiro
 

Destaque (16)

Performance variability enables adaptive plasticity ‘crystallized’ adult song
Performance variability enables adaptive plasticity ‘crystallized’ adult songPerformance variability enables adaptive plasticity ‘crystallized’ adult song
Performance variability enables adaptive plasticity ‘crystallized’ adult song
 
Beamertemplete
BeamertempleteBeamertemplete
Beamertemplete
 
Decksetがよかった話
Decksetがよかった話Decksetがよかった話
Decksetがよかった話
 
鳥の複雑なさえずりと進化的デザイン
鳥の複雑なさえずりと進化的デザイン鳥の複雑なさえずりと進化的デザイン
鳥の複雑なさえずりと進化的デザイン
 
Ets induction
Ets inductionEts induction
Ets induction
 
Erlangen Beamer
Erlangen BeamerErlangen Beamer
Erlangen Beamer
 
Gadget2
Gadget2Gadget2
Gadget2
 
How sleep affects the developmental learning of bird song
How sleep affects the developmental learning of bird songHow sleep affects the developmental learning of bird song
How sleep affects the developmental learning of bird song
 
スライド、作ってみませんか? #osc16tk
スライド、作ってみませんか? #osc16tk スライド、作ってみませんか? #osc16tk
スライド、作ってみませんか? #osc16tk
 
Marp colors
Marp colorsMarp colors
Marp colors
 
Marp Tips
Marp TipsMarp Tips
Marp Tips
 
Marp使ってみた
Marp使ってみたMarp使ってみた
Marp使ってみた
 
Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた:
Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた: Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた:
Soramame.Block 100行のJavaScriptで ビジュアルプログラミング言語(のフロントエンド)を作ってみた:
 
ギークスマホ「Fx0」入手と運用
ギークスマホ「Fx0」入手と運用ギークスマホ「Fx0」入手と運用
ギークスマホ「Fx0」入手と運用
 
機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステム
機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステム機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステム
機械学習ライブラリ「Spark MLlib」で作る アニメレコメンドシステム
 
WordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せ
WordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せWordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せ
WordPressプラグイン開発の めんどうな作業は執事(Jenkins)にお任せ
 

Último

Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 

Último (20)

Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 

Positive Words Carry Less Information Than Negative Words

  • 1. Positive words carry less information than negative words D. Garcia, A. Garas and F. Schweitzer EPJ Data Science, 2012 JClub 2014.4.23 by Kazutoshi Sasahara
  • 2. Introduction n  Is human language biased towards positive emotion or neutral? n  Statistical properties of word freq. and length n  Word freq. (word rank)-1 (Zipf 1949) n  Word freq. predicts word length as a result of a principle of least effort n  Word length increases with information content for efficient communication (Piantadosi et al. 2011).
  • 3. Introduction (cont.) n  Pollyanna hypothesis (Boucher & Osgood 1969) A universal human tendency to use evaluatively positive words more frequently and than evaluatively negative words in communicating. n  Previous researches reported emotional bias but with the lack of control n  Problems in the use of Amazon Mechanical Turk n  Possible biases n  Acquiescent bias n  Social desirability bias n  Framing effects
  • 4. Data Analysis n  This paper examined emotional bias in three major languages on the Internet. n  English (56.6%), German (6.5%), Spanish (4.6%) n  Data n  Established lexica of affective word usage: English:1,034, German: 2,902, Spanish: 1,034 n  Google N-gram dataset: 1012 tokens n  Valence (v) The degree of pleasure induced by the affective word usage, rescaled between -1 and 1.
  • 5. Data Science 2012, 1:3 atascience.com/content/1/1/3 Figure 1 Emotion word clouds with frequencies calculated from Google’s crawl. In each word cloud for English (left), German (middle), and Spanish (right), the size of a word is proportional to its frequency of appearance in the trillion-token Google N-gram dataset [26]. Word colors are chosen from red (negative) to green (positive) in the valence range from psychology studies [7–9]. For the three languages, positive words predominate on the Internet. Results: Frequency of emotional words v=-1: red, v=+1: green Exception n  Positive words predominate on the Internet. English German Spanish
  • 6. Results: Distribution of emotional wordsa Science 2012, 1:3 Page 5 of 12 science.com/content/1/1/3 n  The median shifts significantly towards positive values ( 0.3). n  95% confidence intervals (Wilcoxon tests): n  English: 0.257 0.032 n  German: 0.167 0.017 n  Spanish: 0.287 0.035 n  Empirical evidence of positive bias No control Control
  • 7. al. EPJ Data Science 2012, 1:3 Pag ww.epjdatascience.com/content/1/1/3 Table 1 Correlations between word valence and information measurements. English German Spanish ρ(v,f) 0.222** 0.144** 0.236** ρ(v,I) –0.368** –0.325** –0.402** ρ(v,I′) –0.294** –0.222** –0.311** ρ(v,I2) –0.332** –0.301** –0.359** ρ(v,I3) –0.313** –0.201** –0.340** ρ(v,I4) –0.254** –0.049* –0.162** Correlation coefficients of the valence (v), frequency f, self-information I, and information content measured for 2-grams I2, 3- grams I3, and 4-grams I4, and with self-information I′ measured from the frequencies reported in [42–44]. Significance levels: *p < 0.01, **p < 0.001. ones, but this nonlinear mapping between frequency and self-information makes the latter more closely related to word valence than the former. The first two lines of Table  show the Pearson’s correlation coefficient of word valence and frequency ρ(v,f ), followed by the correlation coefficient between word valence and self-information, ρ(v,I). For all three languages, the absolute value of the correlation coefficient with I is larger than with f , showing that self-information provides more knowledge about word valence than plain Results: Relation between information and valence (1) n  Information content is measured by self-information I(w), which provides more knowledge on the valence than the frequency. n  Negative correlation between v and I: n  Positive words carry less information than negative words. n  Correlation coefficient becomes smaller for the larger context (N). I(w) = −log2 P(w) ←Control analysis
  • 8. Data Science 2012, 1:3 Page 7 of 12 jdatascience.com/content/1/1/3 Results: Relation between information and valence (2) − 1 N log2 i=1 N ∑ P(W = w |C = ci )( ) n  For all languages and context sizes, valence decreases with information content. (Left) Color: Valence (v) Size: Self-information (I) (Right) Average self-information:
  • 9. Results: Additional analysis of valence, length, and self-info (1)
n  The sign of valence matters: ρ(abs(v),I) is weak or not significant
n  Word length increases with I(w): ρ(l,I) > 0
n  Valence vs. (word length)^-1: ρ(v,l) is very low or not significant
n  Combined influence of valence and length on I(w): ρ(v,I|l), ρ(l,I|v)
n  Additional dimension in the communication process related to emotional content (v) rather than communication efficiency (l)

Additional analysis of valence, length and self-information
In order to provide additional support for our results, we tested different hypotheses impacting the relation between word usage and valence. First, we calculated Pearson's and Spearman's correlation coefficients between the absolute value of the valence and the self-information of a word, ρ(abs(v),I) (see Table 2). We found both correlation coefficients to be small for German and Spanish (Table 2 lists 0.109 and 0.135, respectively), while they are not significant for English. The dependence between valence and self-information disappears if we ignore the sign of the valence, which means, indeed, that the usage frequency of a word is not just related to the overall emotional intensity, but to the positive or negative emotion expressed by the word.
Subsequently, we found that the correlation coefficient between word length and self-information (ρ(l,I)) is positive, showing that word length increases with self-information. These values of ρ(l,I) are consistent with previous results. Pearson's and Spearman's correlation coefficients between valence and length ρ(v,l) are very low or not significant.

Table 2  Additional correlations between valence, self-information and length.
                English      German       Spanish
ρ(abs(v),I)     0.032 ◦      0.109***     0.135***
ρ(l,I)          0.378***     0.143***     0.361***
ρ(v,l)          -0.044 ◦     -0.071***    -0.112***
ρ(v,I|l)        -0.379***    -0.319***    -0.399***
ρ(l,I|v)        0.389***     0.126***     0.357***
Correlation coefficients of the valence (v), absolute value of the valence (abs(v)), and word length (l) versus self-information (I). Partial correlations are calculated for both variables (ρ(v,I|l), ρ(l,I|v)), and correlation between valence and length (ρ(v,l)). Significance levels: ◦ p < 0.3, *p < 0.1, **p < 0.01, ***p < 0.001.
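The partial correlations ρ(v,I|l) and ρ(l,I|v) reported in Table 2 can be illustrated with a small sketch. It assumes the standard first-order partial correlation formula built from Pearson coefficients; the slides do not say how the authors computed theirs. The function names and the toy numbers are hypothetical and serve only as an illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """First-order partial correlation rho(x, y | z).

    Assumed formula (standard textbook form, not taken from the slides):
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy values for six imaginary words (NOT data from the paper).
valence = [0.8, 0.5, 0.1, -0.3, -0.6, -0.9]       # v, rescaled to [-1, 1]
length = [4, 5, 7, 6, 8, 9]                       # l, characters per word
self_info = [9.1, 10.2, 11.0, 12.4, 13.3, 14.8]   # I, in bits

print("rho(v, I | l) =", partial_corr(valence, self_info, length))
print("rho(l, I | v) =", partial_corr(length, self_info, valence))
```

If the controlled variable is uncorrelated with both of the others, the partial correlation reduces to the plain Pearson coefficient. That is the sense in which partial correlations close to the original ρ(v,I) and ρ(l,I) suggest that valence and length contribute separately.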
  • 10. Results: Additional analysis of valence, length, and self-info (2)

Table 3  Partial correlation coefficients between valence and information content.
                English      German       Spanish
ρ(v,I2|I)       -0.034 ◦     -0.100***    -0.058*
ρ(v,I3|I)       -0.101**     -0.070***    -0.149***
ρ(v,I4|I)       -0.134***    -0.020*      -0.084**
Correlation coefficients of the valence (v) and information content measured on different context sizes (I2, I3, I4) controlling for self-information (I). Significance levels: ◦ p < 0.3, *p < 0.1, **p < 0.01, ***p < 0.001.

In order to test the combined influence of valence and length on self-information, we calculated the partial correlation coefficients ρ(v,I|l) and ρ(l,I|v). The results are shown in Table 2, and are within the confidence intervals of the original correlation coefficients ρ(v,I) and ρ(l,I). This provides support for the existence of an additional dimension in the communication process closely related to emotional content rather than communication efficiency. This is consistent with the known result that word lengths adapt to information ...

n  Most of the correlations remain significant and negative, except I2 for English.
n  Knowing the possible contexts of a word (N=2,3,4) provides more information about word valence than self-information alone.
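The difference between self-information I and the context-based information content I2, I3, I4 can be sketched on a toy token sequence. This is only an illustration under stated assumptions: self-information is taken as -log2 of the relative frequency, and I_N is taken as the average of -log2 P(w | preceding N-1 tokens) over the occurrences of w, in the spirit of Piantadosi et al. (2011). The slides do not spell out the estimator the authors used, and the function names and the toy sequence are hypothetical.

```python
import math
from collections import Counter, defaultdict


def self_information(tokens):
    """I(w) = -log2 P(w), with P(w) estimated as the relative frequency."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: -math.log2(c / total) for w, c in counts.items()}


def context_information(tokens, n=2):
    """Assumed I_N(w): mean of -log2 P(w | previous n-1 tokens) over w's occurrences."""
    context_counts = Counter()
    ngram_counts = Counter()
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])
        context_counts[context] += 1
        ngram_counts[(context, tokens[i])] += 1

    surprisal_sum = defaultdict(float)
    occurrences = Counter()
    for (context, word), count in ngram_counts.items():
        p = count / context_counts[context]
        surprisal_sum[word] += -math.log2(p) * count
        occurrences[word] += count
    return {w: surprisal_sum[w] / occurrences[w] for w in surprisal_sum}


# Toy sequence (NOT from the Google N-gram dataset).
tokens = "a b a b a c".split()
print(self_information(tokens))
print(context_information(tokens, n=2))
```

In the toy sequence, "a" always follows "b", so its context-based value drops to zero even though its self-information is one bit. This mirrors the point on the slide that context carries information beyond word frequency alone.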
  • 11. Summary n  Empirical evidence for a positive bias in language n  Positive words are more frequently used. n  Pollyanna hypothesis n  Facilitation of social links n  Negative words convey more information content than positive words. n  Word frequency is determined by n  Not only word length and information content n  But also emotional content