We present a two-step topic modeling method for analysing political articulations in everyday proto-political "civic talk" on online social media, and for interpreting them in terms of cultural and political sociology.
Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engagements
1. Using Topic Modeling to Study Everyday “Civic Talk” and Proto-political Engagements
Veikko Eranti & Tuukka Ylä-Anttila
Universities of Helsinki & Tampere
“Citizens in the Making” (Kone Foundation 2015–2017)
blogs.uta.fi/cim
@VeikkoEranti @TuukkaYlaAnt
2. Background
• A larger project combining
ethnographic and digital
methods
• Citizenship as action, as
process – “grown into”
• Our subfield: online proto-politics
and politics
• How does everyday “civic
talk” get articulated and
raised onto the level of
political discourse,
participation?
• Proof-of-concept empirical
analysis of a discussion
forum dataset
3. Materials
• Project: several social media datasets
• Here: Suomi24 (Finland24) forum
• Subset of 2.5M words (whole 2001–2015 dataset: 2.5B
words)
• A general interest forum, largest of its kind in Finland
• Sub-forums: local municipalities, cars, hobbies, home
& DIY, pets, travel, Jesus, sex, and Jesus & sex
• Dedicated sections for political discussion, but it also
“leaks” to other discussion areas
• We look at proto-political talk on the forum as a whole
4. Theory
• Online political talk not an ideal Habermasian
speech situation or public sphere
• Not necessarily political arguments: grievances,
expressions of resentment... below the threshold
of argumentation and deliberation (Mouffe, Young,
Habermas, Laclau, Dahlgren, Thévenot, Klofstad)
• Working hypothesis: articulation of grievances
(and bigger idea of civic culture) reflected in
pre-/proto-political discussions
5. Methods
• Topic modeling: unsupervised machine learning
• Takes text, gives you “topics”: sets of words that
occur together in documents
• Can frames, discourses, justifications and other
objects of cultural sociology be operationalized as
such topics? (DiMaggio, Nag & Blei 2013)
• We run a 50-topic LDA model with MALLET to find
(proto)political talk in everyday debates
• 50 sets of words which often occur together: topics
of discussion
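To make "takes text, gives you topics" concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not MALLET and not the model used in this study: the function name lda_gibbs, the hyperparameters alpha and beta, and the four toy documents are all assumptions made for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (document, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Take the token out of the counts ...
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # ... and resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Made-up toy corpus (already tokenized); not forum data.
docs = [
    ["tax", "money", "state", "tax", "economy"],
    ["money", "economy", "tax", "state"],
    ["church", "religion", "language", "church"],
    ["language", "religion", "church", "school"],
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [w for w, _ in counts.most_common(3)])
```

Each topic comes out as a ranked word list and each document as a mix of topic counts. Real tools such as MALLET add stopword handling, hyperparameter optimization and far better performance.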
6. Examples of topics
(top 15 words)
topic17: new need Finland through produce change
problem build small action future use nowadays
opportunity option
topic23: Finland Sweden language church Finnish
Swedish speak school country learn Catholic religion
belong study Islam
topic32: Finland pay Euro money tax billion state million
poor cut government economy rich count large
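Word lists like these are simply the highest-weighted words of each topic. A minimal sketch of that ranking step follows; the helper top_words and the counts are hypothetical and are not taken from the Suomi24 model.

```python
def top_words(word_counts, k=15):
    """Return the k highest-count words, ties broken alphabetically."""
    ranked = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Made-up per-topic word counts, for illustration only.
topic_counts = {"tax": 40, "money": 35, "state": 20, "poor": 12, "rich": 12, "count": 3}
print(top_words(topic_counts, k=3))  # ['tax', 'money', 'state']
```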
7. Interpreting topics
• These are political words, but they don’t really represent a
political articulation (a position, a justification or even
a policy theme)
• We interpret 9 of 50 topics as political or proto-political
• How to get closer to political articulations from this
general “civic talk”?
• Let’s pick “proto-political” topics from the 50 and reduce the
dataset to the 100 most important messages from each
• Reduced to 827 messages (from ~42 000)
• 30-topic LDA model on them
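The reduction step can be sketched as follows, assuming that "most important" means the messages with the highest weight on a topic (our reading, not confirmed by the slides). The function select_messages and the weight matrix are hypothetical.

```python
def select_messages(doc_topic_weights, chosen_topics, per_topic=100):
    """Union of the per_topic highest-weighted message indices for each chosen topic."""
    selected = set()
    for topic in chosen_topics:
        ranked = sorted(
            range(len(doc_topic_weights)),
            key=lambda doc: doc_topic_weights[doc][topic],
            reverse=True,
        )
        selected.update(ranked[:per_topic])
    return sorted(selected)


# Made-up weights: rows are messages, columns are topics 0..2.
weights = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.1, 0.7],
    [0.5, 0.5, 0.0],
]
subset = select_messages(weights, chosen_topics=[0, 1], per_topic=3)
print(subset)  # [0, 1, 2, 4]
```

In this toy case two topics with three messages each yield four messages rather than six, because a message can rank highly on more than one topic. Overlap of this kind is one plausible reason why the reduced dataset holds fewer than 9 × 100 messages, though the slides do not say so. The second, smaller model is then fitted on the selected messages only.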
8. But first… an aside on
VALIDATION of interpretations
• This is a proof-of-concept, so we validated these
very superficially
• In actual work… VALIDATE, VALIDATE, VALIDATE!
• Context-specific deep knowledge of your data –
read it!
• Internal validation, external validation (Evans 2014,
Grimmer & Stewart 2013)
• ICCSS2015 poster: more systematic validation
9. Examples of topics in
“submodel”
topic3: Marx work workingclass capitalism teacher
socialism worker create pay workingtime value long
wellbeing production product
topic12: Finland Niinistö parliament Soini president
TrueFinn party Halla-aho choose minister leader
chairman foreignminister memberofparliament Russia
topic22: member association function union expel
organization name right important only Halonen
membershipfee forum join DDR
10. 21 of 30 topics are rather clear political
articulations! Example:
11. Conclusions
• 50-topic model of a general interest forum: no political
articulations, or only vague ones
• However, the “proto-political” discussions, taken as a reduced
dataset, produce much more coherent articulations
• Locating proto-political talk in big data and then, further,
pinpointing political articulations arising from that
• Drawing a map of big datasets for further qualitative
exploration
• Sub-model topics are still largely thematic rather than
capturing practices, frames, justifications etc.
• Can we get at these through vocabulary?
• Note: this demo was 1/1000 of the entire Suomi24 dataset
• Importance of theory and conceptual work
12. Extra idea
Could we model 1) fringe forums, 2) “mid-level” forums and
3) general forums/media to plot the emergence, spreading
and mainstreaming of articulations?
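One very rough way to operationalize this idea is to track the yearly share of an articulation's topic in each forum tier and note when it first passes a threshold. Everything in this sketch is an assumption: the helper first_year_above, the 0.05 threshold, and the share values, which are invented for illustration and are not results.

```python
def first_year_above(shares_by_year, threshold):
    """Earliest year whose share reaches the threshold, or None if it never does."""
    for year in sorted(shares_by_year):
        if shares_by_year[year] >= threshold:
            return year
    return None


# Invented shares of one articulation's topic per forum tier and year.
shares = {
    "fringe": {2010: 0.02, 2011: 0.06, 2012: 0.09},
    "mid-level": {2010: 0.00, 2011: 0.01, 2012: 0.05},
    "general": {2010: 0.00, 2011: 0.00, 2012: 0.01},
}
emergence = {tier: first_year_above(by_year, 0.05) for tier, by_year in shares.items()}
print(emergence)
```

An ordering such as fringe first, mid-level later, general not yet would be the pattern to look for. Whether a topic share is a valid proxy for an articulation is exactly the kind of question the validation step would have to answer.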
13. References
• Dahlgren, Peter. 2000. “The Internet and the Democratization of Civic Culture.”
Political Communication 17: 335–40.
• DiMaggio, Paul, Manish Nag, and David M. Blei. 2013. “Exploiting Affinities
between Topic Modeling and the Sociological Perspective on Culture:
Application to Newspaper Coverage of U.S. Government Arts Funding.” Poetics
41(6): 570–606.
• Evans, Michael S. 2014. “A Computational Approach to Qualitative Analysis in
Large Textual Datasets.” PLoS ONE 9(2): 1–10.
• Grimmer, Justin, and Brandon M. Stewart. 2013. “Text as Data: The Promise and
Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political
Analysis 21(3): 267–97.
• Klofstad, Casey A. 2011. Civic Talk: Peers, Politics, and the Future of Democracy.
Temple University Press.
• Thévenot, Laurent. 2014. “Voicing Concern and Difference: From Public Spaces
to Common-Places.” European Journal of Cultural and Political Sociology 1(1):
7–34.
• Etc.
Editor's Notes
We are theoretically very open in defining “civic talk” / “proto-politics” etc.