CONCEPTS
THROUGH TIME
Tracing Concepts in Dutch Newspaper Discourse
using Sequential Word Vector Spaces
Translantis Project
Digital Humanities Approaches to Reference
Cultures: The Emergence of the United States
in Dutch Public Discourse 1890-1990
Melvin Wevers, Tom Kenter & Pim Huijnen
Utrecht University & University of Amsterdam, the Netherlands
PROBLEM =
CHALLENGE
• Conceptual history / intellectual history studies the emergence and
transformation of concepts, ideas, and thoughts.
• Problems with existing methods
• Use of predefined lists of words (N-gram viewers / Full-text search)
• Top-down approaches (NER, word classification lists) make use of
pre-established models that are often ahistorical
• Topic modeling is useful but quite static
• How to trace the genealogy of a concept?
CONCEPTS THROUGH
TIME
• We would like to study changes in
the meaning (constitution) of
concepts over time
• Question: What words were used in
the past to talk about particular
concepts?
OUR APPROACH
• Multi-dimensional word-vector
space using Google’s
word2vec (neural language
model)
• Data: 500,000 digitized
newspaper issues from the
Dutch National Library
• Semantic and syntactic
information representation by
geometry (Baroni &
Kruszewski, 2014; Wijaya &
Yeniterzi, 2011)
[Figure: timeline marked 1950, 1960, 1970]
1 model = 10 years
40 models for the period
between 1950-1990
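The overlapping ten-year windows can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `decade_windows` and its parameters are hypothetical, and the labels simply imitate the `1950_1959` style that appears on the raw output slide.

```python
def decade_windows(first_year, last_start_year, span=10, step=1):
    """Return (label, start, end) tuples for overlapping windows.

    Each window covers `span` calendar years (end year inclusive) and
    consecutive windows start `step` years apart.
    """
    windows = []
    for start in range(first_year, last_start_year + 1, step):
        end = start + span - 1
        windows.append((f"{start}_{end}", start, end))
    return windows


# Start years 1950..1977 give the window labels seen in the raw output.
windows = decade_windows(1950, 1977)
print(windows[0][0], windows[-1][0], len(windows))
```

One word2vec model would then be trained per window, on the newspaper issues dated within that window.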
TRACING CONCEPTS
• One or more words as entry-
points into concept
• Concepts defined by in and out
links > inspired by Deleuze’s
notion of the rhizome
• Model ambiguity: see which
words remain in and which
disappear from the network
• Fast and relatively light
• Forwards and backwards
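A rough sketch of the tracing idea, under stated assumptions: the slides do not specify how in and out links are weighted, so the pruning rule below (keep a word when at least `min_in_links` nodes point to it) is a simplification chosen for illustration. The names `expand_and_prune` and `trace` are hypothetical, and the neighbour lists are invented toy data, not project results.

```python
from collections import Counter


def expand_and_prune(neighbours, seeds, min_in_links=2):
    """Build a neighbour graph around `seeds` and prune it.

    `neighbours` maps a word to the words closest to it in one period.
    A word survives if at least `min_in_links` graph nodes link to it.
    """
    first_layer = {n for s in seeds for n in neighbours.get(s, [])}
    nodes = set(seeds) | first_layer
    in_links = Counter()
    for source in nodes:
        # Out-links of the seeds and the first layer form the second layer.
        for target in set(neighbours.get(source, [])):
            if target != source:
                in_links[target] += 1
    return {word for word, count in in_links.items() if count >= min_in_links}


def trace(models, seeds, min_in_links=2):
    """Carry the pruned cluster from each period into the next one."""
    history, current = {}, set(seeds)
    for period in sorted(models):
        cluster = expand_and_prune(models[period], current, min_in_links)
        history[period] = cluster
        current = cluster or current  # keep the old seeds if nothing survives
    return history


# Invented toy neighbour lists for two periods.
toy_models = {
    "1950_1959": {
        "propaganda": ["agitation", "campaign"],
        "agitation": ["propaganda", "campaign"],
        "campaign": ["agitation", "advertising"],
    },
    "1960_1969": {
        "agitation": ["campaign", "advertising"],
        "campaign": ["advertising", "commercial"],
        "advertising": ["commercial", "campaign"],
    },
}
history = trace(toy_models, ["propaganda"])
for period in sorted(history):
    print(period, sorted(history[period]))
```

In this toy run the seed word itself drops out of the cluster after the first period, which is the behaviour the method is meant to allow.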
RAW OUTPUT
>>> tc.trackWord(dModels, 'buitenlanders')
1950_1959: vreemdelingen (0.76), nederlanders (0.69), indonesiërs (0.65), toeristen (0.62), europeanen (0.61), vacantiegangers (0.58), mensen (0.57), vakantiegangers (0.56),
duitsers (0.54), dagjesmensen (0.54)
1951_1960: vreemdelingen (0.76), nederlanders (0.74), toeristen (0.64), indonesiërs (0.64), europeanen (0.64), bezoekers (0.59), immigranten (0.58), duitsers (0.57), mensen
(0.57), kampeerders (0.57)
1952_1961: vreemdelingen (0.74), toeristen (0.69), nederlanders (0.68), indonesiërs (0.61), dagjesmensen (0.61), bezoekers (0.61), kampeerders (0.60), europeanen (0.59),
vakantiegangers (0.59), duitsers (0.57)
1953_1962: vreemdelingen (0.74), toeristen (0.70), bezoekers (0.64), nederlanders (0.63), vacantiegangers (0.62), kampeerders (0.59), vakantiegangers (0.59), dagjesmensen
(0.57), mensen (0.57), automobilisten (0.55)
1954_1963: toeristen (0.69), vreemdelingen (0.68), nederlanders (0.66), bezoekers (0.62), vakantiegangers (0.60), kampeerders (0.59), vacantiegangers (0.58), immigranten
(0.56), jongelui (0.55), jongeren (0.55)
1955_1964: toeristen (0.70), vreemdelingen (0.70), nederlanders (0.64), vakantiegangers (0.64), bezoekers (0.63), kampeerders (0.63), vacantiegangers (0.59), mensen (0.59),
dagjesmensen (0.56), jongelui (0.55)
1956_1965: vreemdelingen (0.71), toeristen (0.70), vakantiegangers (0.64), kampeerders (0.63), nederlanders (0.62), bezoekers (0.62), mensen (0.61), duitsers (0.57),
vacantiegangers (0.56), gezinnen (0.56)
1957_1966: vreemdelingen (0.68), toeristen (0.68), nederlanders (0.63), kampeerders (0.62), vakantiegangers (0.60), mensen (0.59), bezoekers (0.58), duitsers (0.57),
sportvissers (0.56), vacantiegangers (0.55)
1958_1967: toeristen (0.71), vreemdelingen (0.71), nederlanders (0.68), vakantiegangers (0.64), kampeerders (0.63), bezoekers (0.60), marokkanen (0.59), duitsers (0.58),
dagjesmensen (0.58), mensen (0.57)
1959_1968: toeristen (0.69), nederlanders (0.68), vreemdelingen (0.66), kampeerders (0.62), bezoekers (0.61), vacantiegangers (0.61), vakantiegangers (0.58), sportvissers
(0.58), hotelgasten (0.57), mensen (0.57)
1960_1969: toeristen (0.72), vreemdelingen (0.70), nederlanders (0.68), kampeerders (0.61), vakantiegangers (0.61), zakenmensen (0.59), marokkanen (0.59), mensen (0.59),
zakenlieden (0.58), bezoekers (0.58)
1961_1970: vreemdelingen (0.71), toeristen (0.68), nederlanders (0.65), kampeerders (0.63), vakantiegangers (0.62), reizigers (0.61), marokkanen (0.59), bezoekers (0.59),
vacantiegangers (0.59), mensen (0.58)
1962_1971: vreemdelingen (0.71), nederlanders (0.68), toeristen (0.67), kampeerders (0.63), indonesiërs (0.59), vakantiegangers (0.59), dagjesmensen (0.59), marokkanen
(0.58), sportvissers (0.57), vakantiegasten (0.57)
1963_1972: vreemdelingen (0.72), nederlanders (0.71), toeristen (0.68), indonesiërs (0.62), kampeerders (0.61), mensen (0.58), gezinnen (0.58), scandinaviërs (0.58), turken
(0.57), duitsers (0.57)
1964_1973: nederlanders (0.69), vreemdelingen (0.67), toeristen (0.66), surinamers (0.62), indonesiërs (0.62), marokkanen (0.61), sportvissers (0.60), turken (0.59), mensen
(0.58), antillianen (0.57)
1965_1974: nederlanders (0.73), vreemdelingen (0.71), toeristen (0.64), marokkanen (0.62), turken (0.60), kampeerders (0.59), indonesiërs (0.59), surinamers (0.59),
spanjaarden (0.57), duitsers (0.56)
1966_1975: nederlanders (0.70), vreemdelingen (0.69), toeristen (0.68), indonesiërs (0.64), prostituées (0.61), marokkanen (0.60), gezinnen (0.59), mensen (0.59), surinamers
(0.58), kampeerders (0.58)
1967_1976: nederlanders (0.71), toeristen (0.65), indonesiërs (0.63), vreemdelingen (0.63), chilenen (0.57), surinamers (0.57), kampeerders (0.57), gezinnen (0.57), duitsers
(0.56), jongelui (0.55)
1968_1977: nederlanders (0.72), vreemdelingen (0.68), toeristen (0.64), vakantiegangers (0.62), kampeerders (0.62), indonesiërs (0.59), duitsers (0.59), loeristen (0.58), mensen
(0.58), tunesiërs (0.58)
1969_1978: nederlanders (0.73), vreemdelingen (0.72), toeristen (0.66), surinamers (0.63), indonesiërs (0.61), tunesiërs (0.59), guyanezen (0.58), gezinnen (0.58), chilenen
(0.58), vakantiegangers (0.58)
1970_1979: nederlanders (0.75), surinamers (0.65), vreemdelingen (0.64), toeristen (0.63), indonesiërs (0.62), guyanezen (0.60), vakantiegangers (0.60), gastarbeiders (0.59),
antillianen (0.59), chilenen (0.59)
1971_1980: nederlanders (0.71), surinamers (0.65), toeristen (0.64), vreemdelingen (0.63), vakantiegangers (0.61), chinezen (0.61), antillianen (0.58), guyanezen (0.57), mensen
(0.57), gezinnen (0.57)
1972_1981: nederlanders (0.72), surinamers (0.66), vreemdelingen (0.63), toeristen (0.60), gastarbeiders (0.60), chinezen (0.59), vietnamezen (0.59), indonesiërs (0.59), illegalen
(0.58), vakantiegangers (0.58)
1973_1982: surinamers (0.71), vreemdelingen (0.70), nederlanders (0.69), gastarbeiders (0.63), guyanezen (0.62), illegalen (0.61), indonesiërs (0.61), chinezen (0.60), zigeuners
(0.60), molukkers (0.59)
1974_1983: surinamers (0.70), vreemdelingen (0.69), gastarbeiders (0.69), nederlanders (0.67), antillianen (0.63), zigeuners (0.59), illegalen (0.58), immigranten (0.58), jongeren
(0.58), turken (0.57)
1975_1984: surinamers (0.59), gastarbeiders (0.58), vreemdelingen (0.57), turken (0.55), marokkanen (0.54), nederlanders (0.52), jongeren (0.51), antillianen (0.50), zigeuners
(0.50), illegalen (0.49)
1976_1985: gastarbeiders (0.57), surinamers (0.55), vreemdelingen (0.55), turken (0.53), migranten (0.52), turks (0.52), marokkanen (0.50), zigeuners (0.50), nederlanders (0.49),
jongeren (0.48)
1977_1986: surinamers (0.58), gastarbeiders (0.57), vreemdelingen (0.55), turken (0.53), nederlanders (0.53), migranten (0.52), marokkanen (0.50), antillianen (0.49),
visumplichtige (0.49), illegalen (0.48)
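Output of this shape can be produced by ranking, per period, the words whose vectors have the highest cosine similarity to the query word. The sketch below is an assumption about how such a lookup might work, not the implementation behind `tc.trackWord`; the function `track_word` is hypothetical and the two-dimensional vectors are invented toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def track_word(models, word, top_n=3, min_sim=0.0):
    """Return {period: [(neighbour, similarity), ...]} for `word`."""
    result = {}
    for period in sorted(models):
        vectors = models[period]
        if word not in vectors:
            continue  # the word does not occur in this period's model
        scored = [
            (other, cosine(vectors[word], vec))
            for other, vec in vectors.items()
            if other != word
        ]
        scored = [(w, s) for w, s in scored if s >= min_sim]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[period] = scored[:top_n]
    return result


# Invented toy vectors; real models have hundreds of dimensions.
toy = {
    "1950_1959": {
        "buitenlanders": [1.0, 0.0],
        "vreemdelingen": [0.9, 0.1],
        "toeristen": [0.6, 0.8],
        "gastarbeiders": [0.0, 1.0],
    },
    "1977_1986": {
        "buitenlanders": [0.0, 1.0],
        "vreemdelingen": [0.5, 0.5],
        "toeristen": [1.0, 0.0],
        "gastarbeiders": [0.1, 0.9],
    },
}
tracked = track_word(toy, "buitenlanders", top_n=2)
for period, neighbours in tracked.items():
    print(period + ": " + ", ".join(f"{w} ({s:.2f})" for w, s in neighbours))
```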
PROPAGANDA
TRACE CONCEPT
tc.trackClouds3(dModels,
['propaganda'], fMinDist=.6,
bSumOfDistances=True)
1950-1959: propaganda
1960-1969: advertising, commercial,
non-commercial, commercial messages
1970-1979: tv broadcasting,
advertising, propaganda, tv programs
1980-1989: sport broadcasting,
television broadcasting, advertising,
radio broadcasting
RELATED WORDS
tc.trackWord(dModels,
'propaganda', fMinDist=0.5)
1950-1959: agitation, campaign,
campaigns, infiltration, election
propaganda, advertising
1960-1969: agitation, campaign,
nuclear protest, nuclear arms protest,
anti, activities
1968-1977: campaign, agitation,
imperialistic, sovietism, soviet
campaign, soviet propaganda,
militaristic, strikes
ALIENS
tc.trackWord(dModels, 'aliens')
1950-1959: aliens, foreigners, tourists, Indonesians,
Europeans, traveling worker
1960-1969: foreigners, tourists, holiday people, automobile
drivers, islanders, campers
1970-1979: foreigners, Surinamese, gypsies, Ambonesians,
Guyanese, delinquents, country men, minors, illegal aliens,
drug users
1980-1990: illegals, Surinamese, gypsies, asylum seekers,
immigrants, guest workers, trailer people, Antilles, tamils
CONCLUSIONS
• Trace concepts over large periods of
time
• Greater sensitivity to semantic
changes based on corpus
• Greater heuristic interactivity with the
researcher
FUTURE WORK
• Optimize algorithm based on
different types of conceptual
changes
• Query expansion. Use this
technique to find relevant related
words within specific periods
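Query expansion could, for instance, start from a line of tracker output and keep the neighbours above a similarity threshold as extra search terms for that period. The sketch below assumes the line format shown on the raw output slide; `parse_line`, `expand_query`, and the threshold value are hypothetical choices for illustration, and the regular expression only handles single-word terms.

```python
import re

# Matches pairs such as "surinamers (0.71)".
PAIR = re.compile(r"(\S+) \((\d\.\d+)\)")


def parse_line(line):
    """Split 'period: word (sim), word (sim), ...' into its parts."""
    period, _, rest = line.partition(": ")
    return period, [(word, float(sim)) for word, sim in PAIR.findall(rest)]


def expand_query(seed, line, min_sim=0.6):
    """Return the period and the seed plus all neighbours >= min_sim."""
    period, pairs = parse_line(line)
    return period, [seed] + [word for word, sim in pairs if sim >= min_sim]


# First four neighbours of the 1973_1982 line from the raw output slide.
line = ("1973_1982: surinamers (0.71), vreemdelingen (0.70), "
        "nederlanders (0.69), gastarbeiders (0.63)")
period, terms = expand_query("buitenlanders", line, min_sim=0.65)
print(period, terms)
```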
THANK YOU!
@melvinwevers //
melvinwevers@gmail.com
www.translantis.nl
(2009): 71.
Deleuze, Gilles. A Thousand Plateaus: Capitalism and Schizophrenia. University of Minnesota Press, 1987.
Huijnen, Pim, Fons Laan, Maarten de Rijke, and Toine Pieters. “A Digital Humanities Approach to the History of
Science.” In Social Informatics, edited by Akiyo Nadamoto, Adam Jatowt, Adam Wierzbicki, and Jochen L. Leidner, 71–
85. Lecture Notes in Computer Science 8359. Springer Berlin Heidelberg, 2014.
Kenter, Tom, Melvin Wevers, and Pim Huijnen. “Ad Hoc Monitoring of Vocabulary Shifts over Time.” To be published.
Kim, Yoon, Yi-I. Chiu, Kentaro Hanaki, Darshan Hegde, and Slav Petrov. “Temporal Analysis of Language through
Neural Language Models.” arXiv:1405.3515 [cs], May 14, 2014. http://arxiv.org/abs/1405.3515.
Klingenstein, S., T. Hitchcock, and S. DeDeo. “The Civilizing Process in London’s Old Bailey.” Proceedings of the
National Academy of Sciences 111, no. 26 (July 1, 2014): 9419–24.
Baroni, Marco, Georgiana Dinu, and Germán Kruszewski. “Don’t Count, Predict! A Systematic Comparison of Context-
Counting vs. Context-Predicting Semantic Vectors.” Accessed September 11, 2014.
http://anthology.aclweb.org/P/P14/P14-1023.xhtml.
Wang, Xuerui, and Andrew McCallum. “Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends.”
In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 424–33.
ACM, 2006.
Wiedemann, Gregor, Andreas Niekler, and others. “Document Retrieval for Large Scale Content Analysis Using
Contextualized Dictionaries.” In Terminology and Knowledge Engineering 2014, 2014. http://hal.archives-ouvertes.fr/hal-
01005879/.
Wijaya, Derry Tanti, and Reyyan Yeniterzi. “Understanding Semantic Change of Words over Centuries.” In Proceedings
of the 2011 International Workshop on DETecting and Exploiting Cultural diversiTy on the Social Web, 35–40. ACM,


Editor's Notes

  1. Today, I will highlight some of the points made in our paper, Concepts Through Time: Tracing Concepts in Dutch Newspaper Discourse using Sequential Word Vector Spaces. I am part of the Translantis project, with the insanely long subtitle: Digital Humanities Approaches to Reference Cultures: The Emergence of the United States in Dutch Public Discourse 1890-1990. My project looks at the ways in which the United States has appeared as a model, or reference culture, in debates concerning consumerism and modernization.
  2. The central theme of today’s talk is the study of the emergence and transformation of concepts, ideas, and thoughts. SLIDE I am a cultural historian who tries to see how computational tools can aid my work. Historians have increasingly used digital tools for the purposes of conceptual history. SLIDE However, in these studies researchers very often employ pre-defined and ahistorical definitions of concepts. SLIDE Full-text search and n-gram viewers, for example, require a workable definition of the concept, or a range of words that cover the subject, which is subsequently analyzed within certain contexts and periods. The necessity of pre-defining terms is a serious drawback of working with these tracking tools. Research done in this way runs the risk of ahistoricity. SLIDE The same goes for top-down approaches, in which a specific model of language allows for the recognition of certain semantic information, such as specific entities via Named Entity Recognition or via word classification lists. We would like the corpus to generate the list of words that form a concept. SLIDE Topic modeling approaches partly circumvent this limitation, although topic modeling is rather static. It can infer latent topics from a corpus, but it does not allow for ad hoc settings. You give a set of texts as input, you set the parameters, and you are presented with your output. SLIDE We would like to keep our hands on the wheel, to steer the research process, to follow the genealogy of a concept. And what we would really, really like is to trace concepts before their key term was even introduced. For instance, the notion of efficiency was thought up in the interwar years; however, even in the years before, people talked about similar notions without using the word efficiency. Well, what words did they use?
  3. SLIDE So basically, what we would like is a method to study changes in the meaning, or constitution, of concepts over time. SLIDE Our main research question, then, is what words were used in the past to talk about specific concepts. Answering it would enable us to show the continuities and discontinuities in discourse, while remaining sensitive to ambiguities within the words that make up these concepts.
  4. In order to do this, we have turned to multi-dimensional word vector spaces. SLIDE These are created using word2vec, a neural language model that does not start from a top-down model of language; rather, a semantic space is inferred from the input data. SLIDE As our dataset we have used the digitized newspaper collection from the Dutch National Library, which contains over 500,000 newspaper issues published between 1890 and 1990. SLIDE A multi-dimensional word vector space contains semantic and linguistic regularities that can be used for the analysis of discourse. A positional shift within the vector space has been established as an indicator of changes on a semantic and syntactic level. SLIDE Our method introduces a sequential modeling of these vector spaces, for which we make multiple models over time. We create a model for one decade, for example 1950-1960. Then we move this window one year forward and create another one. For the period between 1950 and 1990 we have thus created 40 models.
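The sliding-window setup described above can be sketched as follows. This is a minimal illustration, not the project's code: the function names `sliding_windows`, `window_label`, and `issues_in_window` are hypothetical, and the inclusive ten-year span with labels such as 1950_1959 is an assumption taken from the raw output shown on the slides. How many windows result depends on how the edges of the period are handled, which the talk does not specify. Training one word2vec model on the text selected for each window would be the next step and is not shown here.

```python
def sliding_windows(first_year, last_year, width=10, step=1):
    """Yield inclusive (start, end) year pairs for overlapping windows.

    Assumption: a window must fit entirely inside [first_year, last_year].
    """
    start = first_year
    while start + width - 1 <= last_year:
        yield start, start + width - 1
        start += step


def window_label(start, end):
    """Format a window the way the raw output labels its models."""
    return f"{start}_{end}"


def issues_in_window(issues, start, end):
    """Select the texts of all issues published within one window.

    `issues` is an iterable of (year, text) pairs; this layout is hypothetical.
    """
    return [text for year, text in issues if start <= year <= end]


windows = list(sliding_windows(1950, 1990))
labels = [window_label(start, end) for start, end in windows]
print(labels[:2])  # ['1950_1959', '1951_1960']
```

Each label then identifies one model, and consecutive models share nine of their ten years, which is what makes a gradual comparison between neighbouring models possible.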
  5. So then we have all these models. How do we trace concepts within them? We trace groups of terms, rather than individual words, by keeping track of the semantic relations between terms per period. SLIDE We use a single seed set of terms, merely as an entry point into the cluster, and then find semantically related words, that is, words that are close to the seed terms within the vector space. This presents us with the first layer of words. Then we look for the words related to these words. This gives a semantic graph, which we prune by weighing in- and out-links. SLIDE This pruned graph is then located within the subsequent model in time, and the same process is executed. A key aspect of this procedure is that the original seed words might disappear from the cluster of words over time. Remember, this relates to the example of efficiency I just gave. SLIDE Through this approach we try to model ambiguity through time by monitoring the words that change position within the network or leave it altogether. SLIDE This technique is fast and relatively light. You can query the models using a number of different operators. SLIDE You can peruse the models forwards and backwards.
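The tracing procedure just described can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: each model is represented as a plain dictionary from word to vector, the two-dimensional vectors are invented toy values, the names `neighbours`, `trace_step`, and `trace` are hypothetical, and pruning by a minimum number of in-links stands in for the weighing of in- and out-links mentioned in the talk.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighbours(model, word, k):
    """Return the k words closest to `word` within one vector space."""
    if word not in model:
        return []
    scored = [(cosine(model[word], vec), other)
              for other, vec in model.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def trace_step(model, seeds, k=2, min_in_links=2):
    """Expand the seeds by one layer, link every node to its neighbours,
    and keep only the words that receive enough in-links."""
    first_layer = {n for seed in seeds for n in neighbours(model, seed, k)}
    nodes = set(seeds) | first_layer
    in_links = {}
    for source in nodes:
        # The neighbours of first-layer words form the second layer.
        for target in neighbours(model, source, k):
            in_links[target] = in_links.get(target, 0) + 1
    return sorted(w for w, count in in_links.items() if count >= min_in_links)


def trace(models, seeds, **settings):
    """Carry the pruned cluster from each model into the next one in time."""
    result = {}
    for period in sorted(models):
        kept = trace_step(models[period], seeds, **settings)
        result[period] = kept
        seeds = kept or seeds  # fall back to the old seeds if nothing survives
    return result


# Toy vector spaces with invented coordinates, for illustration only.
models = {
    "1950_1959": {"buitenlanders": (1.0, 0.1), "vreemdelingen": (1.0, 0.2),
                  "toeristen": (0.9, 0.3), "gastarbeiders": (0.1, 1.0),
                  "immigranten": (0.2, 1.0)},
    "1960_1969": {"buitenlanders": (0.3, 1.0), "vreemdelingen": (0.4, 1.0),
                  "toeristen": (1.0, 0.1), "gastarbeiders": (0.1, 1.0),
                  "immigranten": (0.2, 1.0)},
}
result = trace(models, ["buitenlanders"])
for period, words in result.items():
    print(period, words)
```

In this toy run "toeristen" belongs to the cluster in the first period and drops out in the second, while "immigranten" enters it. Because the seeds for each period are whatever survived the previous one, the original seed word can itself disappear over time, which is the behaviour the talk highlights.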
  6. This is what the output looks like for now. I have two examples.
  7. Propaganda. Before WW2, propaganda and advertising referred to the same thing; this shifted after WW2. When we trace the concept after WW2, we see that quite suddenly it seems to shift into the realm of advertising. In addition to this tracing, we have looked up the words related to propaganda through time. This shows that propaganda received a new meaning, namely that of political propaganda, more specifically within the Cold War context.
  8. Another example is that of the word Aliens. This shows how the debate on foreigners has changed over the years. The association with tourists and Europeans changes into one with illegals, guest workers, immigrants, and people from Surinam and the Dutch Antilles.
  9. Concepts Through Time enables historians to trace concepts over long periods without having to manually select appropriate terms for the entire time span and without being dependent on a fixed set of topics. This allows for a greater sensitivity to semantic changes and a more interactive, heuristic approach to concepts within their discursive context.
  10. Different conceptualizations of types of conceptual change. Create a user interface to visualize concepts in vector space, and also to allow researchers to play with settings when moving through time. Implement this as a form of query expansion: this technique can find relevant related words within historical periods. So if you were to look for efficiency in 1910, it would give you the words used to talk about this concept.