In this talk, we present the Change is Key! program, a 6-year research program in which we combine methods for semantic change and lexical variation to answer research questions from the humanities and social sciences. We first introduce different classes of methods for computationally detecting semantic change, ranging from topic modelling to contextual embeddings, and discuss how their results should be interpreted and evaluated. The talk further sheds light on research questions from the humanities and social science focus domains that will be tackled in the Change is Key! program.
1. Change is Key!
An introduction to lexical semantic change
Nina Tahmasebi, Associate Professor & Simon Hengchen, PhD
University of Gothenburg
October 2022, KBR
Digital Heritage Seminar Series: Lexical Semantic Change
2. Some facts
• 6 years
• 6 partner universities
• Members from 4 countries
• With advisors, 6 countries
• 13 people including PM and SE
4. Word meaning change
Over time:
• "He was an awesome leader!" reads differently depending on when it was written: awesome in its older sense (awe-inspiring) or its modern sense (excellent).
In different contexts (at the same time):
• St. Petersburg / Petrograd / Leningrad / St. Petersburg: the same city carries different names at different times, and texts from different contexts may use different names for it.
October 2022 | Nina Tahmasebi | KBR DH seminar
5. Main CHALLENGES for computational models of meaning and change
• Handle languages with smaller amounts of data
• Sense-aware models
• Find out WHAT changed, HOW and WHEN
• Generalize to multiple languages
23. Method families over time (2008–2020)
Single-sense:
• count-based embeddings: Sagi et al. 2009; Basile et al. 2016
• neural embeddings: Kim et al. 2014; Kulkarni et al. 2015; Hamilton et al. 2016
• dynamic embeddings: Bamler & Mandt 2018
• contextual embeddings: Hu et al. 2019; Giulianelli et al. 2020
Sense-differentiated:
• topic models: Lau et al. 2012; Frermann & Lapata 2016
• word sense induction: Tahmasebi et al. 2008; Wijaya & Yeniterzi 2011; Mitra et al. 2015; Tahmasebi & Risse 2017
26. Context-based method
Sagi et al., GEMS 2009
• Build context vectors for each occurrence of a word w at time points ti and tj.
• The dispersion of these vectors signals broadening or narrowing of sense.
• With grouping (and the data set split into appropriate sets): added/removed senses.
BUT:
1. No alignment of senses over time!
2. No discrimination between senses.
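A rough sketch of the context-vector idea (not Sagi et al.'s exact implementation): represent each occurrence of a word by the average embedding of its surrounding words, then measure how dispersed the occurrence vectors are. A drop in average pairwise similarity between two periods suggests broadening; a rise suggests narrowing. The function names and windowing below are illustrative assumptions.

```python
import numpy as np

def context_vectors(sentences, target, word_vecs, window=5):
    """One vector per occurrence of `target`: the mean embedding
    of the words in a +-window around it."""
    vecs = []
    for toks in sentences:
        for i, tok in enumerate(toks):
            if tok != target:
                continue
            ctx = toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]
            ctx = [word_vecs[w] for w in ctx if w in word_vecs]
            if ctx:
                vecs.append(np.mean(ctx, axis=0))
    return np.array(vecs)

def dispersion(vecs):
    """Average pairwise cosine similarity of the occurrence vectors;
    lower values indicate a more dispersed (broader) usage."""
    v = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = v @ v.T
    n = len(v)
    return (sims.sum() - n) / (n * (n - 1))
```

Comparing `dispersion` over occurrence vectors from two time slices gives the broadening/narrowing signal; the caveats above (no sense alignment, no sense discrimination) still apply.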
27. Word embedding-based models
Kim et al., LTCSS 2014; Kulkarni et al., WWW'15; Basile et al., CLiC-it 2016; Hamilton et al., ACL 2016
• Project a word onto a vector/point (POS, frequency and embeddings)
• Track vectors over time
Image: Kulkarni et al., WWW'15
28. LSC – individually trained embedding spaces
Single-point embedding spaces, trained separately for multiple time points ti; track an individual word w over time.
1. Embedding space per time point
2. Alignment across spaces
3. Change point/degree detection
Vector space image: Nieto Piña and Johansson, RANLP'15
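The alignment step is commonly solved with orthogonal Procrustes, as in Hamilton et al. (2016): find the rotation that maps one space onto the other over the shared vocabulary, then score each word by the distance between its aligned and target vectors. A minimal numpy sketch, with illustrative function names:

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal Procrustes: rotate X (embeddings at time ti) onto
    Y (time tj), rows aligned over the shared vocabulary.
    Solves R = argmin ||XR - Y||_F s.t. R orthogonal."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return X @ (U @ Vt)

def change_degree(x_aligned, y):
    """Cosine distance between a word's aligned old vector and its
    new vector: the per-word change score."""
    return 1 - (x_aligned @ y) / (np.linalg.norm(x_aligned) * np.linalg.norm(y))
```

Because the rotation is estimated from the whole vocabulary, a word that moves against that global fit receives a high change score.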
29. LSC – dynamic embedding spaces
1. Align while training
2. Change point/degree detection: track an individual word w over time
30. Dynamic embeddings
Sharing data is highly beneficial!
• Bamler & Mandt: Bayesian skip-gram
• Yao et al.: PPMI embeddings
• Rudolph & Blei: exponential family embeddings (Bernoulli embeddings)
Shares data across all time points; avoids aligning.
31. Temporal Referencing
Sharing data is highly beneficial!
• Dubossarsky et al.: SGNS and PPMI embeddings
Shares contexts across all time points, with individual vectors per target word for each time bin; avoids aligning.
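The preprocessing behind Temporal Referencing fits in a few lines: only the target words are tagged with their time bin, all context words stay untouched, and one model is trained on the pooled corpus, so target vectors are bin-specific while context vectors are shared. The function name and tagging scheme below are illustrative assumptions:

```python
def temporal_reference(corpus_by_bin, targets):
    """Tag target-word tokens with their time bin and pool all bins
    into one training corpus. A single embedding model trained on
    the output learns one vector per (target, bin) but shared
    context representations."""
    pooled = []
    for bin_label, sentences in corpus_by_bin.items():
        for toks in sentences:
            pooled.append([f"{t}_{bin_label}" if t in targets else t
                           for t in toks])
    return pooled
```

After training, comparing (for example) `awesome_1900` with `awesome_2000` requires no alignment, since both live in the same space.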
33. Topic-based methods
Wijaya & Yeniterzi, DETECT '11; Lau et al., EACL 2014; Cook et al., COLING 2014; Frermann & Lapata, TACL 2016
Example: Lau et al., EACL 2014, on two collections (BNC and ukWaC):
1. Topic model (HDP)
2. Assign topics to all instances of a word.
3. If a word sense WSi is assigned to collection 2 but not collection 1, then WSi is a novel word sense.
BUT:
• Only two time points (typically there is much noise!)
• No alignment of senses over time!
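The novel-sense rule in step 3 amounts to a set difference over topic assignments; a minimal illustrative sketch (topic IDs stand in for induced word senses):

```python
def novel_senses(topics_c1, topics_c2):
    """Topics assigned to a word's instances in collection 2 but
    never in collection 1 are flagged as novel senses."""
    return set(topics_c2) - set(topics_c1)
```

In practice a frequency threshold is needed on top of this, precisely because of the noise caveat above: a topic seen once in collection 2 is weak evidence of a genuinely new sense.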
35. Word sense induction
Tahmasebi & Risse, RANLP 2017
Word sense induction (curvature clustering) on individual time slices; e.g. rock with sense units Stone, Music, Lifestyle.
Step 1: Detecting stable senses → units
Step 2: Relating units
Step 3: Paths
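Step 2 can be illustrated by linking sense units (sets of associated words) across adjacent time slices when their overlap is high; linked chains of units then form the sense paths of step 3. The Jaccard measure and threshold here are illustrative assumptions, not necessarily the paper's exact choices:

```python
def relate_units(units_t1, units_t2, threshold=0.3):
    """Link sense units (word sets) across two time slices when
    their Jaccard overlap reaches the threshold; returns
    (index in t1, index in t2, overlap) triples."""
    links = []
    for i, u in enumerate(units_t1):
        for j, v in enumerate(units_t2):
            jacc = len(u & v) / len(u | v)
            if jacc >= threshold:
                links.append((i, j, round(jacc, 3)))
    return links
```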
37. Type-based embedding methods
All sentences containing w ("Sentence with w and more", "Different sentence with w and more", …, "Last sentence with w and more") are collapsed into a single vector for w.
40. Evaluation
[Figure] Evaluation operates along several axes: from individual texts to collective text, from the signal itself (topic, cluster, vector, …) to the change in that signal, and from minimum via medium to optimum evaluation settings.
42. Summary of methods
• Most co-occurrence methods are outperformed by type embeddings.
• Type embeddings average over all usages of a word, need alignment across corpora, and need very large amounts of data.
• Dynamic embeddings 'remember' too much history.
• Topic-based methods have little correspondence to senses (and run badly on very large datasets).
• WSI-based methods typically have too low coverage.
• Contextual embeddings need to be clustered into senses.
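For the last point, a minimal sketch of clustering per-occurrence contextual vectors into senses. Plain k-means on toy vectors is used purely for illustration (real pipelines use contextual-embedding outputs and stronger clustering); comparing the resulting cluster proportions across time bins gives a sense-level change signal.

```python
import numpy as np

def cluster_usages(vectors, k=2, iters=20, init_idx=None, seed=0):
    """Minimal k-means over per-occurrence contextual vectors;
    each resulting cluster is treated as one word sense."""
    rng = np.random.default_rng(seed)
    if init_idx is None:
        init_idx = rng.choice(len(vectors), size=k, replace=False)
    centers = vectors[np.asarray(init_idx)].astype(float).copy()
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(iters):
        # assign each occurrence to its nearest sense centroid
        dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its occurrences
        for c in range(k):
            if (labels == c).any():
                centers[c] = vectors[labels == c].mean(axis=0)
    return labels

def sense_distribution(labels, k):
    """Relative frequency of each sense cluster; comparable
    across time bins as a usage distribution."""
    return np.bincount(labels, minlength=k) / len(labels)
```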