In this talk, we present the Change is Key! program, a 6-year research program in which we combine methods for semantic change and lexical variation to answer research questions from the humanities and social sciences. We first introduce different classes of methods for computationally detecting semantic change, ranging from topic modelling to contextual embeddings, and discuss how their results should be interpreted and evaluated. The talk further sheds light on research questions from the humanities and social science focus domains that will be tackled in the Change is Key! program.
1. Change is Key!
An introduction to lexical semantic change
Nina Tahmasebi, Associate Professor & Simon Hengchen, PhD
University of Gothenburg
October 2022, KBR
Digital Heritage Seminar Series: Lexical Semantic Change
2. Some facts
• 6 years
• 6 partner universities
• Members from 4 countries
• With advisors, 6 countries
• 13 people including PM and SE
4. Word meaning change
Over time:
• "He was an awesome leader!" reads differently depending on when it was written: awesome in its older sense (awe-inspiring) or its modern sense (excellent).
In different contexts (at the same time):
• St. Petersburg / Petrograd / Leningrad / St. Petersburg: the same city carries different names at different times, and texts from different contexts may use different names for it.
October 2022 | Nina Tahmasebi | KBR DH seminar
5. Main CHALLENGES for computational models of meaning and change
• Handle languages with smaller amounts of data
• Sense-aware models
• Find out WHAT changed, HOW and WHEN
• Generalize to multiple languages
23. Method families over time (2008–2020)
Single-sense:
• count-based embeddings: Sagi et al. 2009; Basile et al. 2016
• neural embeddings: Kim et al. 2014; Kulkarni et al. 2015; Hamilton et al. 2016
• dynamic embeddings: Bamler & Mandt 2018
• contextual embeddings: Hu et al. 2019; Giulianelli et al. 2020
Sense-differentiated:
• topic models: Lau et al. 2012; Frermann & Lapata 2016
• word sense induction: Tahmasebi et al. 2008; Wijaya & Yeniterzi 2011; Mitra et al. 2015; Tahmasebi & Risse 2017
26. Context-based method
Sagi et al., GEMS 2009
• Build context vectors for each occurrence of a word w at time points ti and tj.
• The dispersion of these vectors signals broadening or narrowing of sense.
• With grouping (and the data set split into appropriate sets): added/removed senses.
BUT:
1. No alignment of senses over time!
2. No discrimination between senses.
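A rough sketch of the context-vector idea (not Sagi et al.'s exact implementation): represent each occurrence of a word by the average embedding of its surrounding words, then measure how dispersed the occurrence vectors are. A drop in average pairwise similarity between two periods suggests broadening; a rise suggests narrowing. The function names and windowing below are illustrative assumptions.

```python
import numpy as np

def context_vectors(sentences, target, word_vecs, window=5):
    """One vector per occurrence of `target`: the mean embedding
    of the words in a +-window around it."""
    vecs = []
    for toks in sentences:
        for i, tok in enumerate(toks):
            if tok != target:
                continue
            ctx = toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]
            ctx = [word_vecs[w] for w in ctx if w in word_vecs]
            if ctx:
                vecs.append(np.mean(ctx, axis=0))
    return np.array(vecs)

def dispersion(vecs):
    """Average pairwise cosine similarity of the occurrence vectors;
    lower values indicate a more dispersed (broader) usage."""
    v = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = v @ v.T
    n = len(v)
    return (sims.sum() - n) / (n * (n - 1))
```

Comparing `dispersion` over occurrence vectors from two time slices gives the broadening/narrowing signal; the caveats above (no sense alignment, no sense discrimination) still apply.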
27. Word embedding-based models
Kim et al., LTCSS 2014; Kulkarni et al., WWW'15; Basile et al., CLiC-it 2016; Hamilton et al., ACL 2016
• Project a word onto a vector/point (POS, frequency and embeddings)
• Track vectors over time
Image: Kulkarni et al., WWW'15
28. LSC – individually trained embedding spaces
Single-point embedding spaces, trained separately for multiple time points ti; track an individual word w over time.
1. Embedding space per time point
2. Alignment across spaces
3. Change point/degree detection
Vector space image: Nieto Piña and Johansson, RANLP'15
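The alignment step is commonly solved with orthogonal Procrustes, as in Hamilton et al. (2016): find the rotation that maps one space onto the other over the shared vocabulary, then score each word by the distance between its aligned and target vectors. A minimal numpy sketch, with illustrative function names:

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal Procrustes: rotate X (embeddings at time ti) onto
    Y (time tj), rows aligned over the shared vocabulary.
    Solves R = argmin ||XR - Y||_F s.t. R orthogonal."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return X @ (U @ Vt)

def change_degree(x_aligned, y):
    """Cosine distance between a word's aligned old vector and its
    new vector: the per-word change score."""
    return 1 - (x_aligned @ y) / (np.linalg.norm(x_aligned) * np.linalg.norm(y))
```

Because the rotation is estimated from the whole vocabulary, a word that moves against that global fit receives a high change score.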
29. LSC – dynamic embedding spaces
1. Align while training
2. Change point/degree detection: track an individual word w over time
30. Dynamic embeddings
Sharing data is highly beneficial!
• Bamler & Mandt: Bayesian skip-gram
• Yao et al.: PPMI embeddings
• Rudolph & Blei: exponential family embeddings (Bernoulli embeddings)
Shares data across all time points; avoids aligning.
31. Temporal Referencing
Sharing data is highly beneficial!
• Dubossarsky et al.: SGNS and PPMI embeddings
Shares contexts across all time points, with individual vectors per target word for each time bin; avoids aligning.
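The preprocessing behind Temporal Referencing fits in a few lines: only the target words are tagged with their time bin, all context words stay untouched, and one model is trained on the pooled corpus, so target vectors are bin-specific while context vectors are shared. The function name and tagging scheme below are illustrative assumptions:

```python
def temporal_reference(corpus_by_bin, targets):
    """Tag target-word tokens with their time bin and pool all bins
    into one training corpus. A single embedding model trained on
    the output learns one vector per (target, bin) but shared
    context representations."""
    pooled = []
    for bin_label, sentences in corpus_by_bin.items():
        for toks in sentences:
            pooled.append([f"{t}_{bin_label}" if t in targets else t
                           for t in toks])
    return pooled
```

After training, comparing (for example) `awesome_1900` with `awesome_2000` requires no alignment, since both live in the same space.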
33. Topic-based methods
Wijaya & Yeniterzi, DETECT '11; Lau et al., EACL 2014; Cook et al., COLING 2014; Frermann & Lapata, TACL 2016
Example: Lau et al., EACL 2014, on two collections (BNC and ukWaC):
1. Topic model (HDP)
2. Assign topics to all instances of a word.
3. If a word sense WSi is assigned to collection 2 but not collection 1, then WSi is a novel word sense.
BUT:
• Only two time points (typically there is much noise!)
• No alignment of senses over time!
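The novel-sense rule in step 3 amounts to a set difference over topic assignments; a minimal illustrative sketch (topic IDs stand in for induced word senses):

```python
def novel_senses(topics_c1, topics_c2):
    """Topics assigned to a word's instances in collection 2 but
    never in collection 1 are flagged as novel senses."""
    return set(topics_c2) - set(topics_c1)
```

In practice a frequency threshold is needed on top of this, precisely because of the noise caveat above: a topic seen once in collection 2 is weak evidence of a genuinely new sense.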
35. Word sense induction
Tahmasebi & Risse, RANLP 2017
Word sense induction (curvature clustering) on individual time slices; e.g. rock with sense units Stone, Music, Lifestyle.
Step 1: Detecting stable senses → units
Step 2: Relating units
Step 3: Paths
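Step 2 can be illustrated by linking sense units (sets of associated words) across adjacent time slices when their overlap is high; linked chains of units then form the sense paths of step 3. The Jaccard measure and threshold here are illustrative assumptions, not necessarily the paper's exact choices:

```python
def relate_units(units_t1, units_t2, threshold=0.3):
    """Link sense units (word sets) across two time slices when
    their Jaccard overlap reaches the threshold; returns
    (index in t1, index in t2, overlap) triples."""
    links = []
    for i, u in enumerate(units_t1):
        for j, v in enumerate(units_t2):
            jacc = len(u & v) / len(u | v)
            if jacc >= threshold:
                links.append((i, j, round(jacc, 3)))
    return links
```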
37. Type-based embedding methods
All sentences containing w ("Sentence with w and more", "Different sentence with w and more", …, "Last sentence with w and more") are collapsed into a single vector for w.
40. Evaluation
[Figure] Evaluation operates along several axes: from individual texts to collective text, from the signal itself (topic, cluster, vector, …) to the change in that signal, and from minimum via medium to optimum evaluation settings.
42. Summary of methods
• Most co-occurrence methods are outperformed by type embeddings.
• Type embeddings average over all usages of a word, need alignment across corpora, and need very large amounts of data.
• Dynamic embeddings 'remember' too much history.
• Topic-based methods have little correspondence to senses (and run badly on very large datasets).
• WSI-based methods typically have too low coverage.
• Contextual embeddings need to be clustered into senses.
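For the last point, a minimal sketch of clustering per-occurrence contextual vectors into senses. Plain k-means on toy vectors is used purely for illustration (real pipelines use contextual-embedding outputs and stronger clustering); comparing the resulting cluster proportions across time bins gives a sense-level change signal.

```python
import numpy as np

def cluster_usages(vectors, k=2, iters=20, init_idx=None, seed=0):
    """Minimal k-means over per-occurrence contextual vectors;
    each resulting cluster is treated as one word sense."""
    rng = np.random.default_rng(seed)
    if init_idx is None:
        init_idx = rng.choice(len(vectors), size=k, replace=False)
    centers = vectors[np.asarray(init_idx)].astype(float).copy()
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(iters):
        # assign each occurrence to its nearest sense centroid
        dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its occurrences
        for c in range(k):
            if (labels == c).any():
                centers[c] = vectors[labels == c].mean(axis=0)
    return labels

def sense_distribution(labels, k):
    """Relative frequency of each sense cluster; comparable
    across time bins as a usage distribution."""
    return np.bincount(labels, minlength=k) / len(labels)
```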