1. Timo Honkela, Modeling Meaning and Knowledge, Spaces of Knowledge, 1.2.2016
Timo Honkela
Modeling Meaning and Knowledge
1 Feb 2016
timo.honkela@helsinki.fi
Spaces of Knowledge
http://www.cs.cornell.edu/Info/Department/Annual95/Faculty/Salton.html
Advent of vector-based information retrieval
● Gerard Salton: Documents and
queries represented as vectors of
term counts
● Similarity between a document
and a query is given by the cosine
between the query vector and the
document vector
● TF-IDF (term frequency-inverse
document frequency) for weighting
a term in a document
● Inverse document frequency had
been introduced by Karen
Spärck Jones in 1972
https://en.wikipedia.org/wiki/Gerard_Salton
https://en.wikipedia.org/wiki/Karen_Sp%C3%A4rck_Jones
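The weighting and matching described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Salton's exact formulation: tf is the raw term count, idf is log(N/df) without smoothing, tokenization is by whitespace, and the toy documents are invented. Many systems use other normalizations.

```python
import math
from collections import Counter


def idf_table(token_lists):
    """Inverse document frequency log(N / df) for every term in the collection.

    One common IDF variant (assumed here); many systems add smoothing.
    """
    n_docs = len(token_lists)
    # df counts each document at most once per term, hence set(tokens).
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tf_idf(tokens, idf):
    """Sparse TF-IDF vector: raw term count times IDF (unknown terms get 0)."""
    return {term: tf * idf.get(term, 0.0) for term, tf in Counter(tokens).items()}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy collection, tokenized by whitespace.
documents = [
    "the university supports research at the university",
    "the society supports the university",
    "the culture of the society",
]
token_lists = [doc.split() for doc in documents]
idf = idf_table(token_lists)
doc_vectors = [tf_idf(tokens, idf) for tokens in token_lists]

# The query is weighted with the collection's IDF values, like a short document.
query = tf_idf("university research".split(), idf)
scores = [cosine(query, d) for d in doc_vectors]
print(scores)
```

With this collection the first document scores highest for the query "university research". The third shares no terms with the query and scores 0. The word "the", which occurs in every document, gets weight 0.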
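[Figure: documents D1, D2, D3 and queries Q1, Q2, Q3 drawn as vectors in a two-dimensional term space with the axes "University" and "Society"]
Document 1: The word “university” appears three times and “society” once, etc.
Query 1: “university”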
https://en.wikipedia.org/wiki/Cosine_similarity
https://en.wikipedia.org/wiki/Sine
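Worked through for Document 1 and Query 1, the two axes of the figure are treated as the only terms. This gives D1 = (3, 1) and Q1 = (1, 0) in (university, society) coordinates. The positions of the other documents and queries cannot be recovered from the text and are left out.

```latex
\cos\theta(D_1, Q_1)
  = \frac{D_1 \cdot Q_1}{\lVert D_1 \rVert \, \lVert Q_1 \rVert}
  = \frac{3 \cdot 1 + 1 \cdot 0}{\sqrt{3^2 + 1^2}\,\sqrt{1^2 + 0^2}}
  = \frac{3}{\sqrt{10}} \approx 0.949
```

Because the cosine depends only on direction, a document using only "university" would score 1 and one using only "society" would score 0, whatever its length.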
Contexts tell us about meaning
● John Rupert Firth: “You shall know a word
by the company it keeps”
● Ludwig Wittgenstein: “For a large class of
cases of the employment of the word
‘meaning’ – though not for all – this word can be
explained in this way: the meaning of a word is
its use in the language” (PI 43)
https://en.wikipedia.org/wiki/John_Rupert_Firth
http://plato.stanford.edu/entries/wittgenstein/#Mea
https://en.wikipedia.org/wiki/Ludwig_Wittgenstein
Analysis of term-document matrices
● The same idea as in information retrieval can
also be applied in studying words and
expressions
● Statistical analysis of term-document matrices
gives rise to models of the relationships between
words or between documents
● Classical examples include
– Latent Semantic Analysis (Deerwester, Dumais et al. 1988)
– Self-Organizing Semantic Maps (Ritter & Kohonen 1989)
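The first bullet can be illustrated with a minimal sketch. Each word is represented by its row of a term-document count matrix, and words are compared by the cosine between rows. The matrix is invented for illustration. This is only the common starting point: LSA would additionally apply a truncated singular value decomposition to the (usually weighted) matrix, and self-organizing semantic maps use a SOM rather than plain cosine ranking.

```python
import math

# Hypothetical term-document count matrix: one row per term, one column per document.
terms = ["university", "research", "society", "culture"]
counts = [
    [3, 2, 0, 1],  # university
    [2, 3, 0, 0],  # research
    [1, 0, 2, 3],  # society
    [0, 0, 3, 2],  # culture
]


def cosine(u, v):
    """Cosine of the angle between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_terms(word):
    """Rank the other terms by the cosine between their document-count rows."""
    i = terms.index(word)
    scored = [
        (terms[j], cosine(counts[i], counts[j]))
        for j in range(len(terms))
        if j != i
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(similar_terms("university"))
print(similar_terms("culture"))
```

Words that occur in the same documents get similar rows. In this toy matrix, "university" is closest to "research" and "culture" is closest to "society". Comparing columns instead of rows would give document-document similarities.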
Word spaces, clusters, clouds, ...
● The analysis of the statistical information
related to word contexts can be turned into
visualizations of the word relations
Maps of words in Grimm fairy tales
Honkela, Pulkki & Kohonen 1995
Automated learning of word relations
using the self-organizing map on text context data
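The following is a minimal sketch of the general approach: words are represented by their contexts and organized on a self-organizing map. It is not a reproduction of Honkela, Pulkki & Kohonen (1995). As a simplifying assumption, each word is encoded by explicit counts of its immediate left and right neighbors, and a plain online SOM is used. The text, map size, and learning-rate and neighborhood schedules are arbitrary illustrative choices.

```python
import math
import random
from collections import defaultdict


def context_vectors(tokens):
    """Encode each word by counts of its left and right neighbors (unit length)."""
    vocabulary = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocabulary)}
    n = len(vocabulary)
    raw = defaultdict(lambda: [0.0] * (2 * n))
    for pos, word in enumerate(tokens):
        if pos > 0:
            raw[word][index[tokens[pos - 1]]] += 1  # left neighbor
        if pos + 1 < len(tokens):
            raw[word][n + index[tokens[pos + 1]]] += 1  # right neighbor
    vectors = {}
    for word, vec in raw.items():
        norm = math.sqrt(sum(x * x for x in vec))  # > 0: entries exist only after a count
        vectors[word] = [x / norm for x in vec]
    return vectors


def best_matching_unit(weights, x):
    """Grid position whose weight vector is closest (Euclidean) to x."""
    return min(weights, key=lambda unit: sum((w - xi) ** 2 for w, xi in zip(weights[unit], x)))


def train_som(data, rows, cols, steps=2000, seed=0):
    """Online SOM training with a Gaussian neighborhood that shrinks over time."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for t in range(steps):
        progress = t / steps
        rate = 0.5 * (1 - progress) + 0.01  # learning rate decays towards 0.01
        radius = max(rows, cols) / 2 * (1 - progress) + 0.5  # neighborhood width shrinks
        x = data[rng.randrange(len(data))]
        winner = best_matching_unit(weights, x)
        for unit, w in weights.items():
            grid_dist2 = (unit[0] - winner[0]) ** 2 + (unit[1] - winner[1]) ** 2
            h = math.exp(-grid_dist2 / (2 * radius ** 2))
            for k in range(dim):
                w[k] += rate * h * (x[k] - w[k])  # move unit towards the input
    return weights


text = "the king walked . the queen walked . the dog ran . the cat ran ."
vectors = context_vectors(text.split())
som = train_som(list(vectors.values()), rows=3, cols=3)
placement = {word: best_matching_unit(som, vec) for word, vec in vectors.items()}
print(placement)
```

Words that occur in identical contexts, such as "king" and "queen" here, receive identical context vectors and therefore the same map unit. With real text, words with similar contexts tend to land on nearby units.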
Map of Finnish Science
(T. Honkela & M. Klami 2007)
[Figure: map with regions labelled Chemistry; Natural sciences and engineering; Bio- and environmental sciences; Health; Culture and society]
From term weighting to term selection
● TF-IDF is a widely used method for term
weighting
● Likey (Language Independent Keyphrase
Extraction) was developed to select terms
automatically by comparing the corpus at hand
with another corpus, called a reference corpus
(Paukkeri et al. 2008, Paukkeri & Honkela 2010)
Most frequent word forms (types) in two corpora
Rank  Academy corpus          Europarl corpus
  1.  the         1276847     the     2023617
  2.  of          1067918     of       945622
  3.  and          817852     to       883206
  4.  in           625330     and      717718
  5.  to           357453     in       611421
  6.  for          225307     that     473739
  7.  is           205723     a        445775
  8.  on           162509     is       445119
  9.  research     157251     we       305590
 10.  be           151475     for      296092
 11.  with         136854     i        290412
 12.  will         135992     this     286924
 13.  as           122707     on       274614
 14.  are          116508     it       251343
 15.  by           113878     be       246917
 16.  university    98003     are      197082
 ...  ...                     ...
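A Likey-style comparison can be sketched on the two frequency lists above. This sketch assumes that a word is scored by the ratio of its frequency rank in the corpus at hand to its rank in the reference corpus, where lower scores mean more characteristic words. Giving words absent from the reference list the rank len(reference) + 1 is also an assumption of this sketch. The published method ranks keyphrases using full frequency data, so this only illustrates the principle on truncated unigram lists.

```python
# Frequencies of the 16 most frequent word forms in each corpus, from the lists above.
academy = {
    "the": 1276847, "of": 1067918, "and": 817852, "in": 625330,
    "to": 357453, "for": 225307, "is": 205723, "on": 162509,
    "research": 157251, "be": 151475, "with": 136854, "will": 135992,
    "as": 122707, "are": 116508, "by": 113878, "university": 98003,
}
europarl = {
    "the": 2023617, "of": 945622, "to": 883206, "and": 717718,
    "in": 611421, "that": 473739, "a": 445775, "is": 445119,
    "we": 305590, "for": 296092, "i": 290412, "this": 286924,
    "on": 274614, "it": 251343, "be": 246917, "are": 197082,
}


def frequency_ranks(freqs):
    """Map each word to its rank (1 = most frequent)."""
    ordered = sorted(freqs, key=freqs.get, reverse=True)
    return {word: rank for rank, word in enumerate(ordered, start=1)}


def rank_ratios(corpus, reference):
    """Score words by rank in the corpus divided by rank in the reference.

    Low scores mark words that are relatively more prominent in the corpus.
    Words missing from the reference get rank len(reference) + 1 (assumption).
    """
    corpus_ranks = frequency_ranks(corpus)
    reference_ranks = frequency_ranks(reference)
    unseen_rank = len(reference_ranks) + 1
    scores = {
        word: rank / reference_ranks.get(word, unseen_rank)
        for word, rank in corpus_ranks.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


for word, score in rank_ratios(academy, europarl)[:3]:
    print(f"{word}: {score:.3f}")
# prints research: 0.529, then for: 0.600, then on: 0.615
```

On these truncated lists "research" comes out first (rank 9 against an assumed unseen rank of 17), while "the" and "of" score exactly 1 because they hold the same rank in both corpora.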
[Figure: processing pipeline with the components Academy corpus, Reference corpus (EU parliament), Likey, Term list, Documents, Terms, SOM and Document map]
Extralinguistic contexts
● Human beings learn language in real world
contexts that include visual, tactile, etc.
perceptions
● In order to model meaning in a human-like
manner, these other modalities have to be taken
into account
● In a project called “Multimodally Grounded
Language Technology” we associated visual
patterns of human movements with expressions
that had been used to describe these
movements
[Figure: examples of human movement patterns labelled RUNNING, WALKING, LIMPING and JOGGING]
Modeling subjectivity of meaning
● In our method Grounded Intersubjective
Concept Analysis (GICA), we added a new
“dimension” to the term-document matrices
● We did not assume that each person
understands and uses every word in the same
manner; instead, we wanted to model the
personal variation
● This was achieved by using Subject-Object-
Context tensors (Honkela et al. 2012)
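The added dimension can be illustrated with a minimal sketch of a Subject-Object-Context count tensor stored as nested dictionaries. The subjects "A", "B" and "C", all counts, and the function names are hypothetical. GICA as published involves further normalization and analysis steps that are not reproduced here. Fixing a subject and an object selects a vector over contexts, so the same word can be compared across subjects.

```python
import math
from collections import defaultdict


def build_soc_tensor(records):
    """Count (subject, object, context) triples into a sparse 3-way tensor."""
    tensor = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for subject, obj, context in records:
        tensor[subject][obj][context] += 1
    return tensor


def context_profile(tensor, subject, obj, contexts):
    """The tensor fiber for one subject and one object, as a dense context vector."""
    counts = tensor.get(subject, {}).get(obj, {})
    return [counts.get(context, 0) for context in contexts]


def cosine(u, v):
    """Cosine of the angle between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical data: context words each subject uses with the object "health".
records = [
    ("A", "health", "insurance"), ("A", "health", "insurance"), ("A", "health", "care"),
    ("B", "health", "care"), ("B", "health", "children"),
    ("C", "health", "insurance"), ("C", "health", "care"),
]
contexts = sorted({context for _, _, context in records})
tensor = build_soc_tensor(records)
profiles = {s: context_profile(tensor, s, "health", contexts) for s in ("A", "B", "C")}
print(profiles)
print(cosine(profiles["A"], profiles["C"]), cosine(profiles["A"], profiles["B"]))
```

Here subject A uses "health" much more like C than like B. In a plain term-document matrix this subject-specific variation would be summed away.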
GICA: Grounded Intersubjective Concept Analysis
Honkela, Raitio, Lagus & Nieminen 2012
Analysis of “health” in the State of the Union addresses
Subjects on objects in contexts: Using GICA method to quantify epistemological subjectivity.
Timo Honkela, Juha Raitio, Krista Lagus, Ilari T. Nieminen, Nina Honkela, and Mika Pantzar.
Proc. of IJCNN 2012.
Thank you for
your attention!