Similarity Features, and their Role in Concept Alignment Learning
Shenghui Wang1 Gwenn Englebienne2 Christophe Guéret1
Stefan Schlobach1 Antoine Isaac1 Martijn Schut1
1 Vrije Universiteit Amsterdam
2 Universiteit van Amsterdam
SEMAPRO 2010
Florence
Outline
1 Introduction
Classification of concept mappings based on instance similarity
2 Three classifiers
Markov Random Field
Multi-objective Evolution Strategy
Support Vector Machine
3 Experiments and results
4 Summary
Thesaurus mapping
SemanTic Interoperability To access Cultural Heritage (STITCH) through mappings between thesauri
Scope of the problem:
Big thesauri with tens of thousands of concepts
Huge collections (e.g., National Library of the Netherlands: 80 km of books in one collection)
Heterogeneous (e.g., books, manuscripts, illustrations, etc.)
Multilingual problem
Solving matching problems is one step towards solving the interoperability problem.
e.g., "plankzeilen" (windsurfing) vs. "surfsport"
e.g., "archeology" vs. "excavation"
Automatic alignment techniques
Lexical: labels and textual information of entities
Structural: structure of the formal definitions of entities, position in the hierarchy
Extensional: statistical information of instances, i.e., objects indexed with entities
Background knowledge: using a shared conceptual reference to find links indirectly
Pros and cons
Advantages
Simple to implement
Interesting results
Disadvantages
Requires a sufficient number of common instances
Only uses part of the available information
Representing concepts and the similarity between them
[Figure: the instances of each concept carry metadata fields (Creator, Title, Publisher, ...); per field, a concept's instances are aggregated into a bag of words with term counts (the concept features); each pair feature f1, f2, f3, ... is the cosine distance between the corresponding bags of words of Concept 1 and Concept 2.]
Classification of concept mappings based on instance similarity
Classification based on instance similarity
Each pair of concepts is treated as a point in a "similarity space".
Its position is defined by the features of the pair.
The features of the pair are the different measures of similarity between the concepts' instances.
Hypothesis: the label of a point (which represents whether the pair is a positive mapping or a negative one) is correlated with the position of this point in the space.
Given already-labelled points, a new point can then be classified, i.e., assigned the correct label, based on the location given by its actual similarity values.
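As a concrete sketch of this representation (toy field names and term counts, not the paper's actual data), a pair of concepts can be mapped to a point in similarity space by computing one cosine similarity per metadata field:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two sparse term-count vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def pair_features(concept1, concept2, fields):
    """One similarity value per metadata field: a point in 'similarity space'."""
    return [cosine_sim(concept1.get(f, {}), concept2.get(f, {})) for f in fields]

# Toy concepts: per-field bags of words aggregated over their instances.
c1 = {"title": {"surf": 4, "sport": 1}, "creator": {"jansen": 2}}
c2 = {"title": {"surf": 3, "zeilen": 2}, "creator": {"jansen": 1, "vries": 1}}
point = pair_features(c1, c2, ["title", "creator"])
```

Each classifier below then operates on such points rather than on the raw instances.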
Classiļ¬cation of concept mappings based on instance similarity
Research questions
How do different classifiers perform on this instance-based mapping task?
What are the benefits of using a machine learning algorithm to determine the importance of features?
Are there regularities with respect to the relative importance given to specific features for similarity computation? Are these weights related to application data characteristics?
Three classifiers used
Markov Random Field (MRF)
Evolutionary Strategy (ES)
Support Vector Machine (SVM)
Markov Random Field
Markov Random Field
Let $T = \{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$ be the training set:
$x^{(i)} \in \mathbb{R}^{K}$, the features
$y^{(i)} \in Y = \{\text{positive}, \text{negative}\}$, the label
The conditional probability of a label given the input is modelled as
$$p(y^{(i)} \mid x^{(i)}, \theta) = \frac{1}{Z(x^{(i)}, \theta)} \exp\Bigl(\sum_{j=1}^{K} \lambda_j \phi_j(y^{(i)}, x^{(i)})\Bigr), \quad (1)$$
where $\theta = \{\lambda_j\}_{j=1}^{K}$ are the weights associated with the feature functions $\phi$ and $Z(x^{(i)}, \theta)$ is a normalisation constant.
Markov Random Field
The classifier used: Markov Random Field (cont'd)
The likelihood of the data set for given model parameters $p(T \mid \theta)$ is given by:
$$p(T \mid \theta) = \prod_{i=1}^{N} p(y^{(i)} \mid x^{(i)}) \quad (2)$$
During learning, our objective is to find the most likely values for $\theta$ for the given training data.
The decision criterion for assigning a label $y^{(i)}$ to a new pair of concepts $i$ is then simply given by:
$$y^{(i)} = \arg\max_{y} \; p(y \mid x^{(i)}) \quad (3)$$
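A minimal sketch of the decision rule in Eqs. (1)-(3), assuming the weights have already been learned. The deck does not spell out its feature functions, so this sketch assumes the hypothetical form phi_j(positive, x) = x_j and phi_j(negative, x) = -x_j, which reduces the model to a logistic-regression-style classifier:

```python
import math

def p_label(x, lambdas):
    """p(y | x, theta) for the log-linear model of Eq. (1).
    Assumed feature functions: phi_j(pos, x) = x[j], phi_j(neg, x) = -x[j]."""
    score = sum(l * xj for l, xj in zip(lambdas, x))
    scores = {"positive": score, "negative": -score}
    z = sum(math.exp(s) for s in scores.values())  # normalisation constant Z
    return {y: math.exp(s) / z for y, s in scores.items()}

def classify(x, lambdas):
    """Decision rule of Eq. (3): argmax over labels of p(y | x)."""
    probs = p_label(x, lambdas)
    return max(probs, key=probs.get)

# Toy pair: high lexical similarity, low instance overlap.
probs = p_label([0.9, 0.1], [2.0, 0.5])
label = classify([0.9, 0.1], [2.0, 0.5])
```

Learning the weights would maximise the likelihood of Eq. (2) over the training pairs, e.g. by gradient ascent.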
Multi-objective Evolution Strategy
Multi-objective Evolution Strategy
Evolutionary strategies (ES) have two characteristic properties: firstly, they are used for continuous value optimisation and, secondly, they are self-adaptive.
An ES individual is a direct model of the searched solution, defined by $\Lambda$ and some evolution strategy parameters:
$$(\Lambda, \Sigma) = (\lambda_1, \ldots, \lambda_K, \sigma_1, \ldots, \sigma_K) \quad (4)$$
The fitness function is related to the decision criterion for the ES, which is sign-based:
$$L^{ES}_i = \begin{cases} 1 & \text{if } \sum_{j=1}^{K} \lambda_j F_{ij} > 0 \\ 0 & \text{otherwise} \end{cases} \quad (5)$$
Multi-objective Evolution Strategy
Multi-objective Evolution Strategy (cont'd)
Maximising the number of correctly classified positive mappings and the number of correctly classified negative mappings are two competing objectives:
$$f_1(\Lambda \mid F, L) = \#\{F_i \mid \sum_{j=1}^{K} \lambda_j F_{ij} > 0 \wedge L_i = 1\} \quad (6)$$
$$f_2(\Lambda \mid F, L) = \#\{F_i \mid \sum_{j=1}^{K} \lambda_j F_{ij} \le 0 \wedge L_i = 0\} \quad (7)$$
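The two objectives of Eqs. (6) and (7) are plain counts over the labelled pairs, as this sketch over toy data shows (F is the feature matrix with one row per concept pair, L the labels):

```python
def es_objectives(lambdas, F, L):
    """f1: correctly accepted positives (Eq. 6);
    f2: correctly rejected negatives (Eq. 7)."""
    scores = [sum(l * f for l, f in zip(lambdas, row)) for row in F]
    f1 = sum(1 for s, y in zip(scores, L) if s > 0 and y == 1)
    f2 = sum(1 for s, y in zip(scores, L) if s <= 0 and y == 0)
    return f1, f2

# Toy data: 3 candidate pairs with 2 similarity features each.
F = [[0.8, 0.6], [0.1, 0.0], [0.9, 0.2]]
L = [1, 0, 1]
f1, f2 = es_objectives([1.0, -0.5], F, L)
```

A candidate weight vector that improves f1 typically degrades f2 and vice versa, which is why a multi-objective selection scheme is needed.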
Multi-objective Evolution Strategy
Multi-objective Evolution Strategy (contā)
Evolution process
Recombination: two parent individuals are combined using different weightings, producing two new individuals
Mutation: one parent individual is perturbed to produce a new child individual
Survivor selection: NSGA-II
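A sketch of the self-adaptive mutation step, using the standard log-normal ES operator (the deck does not give the authors' exact operator): each weight lambda_j carries its own step size sigma_j, and the step sizes mutate first.

```python
import math
import random

def mutate(lambdas, sigmas, rng, tau=0.3):
    """Self-adaptive ES mutation: log-normal update of the step sizes sigma_j,
    then Gaussian perturbation of each weight lambda_j with its new sigma_j."""
    new_sigmas = [s * math.exp(tau * rng.gauss(0, 1)) for s in sigmas]
    new_lambdas = [l + s * rng.gauss(0, 1) for l, s in zip(lambdas, new_sigmas)]
    return new_lambdas, new_sigmas

rng = random.Random(42)
child_lambdas, child_sigmas = mutate([1.0, -0.5], [0.1, 0.1], rng)
```

Because the sigmas themselves evolve, individuals that take well-sized steps tend to survive, which is the self-adaptive property mentioned above.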
Support Vector Machine
Support Vector Machine
The Support Vector Machine (SVM) is used as a maximum-margin classifier whose task consists in finding a hyperplane separating the two classes.
The objective is to maximise the margin separating the two classes whilst minimising the classification error risk.
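This trade-off is captured by the standard soft-margin objective (a generic formulation, not the paper's specific setup): minimise 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b)). A sketch that evaluates this objective for a candidate hyperplane (w, b):

```python
def soft_margin_objective(w, b, X, y, C=1.0):
    """Soft-margin SVM objective: margin term plus C times total hinge loss.
    Labels y are in {-1, +1}."""
    margin_term = 0.5 * sum(wj * wj for wj in w)
    hinge = sum(max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
                for xi, yi in zip(X, y))
    return margin_term + C * hinge

# Two toy pairs on opposite sides of the hyperplane w = (1, 0), b = 0.
X = [[2.0, 0.0], [-2.0, 0.0]]
y = [1, -1]
obj = soft_margin_objective([1.0, 0.0], 0.0, X, y)
```

Both points sit outside the margin here, so only the margin term contributes; C controls how heavily margin violations would be penalised.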
Experiments and results
Thesauri to match: GTT (35K) and Brinkman (5K)
Instances: 1 million books
GTT annotated books: 307K
Brinkman annotated books: 490K
Dually annotated books: 222K
Feature selection for similarity calculation

λj  Feature        λj  Feature           λj  Feature
 1  Lexical        11  author            21  issued
 2  Jaccard        12  contributor       22  language
 3  Date           13  creator           23  mods:edition
 4  ISBN           14  dateCopyrighted   24  publisher
 5  NBN            15  description       25  refNBN
 6  PPN            16  extent            26  relation
 7  SelSleutel     17  hasFormat         27  spatial
 8  abstract       18  hasPart           28  subject
 9  alternative    19  identifier        29  temporal
10  annotation     20  isVersionOf       30  title

Table: List of the features
Quality of learning
[Figure: grouped bar charts (y-axis from 0 to 1) of Precision, Recall and F-measure for four classifiers: MRF (features 1-30), MRF (features 3-30), ES and SVM.]
Figure: Precision, recall and F-measure for mappings with a positive label (top) and a negative label (bottom). Error bars indicate one standard deviation over the 10 folds of cross-validation.
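The reported measures follow the usual definitions; a generic sketch over toy confusion counts (not the paper's data):

```python
def precision_recall_f(tp, fp, fn):
    """Standard precision, recall and F-measure (F1) from confusion counts:
    tp = true positives, fp = false positives, fn = false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f

p, r, f = precision_recall_f(tp=90, fp=10, fn=10)
```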
Relative importance of features
Which features of our instances are important for mapping?
Figure: Mutual information between features and labels
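Mutual information between a (discretised) feature and the mapping label can be sketched as follows (toy data; the deck does not specify the estimator actually used):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X; Y) in bits, estimated from paired discrete observations."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))  # joint counts
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Toy case: a binary feature that perfectly predicts the label carries 1 bit.
mi = mutual_information([1, 1, 0, 0], [1, 1, 0, 0])
```

Features with near-zero mutual information contribute little to distinguishing positive from negative mappings.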
Relative importance of features
The ES lambdas are not really conclusive
The most inconclusive ES lambdas correspond to the least informative features
Relative importance of features
Important features in terms of mutual information are associated with large MRF weights
A more detailed analysis
Expected important features:
Label similarity (1), instance overlap (2), subject (28), etc.
Expected unimportant features:
Size of the book (16), format description (17) and language
(22), etc.
Surprisingly important features:
Date (14)
Surprisingly unimportant features:
Description (15) and abstract (8)
Summary
We tried three machine learning classifiers on the instance-based mapping task, among which the MRF and the ES can automatically identify meaningful features.
The MRF and the ES result in a performance in the neighbourhood of 90%, showing the validity of the approach.
Our analysis suggests that when many different description features interact, there is no systematic correlation between what a learning method can find and what an application expert may anticipate.