ESR12 Hanna Béchara - EXPERT Summer School - Malaga 2015
Semantic Similarity Measures in Machine Translation Evaluation
Hanna Béchara
ESR12
EXPERT Project
June 27, 2015
Machine Translation Evaluation
How do we define translation quality?
Fluency? Grammaticality? Readability?
Post-editing effort?
How well it matches a reference translation?
Meaning preservation!
Semantic Textual Similarity
STS Explained
Semantic Textual Similarity (STS) captures the notion that some texts are more similar than others.
5 The two sentences are completely equivalent, as they mean the same thing.
4 The two sentences are mostly equivalent, but some unimportant details differ.
3 The two sentences are roughly equivalent, but some important information differs or is missing.
2 The two sentences are not equivalent, but share some details.
1 The two sentences are not equivalent, but are on the same topic.
0 The two sentences are on different topics.
Semantic Textual Similarity
Examples
Example 1
Sentence 1: A brown dog is attacking another animal in front of the man in pants
Sentence 2: Two dogs are fighting
Example 2
Sentence 1: A man is chopping butter into a container.
Sentence 2: A woman is cutting shrimps.
Example 3
Sentence 1: A cat is playing with a watermelon on a floor.
Sentence 2: A man is pouring oil into a pan.
Semantic Textual Similarity
How do we estimate STS?
Crowd-sourced similarity ratings, created for the SemEval workshops
EXPERT's SemEval submission:
SVM regressor
Estimates a score between 0 and 5
Trained on human-annotated sentence pairs provided by the SemEval shared tasks
Trained on a variety of features
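Below is a minimal sketch of how such an STS estimator can be set up: an SVM regressor mapping sentence-pair features to a score in [0, 5]. The two toy features and the three training pairs are illustrative assumptions only; the actual submission used a richer feature set and the full SemEval training data.

    from sklearn.svm import SVR
    import numpy as np

    def pair_features(s1, s2):
        # Two toy features: token Jaccard overlap and length ratio.
        t1, t2 = set(s1.lower().split()), set(s2.lower().split())
        jaccard = len(t1 & t2) / len(t1 | t2)
        length_ratio = min(len(t1), len(t2)) / max(len(t1), len(t2))
        return [jaccard, length_ratio]

    # Hypothetical SemEval-style training pairs with gold scores in [0, 5].
    train = [
        ("A man is playing a guitar", "A man plays the guitar", 4.8),
        ("A cat sits on a mat", "A plane is taking off", 0.2),
        ("Two dogs are fighting", "A brown dog is attacking another animal", 3.0),
    ]
    X = np.array([pair_features(a, b) for a, b, _ in train])
    y = np.array([score for _, _, score in train])

    model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
    model.fit(X, y)

    # Predict a similarity score for a new pair, clipped to the 0-5 scale.
    new = pair_features("A man is playing a guitar", "A man plays music")
    print(round(float(np.clip(model.predict([new])[0], 0, 5)), 2))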
Methodology
Research Question
Can we estimate the score q_B of a sentence B as a function of R (the relatedness of A and B) and q_A (the quality of A)?
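Stated formally (notation assumed for this writeup, not taken from the slides):

    \hat{q}_B = f\big( R(A, B),\; q_A \big)

where R(A, B) is the semantic relatedness of the sentence pair and q_A is the known quality of sentence A's translation.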
Methodology
Machine Learning Task
Features
1 Baseline experiments: 17 QuEst features
2 STS score for the source sentence pair
3 S-BLEU score for Sentence Pair A
4 S-BLEU score comparing A to B (MT outputs)
SVM Regression Model
Predicts a score between 0 and 1
2,000 sentences for training, 500 for testing
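A sketch of this learning setup, assuming the 17 QuEst baseline features and the 3 semantic features have been precomputed per sentence (random stand-ins are used here purely to show the pipeline shape):

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(0)

    # Stand-in feature matrices: 17 QuEst features + 3 semantic features.
    quest_train, quest_test = rng.random((2000, 17)), rng.random((500, 17))
    sem_train, sem_test = rng.random((2000, 3)), rng.random((500, 3))

    X_train = np.hstack([quest_train, sem_train])   # Combined (20)
    X_test = np.hstack([quest_test, sem_test])
    y_train, y_test = rng.random(2000), rng.random(500)  # sentence BLEU in [0, 1]

    model = SVR(kernel="rbf")
    model.fit(X_train, y_train)
    print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))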
Methodology
Results

         Mean Baseline   QuEst Baseline (17)   STS (3)   Combined (20)
    MAE  0.16            0.12                  0.108     0.09

Table: Predicting the BLEU scores for DGT-TM (Mean Absolute Error)
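For reference, the figure reported is the mean absolute error over the n test sentences:

    \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert \hat{y}_i - y_i \rvert

where \hat{y}_i is the predicted and y_i the reference BLEU score; lower is better.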
Methodology
Machine Learning Task
Features
1 STS score for the source sentence pair
2 S-BLEU score for Sentence A
3 S-BLEU score comparing A to B (MT outputs)
SVM Regression Model
Predicts a score between 0 and 1
4,000 sentences for training, 500 for testing
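The S-BLEU features above are sentence-level BLEU scores. The slides do not spell out the exact variant used; a plausible stand-in is NLTK's smoothed sentence-level BLEU:

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    smooth = SmoothingFunction().method1

    def s_bleu(hypothesis, reference):
        # Sentence-level BLEU of a hypothesis against a single reference,
        # smoothed so short sentences do not collapse to zero.
        return sentence_bleu(
            [reference.split()],     # list of tokenized references
            hypothesis.split(),      # tokenized hypothesis
            smoothing_function=smooth,
        )

    # E.g. comparing the two MT outputs to each other (feature 3 above):
    print(s_bleu("Two dogs are fighting", "Two dogs are playing"))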
Methodology
Results

         Mean Baseline   STS (3)
    MAE  0.216           0.193

Table: Predicting the S-BLEU scores for SICK's backtranslations (Mean Absolute Error)
Methodology
Data Preparation
Extracted sentences from the FLICKR images dataset used for previous SemEval tasks
Each pair has a human similarity rating between 0 and 5
Each sentence has a French machine translation, and each translation has a quality score between 1 and 5 assigned through manual evaluation
Each French sentence pair produced by the machine translation is also assigned a similarity rating through manual evaluation
Methodology
Example
Sentence A
A group of kids is playing in a yard and an old man is standing in the background
Sentence B
A group of boys in a yard is playing and a man is standing in the background
Semantic similarity between A and B: 4.5
Sentence A - MT Output
Un groupe d'enfants joue dans une cour et un vieil homme est debout dans l'arrière-plan
Sentence B - MT Output
Un groupe de garçons dans une cour joue et un homme est debout dans l'arrière-plan
Semantic similarity between A - MT Output and B - MT Output: ?
Methodology
Example
Sentence A
eurozone unemployment at record 12 percent
Sentence B
eurozone unemployment hits record 12.1 % in march
Semantic similarity between A and B: 4.5
Sentence A - MT Output
lors de la zone euro 12 % de chômage record
Sentence B - MT Output
le chômage frappe 12.1 % de la zone euro en marche procès-verbal
Semantic similarity between A - MT Output and B - MT Output: ?
Methodology
Experiments
Feature Sets
1 Baseline experiments: 17 QuEst features
2 STS score for the sentence pair
3 Human evaluation score for Pair B (MT output)
4 S-BLEU score comparing Pair A to Pair B (MT outputs)
SVM Regression Model
800 sentences for training, 200 for testing
Predicts a score between 1 and 5
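A sketch of this final setup, with the 800/200 split and predictions clipped back onto the 1 to 5 annotation scale (the 20 feature values are random stand-ins; the real ones come from the four feature sets listed above):

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(1)
    X_train, X_test = rng.random((800, 20)), rng.random((200, 20))
    y_train = rng.uniform(1, 5, 800)   # human similarity ratings, 1-5
    y_test = rng.uniform(1, 5, 200)

    model = SVR(kernel="rbf")
    model.fit(X_train, y_train)
    pred = np.clip(model.predict(X_test), 1.0, 5.0)  # stay on the 1-5 scale
    print("MAE:", mean_absolute_error(y_test, pred))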
Methodology
Summing up...
Results show that semantically motivated features can improve over the quality estimation baseline
We can learn the quality of a Sentence B if we have a semantically similar Sentence A with a determined quality
However, we require access to semantically similar sentences
The End (For Now)
Enjoy the Weekend!