Semantic Similarity Measures in Machine Translation Evaluation
Hanna Béchara
ESR12
EXPERT Project
June 27, 2015
Hanna Béchara June 27, 2015 1 / 21
Machine Translation Evaluation
How do we define translation quality?
Fluency? Grammaticality? Readability?
Post-editing effort?
How well it matches a reference translation?
Meaning Preservation!!
Semantic Textual Similarity
STS Explained
Semantic Textual Similarity (STS) captures the notion that some
texts are more similar than others
5: The two sentences are completely equivalent, as they mean the same thing.
4: The two sentences are mostly equivalent, but some unimportant details differ.
3: The two sentences are roughly equivalent, but some important information differs or is missing.
2: The two sentences are not equivalent, but share some details.
1: The two sentences are not equivalent, but are on the same topic.
0: The two sentences are on different topics.
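An STS regressor outputs a real-valued score, but the scale above is only defined at integer levels. The following is a minimal Python illustration of reading such a score against the scale. It assumes a prediction is simply clamped to 0–5 and rounded to the nearest level. The name `sts_label` and the round-half-up rule are our own choices, not part of the SemEval annotation guidelines.

```python
# Descriptions of the 0-5 STS scale, as listed on the slide.
STS_SCALE = {
    5: "completely equivalent, as they mean the same thing",
    4: "mostly equivalent, but some unimportant details differ",
    3: "roughly equivalent, but some important information differs or is missing",
    2: "not equivalent, but share some details",
    1: "not equivalent, but on the same topic",
    0: "on different topics",
}


def sts_label(score: float) -> tuple[int, str]:
    """Map a continuous STS prediction to the nearest integer level of the scale.

    Scores outside [0, 5] are clamped first. Halves round up (4.5 -> 5); this is
    an illustrative choice, not something the scale itself prescribes.
    """
    clamped = min(5.0, max(0.0, score))
    level = int(clamped + 0.5)  # round half up; Python's round() would give 4 for 4.5
    return level, STS_SCALE[level]


if __name__ == "__main__":
    for prediction in (4.6, 2.2, -0.3, 7.0):
        level, meaning = sts_label(prediction)
        print(f"{prediction:>5} -> {level}: {meaning}")
```

For example, a prediction of 4.6 maps to level 5. An out-of-range prediction such as 7.0 is clamped to 5 before rounding.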
Semantic Textual Similarity
Examples
Example 1
Sentence 1: A brown dog is attacking another animal in front of the man in pants
Sentence 2: Two dogs are fighting
Example 2
Sentence 1: A man is chopping butter into a container.
Sentence 2: A woman is cutting shrimps.
Example 3
Sentence 1: A cat is playing with a watermelon on a floor.
Sentence 2: A man is pouring oil into a pan.
Semantic Textual Similarity
How do we estimate STS?
Crowd-Sourced Similarity Ratings
Created for SemEval Workshops
EXPERT’s SemEval Submission
SVM Regressor
Estimates a score between 0 and 5
Trained on human-annotated sentence pairs provided by the SemEval shared tasks
Trained on a variety of features
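The slides do not list the features used by the submission. To make "a feature" concrete, here are two hypothetical surface features an STS regressor might take as input: token-set Jaccard overlap and a sentence-length ratio. This is an illustrative assumption, not the EXPERT feature set. The naive lowercase tokenizer is also our own choice.

```python
import re


def tokens(sentence: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def overlap_features(s1: str, s2: str) -> dict[str, float]:
    """Hypothetical surface features for one sentence pair (not the EXPERT feature set)."""
    t1, t2 = tokens(s1), tokens(s2)
    set1, set2 = set(t1), set(t2)
    union = set1 | set2
    # Jaccard overlap: shared word types divided by all word types in the pair.
    jaccard = len(set1 & set2) / len(union) if union else 1.0
    # Length ratio: shorter sentence length over longer sentence length, in [0, 1].
    longest = max(len(t1), len(t2))
    length_ratio = min(len(t1), len(t2)) / longest if longest else 1.0
    return {"jaccard": jaccard, "length_ratio": length_ratio}


if __name__ == "__main__":
    print(overlap_features(
        "Two dogs are fighting",
        "A brown dog is attacking another animal in front of the man in pants",
    ))
```

Both sentences in Example 1 describe dogs fighting, yet their Jaccard overlap is 0.0 because "dog" and "dogs" are different tokens. A single surface feature cannot capture semantic similarity on its own, which is why the regressor is trained on a variety of features.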
Methodology
Research Question
Can we estimate the score X as a function of R (relatedness) and b_A (the quality of A)?
Methodology
DGT Translation Memory
DGT-Translation Memory (EN-FR)
500 sentences × 5 most similar matches
Evaluation: S-BLEU (0–1) against the reference French translations
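S-BLEU is BLEU computed for a single sentence against its reference. Unsmoothed BLEU is often zero at sentence level, because a short sentence can easily share no 4-gram with its reference. Some smoothing is therefore needed, and the slides do not say which variant was used. The following is a minimal sketch under three assumptions: whitespace tokenization, a single reference, and add-one smoothing of the 2- to 4-gram precisions (in the style of Lin & Och, 2004).

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams of length n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Smoothed sentence-level BLEU in [0, 1] against a single reference.

    Add-one smoothing is applied to matched and total counts for n >= 2, so a
    single missing 4-gram does not zero out the whole score.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as often as in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        if n == 1:
            if matches == 0:
                return 0.0  # no word overlap at all
        else:
            matches, total = matches + 1, total + 1
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: scale down hypotheses shorter than the reference.
    brevity_penalty = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "le chat est sur le tapis"
    for hypothesis in (reference, "le chien est sur le tapis", "le chat", "un chien"):
        print(f"{hypothesis!r}: {sentence_bleu(hypothesis, reference):.3f}")
```

An identical hypothesis scores 1.0, and a hypothesis sharing no word with the reference scores 0.0. A hypothesis that is too short is scaled down by the brevity penalty even when every n-gram it contains matches.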
Methodology
Machine Learning Task
Features
1 Baseline Experiments: 17 QuEst features
2 STS score for source sentence pair
3 S-BLEU score for Sentence Pair A
4 S-BLEU score comparing A to B (MT outputs)
SVM Regression Model
Predicts a score between 0 and 1
2000 sentences for training, 500 sentences for testing
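The slides do not state the kernel, the hyperparameters, or the toolkit behind the SVM regression model. As a simplified stand-in, this sketch trains a linear support vector regressor from scratch, using full-batch subgradient descent on the epsilon-insensitive loss. Each row of `X` would be one feature vector: the 17 QuEst features, the 3 STS-based features, or all 20 combined. All names and default values are illustrative assumptions. A real system would more likely use a library implementation such as LIBSVM or scikit-learn's `SVR`.

```python
import math


def predict(w: list[float], b: float, x: list[float]) -> float:
    """Linear model output w·x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svr(X, y, epsilon=0.05, lam=1e-4, lr=0.5, epochs=5000):
    """Fit a linear epsilon-insensitive regressor by full-batch subgradient descent.

    Objective: lam/2 * ||w||^2 + mean(max(0, |y_i - (w·x_i + b)| - epsilon)).
    Residuals inside the epsilon tube contribute no gradient, which is the
    defining property of support vector regression.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            residual = yi - predict(w, b, xi)
            if abs(residual) > epsilon:
                sign = 1.0 if residual > 0 else -1.0
                # d/dw of (|r| - eps) with r = y - (w·x + b) is -sign(r) * x
                for j in range(d):
                    grad_w[j] -= sign * xi[j] / n
                grad_b -= sign / n
        step = lr / math.sqrt(t + 1)  # diminishing step size for subgradient descent
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def predict_clipped(w, b, x, low=0.0, high=1.0):
    """Clip predictions to the target range (0-1 for S-BLEU)."""
    return min(high, max(low, predict(w, b, x)))


if __name__ == "__main__":
    # Toy one-feature data on a straight line, standing in for real feature vectors.
    X = [[i / 10] for i in range(11)]
    y = [0.2 + 0.6 * x[0] for x in X]
    w, b = train_linear_svr(X, y)
    print(w, b, predict_clipped(w, b, [0.5]))
```

Clipping to [0, 1] is an added assumption that keeps predictions inside the valid S-BLEU range. The underlying linear model itself is unbounded.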
Methodology
Results
Feature set                     MAE
Mean baseline                   0.16
QuEst baseline (17 features)    0.12
STS (3 features)                0.108
Combined (20 features)          0.09
Table: Predicting the S-BLEU scores for DGT-TM (mean absolute error; lower is better)
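Mean absolute error averages the absolute difference between predicted and gold scores, so the numbers in the table are on the same 0–1 scale as S-BLEU. The slides do not define the mean baseline. This sketch assumes it predicts the mean training score for every test item. It also adds a helper that expresses each MAE as a relative reduction over that baseline; all function names are illustrative.

```python
def mean_absolute_error(gold: list[float], predicted: list[float]) -> float:
    """Average absolute difference between gold and predicted scores."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)


def mean_baseline(train_gold: list[float], n_test: int) -> list[float]:
    """Predict the training-set mean for every test item (assumed definition of 'Mean Baseline')."""
    mean = sum(train_gold) / len(train_gold)
    return [mean] * n_test


def relative_reduction(baseline_mae: float, system_mae: float) -> float:
    """Fraction of the baseline error removed by a system (0.25 means 25% lower MAE)."""
    return (baseline_mae - system_mae) / baseline_mae


if __name__ == "__main__":
    # MAE values taken from the DGT-TM table.
    for name, mae in [("QuEst (17)", 0.12), ("STS (3)", 0.108), ("Combined (20)", 0.09)]:
        print(f"{name}: {relative_reduction(0.16, mae):.1%} lower MAE than the mean baseline")
```

On the DGT-TM results, the combined 20-feature model removes roughly 44% of the mean baseline's error ((0.16 - 0.09) / 0.16 = 0.4375).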
Methodology
SICK
SICK (Sentences Involving Compositional Knowledge)
4500 sentence pairs
Evaluation: S-BLEU of the back-translations against the original sentences
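Back-translation evaluation needs no target-language reference. Each source sentence is translated, then translated back, and the round trip is scored against the original sentence (here, with S-BLEU). This is a minimal sketch of that pipeline, with the translators and the metric passed in as functions. `unigram_precision` is only a toy stand-in so the example runs on its own; a sentence-level BLEU function would take its place in practice. The word-for-word dictionaries in the usage example are hypothetical.

```python
from typing import Callable

Translator = Callable[[str], str]
Metric = Callable[[str, str], float]


def backtranslation_score(source: str, forward: Translator, backward: Translator,
                          metric: Metric) -> float:
    """Translate source -> pivot -> source and score the round trip against the original."""
    round_trip = backward(forward(source))
    return metric(round_trip, source)


def unigram_precision(hypothesis: str, reference: str) -> float:
    """Toy stand-in metric: share of hypothesis tokens that occur in the reference."""
    hyp = hypothesis.lower().split()
    ref = set(reference.lower().split())
    return sum(tok in ref for tok in hyp) / len(hyp) if hyp else 0.0


if __name__ == "__main__":
    # Hypothetical word-for-word "translators", only to make the example runnable.
    en_fr = {"two": "deux", "dogs": "chiens", "are": "sont", "fighting": "combattent"}
    fr_en = {"deux": "two", "chiens": "dogs", "sont": "are", "combattent": "battle"}

    def to_french(s: str) -> str:
        return " ".join(en_fr.get(w, w) for w in s.lower().split())

    def to_english(s: str) -> str:
        return " ".join(fr_en.get(w, w) for w in s.lower().split())

    # Round trip yields "two dogs are battle"; 3 of its 4 tokens are in the original.
    print(backtranslation_score("Two dogs are fighting", to_french, to_english,
                                unigram_precision))  # prints 0.75
```

Round-trip scoring penalizes errors introduced in either direction, so it measures the pair of systems rather than the forward translation alone.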
Methodology
Machine Learning Task
Features
1 STS score for source sentence pair
2 S-BLEU score for Sentence A
3 S-BLEU score comparing A to B (MT outputs)
SVM Regression Model
Predicts a score between 0 and 1
4000 sentences for training, 500 sentences for testing
Methodology
Results
Feature set                     MAE
Mean baseline                   0.216
STS (3 features)                0.193
Table: Predicting the S-BLEU scores of the SICK back-translations (mean absolute error; lower is better)
Methodology
Designing our Own
Objective
Create a dataset of semantically related sentences, their machine translations, and their quality.
Methodology
Data Preparation
Extracted sentences from the Flickr images dataset used in previous SemEval tasks
Each pair has a human similarity rating between 0 and 5
Each sentence has a French machine translation with a quality score between 1 and 5, assigned through manual evaluation
Each French sentence pair produced by the machine translation is also assigned a similarity rating through manual evaluation
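Each item in the resulting dataset links an English pair and its French translations to three kinds of human judgement. This is a minimal sketch of one record as a Python dataclass. The field names are hypothetical, and the range checks follow the ranges stated above. Whether quality scores are integers is not stated, so floats are used. The French-pair similarity is assumed to use the same 0–5 scale as the English one, and is optional because it may not yet be annotated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimilarityQualityRecord:
    """One dataset item: an English pair, its French MT outputs, and human judgements.

    Field names are hypothetical; ranges follow the data-preparation description.
    """
    sentence_a: str
    sentence_b: str
    source_similarity: float        # human STS rating of the English pair, 0-5
    mt_a: str                       # French machine translation of sentence A
    mt_b: str                       # French machine translation of sentence B
    quality_a: float                # manual quality score of mt_a, 1-5
    quality_b: float                # manual quality score of mt_b, 1-5
    target_similarity: Optional[float] = None  # manual STS rating of the French pair (assumed 0-5)

    def __post_init__(self) -> None:
        # Reject values outside the stated ranges as soon as a record is created.
        if not 0 <= self.source_similarity <= 5:
            raise ValueError("source_similarity must be in [0, 5]")
        for name in ("quality_a", "quality_b"):
            if not 1 <= getattr(self, name) <= 5:
                raise ValueError(f"{name} must be in [1, 5]")
        if self.target_similarity is not None and not 0 <= self.target_similarity <= 5:
            raise ValueError("target_similarity must be in [0, 5]")
```

Validating in `__post_init__` means a malformed annotation fails at load time rather than silently skewing a regressor trained on the records.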
Methodology
Example
Sentence A
A group of kids is playing in a yard and an old man is standing in the background
Sentence B
A group of boys in a yard is playing and a man is standing in the background
Semantic Similarity between A and B: 4.5
Sentence A - MT Output
Un groupe d’enfants joue dans une cour et un vieil homme est debout dans l’arrière-plan
Sentence B - MT Output
Un groupe de garçons dans une cour joue et un homme est debout dans l’arrière-plan
Semantic Similarity between A - MT Output and B - MT Output: ?
Methodology
Example
Sentence A
eurozone unemployment at record 12 percent
Sentence B
eurozone unemployment hits record 12.1 % in march
Semantic Similarity between A and B: 4.5
Sentence A - MT Output
lors de la zone euro 12 % de chômage record
Sentence B - MT Output
le chômage frappe 12.1 % de la zone euro en marche procès-verbalan
Semantic Similarity between A - MT Output and B - MT Output: ?
Methodology
Experiments
Feature Sets
1 Baseline Experiments: 17 QuEst features
2 STS score for sentence pair
3 Human evaluation score for Pair B (MT Output)
4 S-BLEU score comparing Pair A to Pair B (MT outputs)
SVM Regression Model
800 sentences for training, 200 sentences for testing
Predicts a score between 1 and 5
Methodology
Results
Preliminary results show that STS information can improve over the baseline:
Feature set                     MAE
Baseline                        0.639
Baseline + STS                  0.575
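The relative size of the improvement can be worked out from the two MAE values:

```latex
\frac{\mathrm{MAE}_{\text{baseline}} - \mathrm{MAE}_{\text{baseline+STS}}}{\mathrm{MAE}_{\text{baseline}}}
= \frac{0.639 - 0.575}{0.639}
= \frac{0.064}{0.639}
\approx 0.10
```

Adding the STS information therefore lowers the error by about 10%. On the 1–5 quality scale, the baseline's predictions are off by about 0.64 points on average.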
Methodology
Summing up...
Results show that semantically motivated features can improve over
the quality estimation baseline
We can learn the quality of a sentence B if we have a semantically similar sentence A whose quality is known
However, we require access to semantically similar sentences
The End (For Now)
Enjoy the Weekend!
ESR12 Hanna Bechara - EXPERT Summer School - Malaga 2015
