Presentation on Internship Work
Speech and Eye Tracking Enabled Computer Assisted Translation
(SEECAT)
Copenhagen Business School
By: Himanshu Bansal
BACKGROUND
Michael Carl
Associate Professor
CBS
Srinivas Bangalore
Principal Member
AT&T Research Labs
BACKGROUND
Why SEECAT?
BACKGROUND
We need translation
To convey our thoughts to foreign-language speakers
To understand foreign-language text and speech
-------------------------------------------------
Training data for automated systems
To prepare high-quality manuscripts of the same text in different languages
BACKGROUND
ProZ
Tomedes
Verbalizeit
gengo
Straker Translations
BACKGROUND
Translog – Manual Translation
BACKGROUND
CASMACAT – Computer Assisted Translation
BACKGROUND
SEECAT as an extension of CASMACAT
The translator reads a source text on a computer screen and speaks the translation
aloud in the target language, a process called sight translation. Sight translation
is supported by an Automatic Speech Recognition (ASR) system, which transcribes the
spoken signal into target text, and a Machine Translation (MT) system, which
assists the translator with partial translation proposals, predictions and
completions on the computer monitor. An eye-tracking device follows the
translator's gaze path on the screen, detects where he or she faces translation
problems and triggers reactive assistance.
STRUCTURE of INTERNSHIP
21 May - 7 June
Lectures and hands-on sessions (CBS)
8 June - 28 June
Divided into teams, worked at a summer house (Nykobing, Falster)
29 June - 21 July
Integration (CBS)
# Excursions planned for every weekend
GAZE TEAM
Himanshu, Kritika and Rucha
Part -1
Word- Fixation Remapping
Part-2
Mutual disambiguation between gaze and speech
Word- Fixation Remapping
Word-fixation mapping is useful for cognitive and linguistic research, for usability
studies and, most importantly, for making the system interactive
Word- Fixation Remapping
Issues
Identification of the Fixations in a stream of gaze samples.
Mapping the Fixations to words/characters (Dealing with variable error)
Evaluation scheme for the fixation mapping
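The first issue, finding fixations in a stream of gaze samples, can be illustrated with a small sketch. The slides do not say which detection method the team used; the code below swaps in the well-known dispersion-threshold idea (I-DT) purely as an illustration. The sample format `(time_ms, x, y)`, the thresholds and all names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed sample format: (time in ms, x in pixels, y in pixels).
Sample = Tuple[float, float, float]


@dataclass
class Fixation:
    start_ms: float
    end_ms: float
    x: float  # centroid of the samples in the fixation
    y: float


def dispersion(window: List[Sample]) -> float:
    """Spread of a window: (max x - min x) + (max y - min y)."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples: List[Sample],
                     max_dispersion: float = 30.0,
                     min_duration_ms: float = 100.0) -> List[Fixation]:
    """Dispersion-threshold sketch; both thresholds are illustrative guesses."""
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i < n:
        # Grow the window until it spans at least the minimum duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # the remaining samples are too short to form a fixation
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the samples stay close together.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append(Fixation(window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # drop the first sample and try again
    return fixations


# Synthetic data: two clusters of samples, 20 ms apart.
samples = ([(t * 20, 100 + (t % 2), 100) for t in range(10)]
           + [(t * 20, 300, 120 + (t % 2)) for t in range(10, 20)])
fixations = detect_fixations(samples)
```

Each detected fixation carries a centroid that a later step can map to the nearest word or character on the screen.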
Word- Fixation Remapping
Our Approach
Word- Fixation Remapping
Our Approach -> Evaluation
● Input:
– Manually annotated fixation to word mapping (Gold Standard)
– Machine computed fixation to word mapping
● Output:
– The average character/word error.
● Method:
– Compute the overlaps in the gaze fixation durations in the manual and
machine annotations.
– For the overlapping fixations, compute the absolute differences in the
cursor positions of the two mappings.
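The evaluation method above can be sketched roughly as follows. The record layout (start time, end time, cursor position) and the choice of an unweighted mean over overlapping pairs are assumptions; the slides only state that overlaps are computed and absolute cursor differences are taken.

```python
from typing import List, NamedTuple, Optional


class MappedFixation(NamedTuple):
    start_ms: int
    end_ms: int
    cursor: int  # assumed: character offset of the word/character mapped to


def overlap_ms(a: MappedFixation, b: MappedFixation) -> int:
    """Length of the time interval shared by two fixations (0 if none)."""
    return max(0, min(a.end_ms, b.end_ms) - max(a.start_ms, b.start_ms))


def average_cursor_error(gold: List[MappedFixation],
                         machine: List[MappedFixation]) -> Optional[float]:
    """Mean absolute cursor difference over all overlapping fixation pairs."""
    errors = [abs(g.cursor - m.cursor)
              for g in gold
              for m in machine
              if overlap_ms(g, m) > 0]
    return sum(errors) / len(errors) if errors else None


# Made-up annotations for illustration only.
gold = [MappedFixation(0, 200, 10), MappedFixation(250, 400, 25)]
machine = [MappedFixation(20, 180, 10), MappedFixation(240, 420, 31)]
error = average_cursor_error(gold, machine)  # two overlapping pairs: |10-10| and |25-31|
```

Weighting each pair by its overlap duration would be an equally plausible reading of the slide; the unweighted mean is used here only because it is the simplest.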
Mutual disambiguation between gaze and speech
Motivation
Ambiguity in gaze
• Variable error
  - Midas touch
• System errors
  - Eye tracker
  - Algorithm
  - Calibration
Ambiguity in ASR
• Domain of training data
• Accent of speaker
• Morphology of language
• Speaking style
  - Co-articulation
Mutual disambiguation between gaze and speech
Motivation
Consider a simple example:
● User reads the text “where is the bat”
● ASR can help map gaze points to
● Gaze can help disambiguate ASR output
Text on screen:
  Where is the bat.
  There it is, behind the door
  I can't find it! Where is it?
  Look properly! It's right there.
ASR hypotheses:
  Here is the mat
  Here is the bat
  where is the bat
  where is a pat
Possible words being gazed: where, it, there, the, is
Intersection
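The example above can be played through in a few lines. The scoring rule (count how many words of each hypothesis were also gazed at) is an assumption chosen for illustration; the hypotheses and gazed words are the ones on the slide.

```python
from typing import List, Tuple


def rank_by_gaze(hypotheses: List[str], gazed_words: List[str]) -> List[Tuple[int, str]]:
    """Score each ASR hypothesis by how many of its words were gazed at."""
    gazed = {w.lower() for w in gazed_words}
    scored = []
    for hyp in hypotheses:
        hits = sum(1 for w in hyp.lower().split() if w in gazed)
        scored.append((hits, hyp))
    # sorted() is stable, so ties keep the original ASR order.
    return sorted(scored, key=lambda pair: -pair[0])


hypotheses = ["Here is the mat", "Here is the bat",
              "where is the bat", "where is a pat"]
gazed_words = ["where", "it", "there", "the", "is"]

ranked = rank_by_gaze(hypotheses, gazed_words)
best = ranked[0][1]  # -> "where is the bat" (3 gazed words; the others have 2)
```

The gaze evidence alone does not settle "bat" versus "mat" here; it is the combination with the ASR alternatives that singles out one sentence.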
Mutual disambiguation between gaze and speech
Inspiration from literature research
Meyer et al. studied eye movements in an object-naming task and showed that
people consistently fixated objects before naming them.
Griffin showed that when multiple objects were named in a single utterance,
speech about one object was produced while the next object was being fixated
and lexically processed.
Mutual disambiguation between gaze and speech
Experiments -> Reading Task
• 5 participants read English Text
• Eye Gaze and Speech Recorded
• 6 sets of 11 sentences
• 5 sets in domain and 1 out of domain
Mutual disambiguation between gaze and speech
Experiments -> Translation Task
• 4 participants translated English text
• 4 sets of 10 very simple sentences each
• Target languages: Hindi, Spanish, Danish, Italian
• Eye gaze on source-language words and speech in the target languages were recorded
Mutual disambiguation between gaze and speech
Approach
ASR word lattice
Reference sentence: Leaving next day in the morning
Mutual disambiguation between gaze and speech
Approach
Gaze word lattice
Mutual disambiguation between gaze and speech
Approach
Gaze bag of words
Mutual disambiguation between gaze and speech
Approach
Composed word-lattice
Reference sentence: Leaving next day in the morning
Mutual disambiguation between gaze and speech
System
• Experiments were performed in Translog
• Speech hypotheses were provided by the AT&T Watson server
• The hypotheses were converted to word-lattice format using Python
• A bag of words was generated from the x, y gaze coordinates with our Part 1
algorithm, using C# and Python
• For translation tasks, the gaze bag of words was mapped to a target-language
bag of words using lexicons (one more level of ambiguity)
• The lattices were composed using OpenFST
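The composition step can be pictured with a pure-Python stand-in. This is not the OpenFST code used in the project: it treats the gaze bag of words as a one-state acceptor with a self-loop per gazed word, so composing it with the ASR lattice amounts to keeping only arcs whose word was gazed at and adding the two costs. The lattice, the competing words ("living", "they", "mourning") and all costs are invented for illustration; only the reference sentence comes from the slides.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Assumed arc format: (source state, target state, word, cost), cost >= 0.
Arc = Tuple[int, int, str, float]


def compose_with_bag(arcs: List[Arc], bag: Dict[str, float]) -> List[Arc]:
    """Keep arcs whose word is in the gaze bag and add the gaze cost."""
    return [(s, t, w, c + bag[w]) for (s, t, w, c) in arcs if w in bag]


def best_path(arcs: List[Arc], start: int, final: int) -> Optional[Tuple[float, List[str]]]:
    """Lowest-cost word sequence from start to final (Dijkstra search)."""
    outgoing: Dict[int, List[Tuple[int, str, float]]] = {}
    for s, t, w, c in arcs:
        outgoing.setdefault(s, []).append((t, w, c))
    heap: List[Tuple[float, int, List[str]]] = [(0.0, start, [])]
    done = set()
    while heap:
        cost, state, words = heapq.heappop(heap)
        if state == final:
            return cost, words
        if state in done:
            continue
        done.add(state)
        for t, w, c in outgoing.get(state, []):
            heapq.heappush(heap, (cost + c, t, words + [w]))
    return None  # no path survives the composition


# Hypothetical ASR lattice with acoustically similar competitors.
asr_arcs: List[Arc] = [
    (0, 1, "leaving", 0.4), (0, 1, "living", 0.3),
    (1, 2, "next", 0.1),
    (2, 3, "day", 0.5), (2, 3, "they", 0.2),
    (3, 4, "in", 0.1), (4, 5, "the", 0.1),
    (5, 6, "morning", 0.2), (5, 6, "mourning", 0.1),
]
# Unweighted gaze bag of words: every gazed word costs 0.
gaze_bag = {w: 0.0 for w in ["leaving", "next", "day", "in", "the", "morning"]}

asr_only = best_path(asr_arcs, 0, 6)
combined = best_path(compose_with_bag(asr_arcs, gaze_bag), 0, 6)
```

With a strict filter like this, a word that was spoken but never fixated removes every path through it, so `best_path` can return `None`; a real system would need some form of back-off for that case.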
Mutual disambiguation between gaze and speech
System -> experiments with the algorithm
• Gaze word weights: used or ignored
• ASR word weights: used or ignored
Then used a Latin square ->
• Unweighted ASR with Weighted Gaze Bag-of-words (WGUA)
• Unweighted ASR with Unweighted Gaze Bag-of-words (UGUA)
• Weighted ASR with Weighted Gaze Bag-of-words (WGWA)
• Weighted ASR with Unweighted Gaze Bag-of-words (UGWA)
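The four conditions are the combinations of two on/off choices, which a short sketch makes explicit. Treating "unweighted" as "every word costs the same (zero)" is an assumption; the slides do not spell out how weights were dropped.

```python
from itertools import product
from typing import Dict, Tuple


def combined_cost(asr_cost: float, gaze_cost: float,
                  weighted_gaze: bool, weighted_asr: bool) -> float:
    """Cost of one word under a condition; an ignored weight contributes 0."""
    return ((asr_cost if weighted_asr else 0.0)
            + (gaze_cost if weighted_gaze else 0.0))


# Build the condition labels used on the slide: WG/UG followed by WA/UA.
conditions: Dict[str, Tuple[bool, bool]] = {}
for weighted_gaze, weighted_asr in product([True, False], repeat=2):
    label = ("WG" if weighted_gaze else "UG") + ("WA" if weighted_asr else "UA")
    conditions[label] = (weighted_gaze, weighted_asr)

# Example: under UGWA only the ASR cost counts.
ugwa_cost = combined_cost(0.7, 0.2, *conditions["UGWA"])
```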
Mutual disambiguation between gaze and speech
Evaluation
ASR
SCLITE was used to get the word accuracies of the n-best hypotheses
with respect to the reference sentence.
Eye Gaze
Precision = |Wg ∩ Wr| / |Wg|
Recall = |Wg ∩ Wr| / |Wr|
F-Measure = harmonic mean of precision and recall
Sentence Recognition Error (SRI; 1 or 0)
Wg = unique words in the gazed words
Wr = unique words in the reference sentence
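The gaze metrics defined above translate directly into set operations. The function name, the lowercasing and the whitespace tokenisation are assumptions, and the gazed words in the example are made up; the reference sentence is the one used in the slides.

```python
from typing import Iterable, Tuple


def gaze_scores(gazed_words: Iterable[str], reference: str) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of gazed words against a reference."""
    wg = {w.lower() for w in gazed_words}   # unique gazed words
    wr = set(reference.lower().split())     # unique reference words
    common = wg & wr
    precision = len(common) / len(wg) if wg else 0.0
    recall = len(common) / len(wr) if wr else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical gaze output: four correct words plus one wrong one.
precision, recall, f_measure = gaze_scores(
    ["leaving", "next", "day", "the", "evening"],
    "Leaving next day in the morning",
)
# precision = 4/5, recall = 4/6, F-measure = 8/11
```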
Mutual disambiguation between gaze and speech
Research Design
Reading Task
Independent Variables
Domain of test data
Weights of ASR
Weights of Eye Gaze
Dependent Variable
Gaze f-measure
Gaze SRI
ASR Word accuracy
Translation Task
Independent Variables
Target language
Weights of ASR
Weights of Eye Gaze
Dependent Variable
Gaze f-measure
Gaze SRI
ASR Word accuracy
Mutual disambiguation between gaze and speech
Reading – Paired t-test (p-values)

                  In domain     Out of domain    All
Gaze_Precision    8.63075E-07   no improvement   9.2475E-07
Gaze_Recall       0.048194288   no improvement   0.048220416
Gaze_F-Measure    1.68557E-06   no improvement   1.7924E-06
ASR_WrdAccr       0.040033133   0.86206786       0.067110316

                  In domain     Out of domain    All
Gaze_Precision    8.63075E-07   no improvement   9.2475E-07
Gaze_Recall       0.048194288   no improvement   0.048220416
Gaze_F-Measure    1.68557E-06   no improvement   1.7924E-06
ASR_WrdAccr       0.007594247   0.86206786       0.017268861

                  In domain     Out of domain    All
Gaze_Precision    8.63075E-07   no improvement   9.2475E-07
Gaze_Recall       0.048194288   no improvement   0.048220416
Gaze_F-Measure    1.68557E-06   no improvement   1.7924E-06
ASR_WrdAccr       0.040033133   0.86206786       0.067110316

                  In domain     Out of domain    All
Gaze_Precision    8.63075E-07   no improvement   9.2475E-07
Gaze_Recall       0.048194288   no improvement   0.048220416
Gaze_F-Measure    1.68557E-06   no improvement   1.7924E-06
ASR_WrdAccr       0.040033133   0.363217468      0.099456245

Panel labels (in extraction order): WGUA, UGWA, UGUA, WGWA
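For readers unfamiliar with the test behind these numbers, here is a minimal sketch of the paired t statistic, computed on per-sentence scores before and after integration. The scores are invented for illustration and are not data from the study. Turning the statistic into a p-value needs the t distribution, for which a library routine such as `scipy.stats.ttest_rel` could be used.

```python
import math
import statistics
from typing import List, Tuple


def paired_t(before: List[float], after: List[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation, must be > 0
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical F-measures for four sentences, before and after integration.
before = [0.50, 0.60, 0.55, 0.70]
after = [0.60, 0.65, 0.65, 0.75]
t_stat, dof = paired_t(before, after)
```

A large positive statistic with few degrees of freedom can still correspond to a modest p-value, which is one reason the number of sentences per condition matters when reading the tables.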
Mutual disambiguation between gaze and speech
Reading– Absolute % improvements
Mutual disambiguation between gaze and speech
Translation – Paired t-test (p-values)
The ASR_Word_Accuracy rows carry only three values on the slide.

                   En-Hi         En-Sp         En-It         En-Da
Gaze_Precision     9.22715E-10   9.69933E-10   8.11911E-10   0.000916591
Gaze_Recall        3.15662E-06   0.000781894   0.016622281   1.32874E-10
Gaze_F-Measure     1.19047E-09   1.42E-10      9.78254E-11   0.000278612
ASR_Word_Accuracy: 0.001722134, 0.002676057, 0.108137263

                   En-Hi         En-Sp         En-It         En-Da
Gaze_Precision     9.22715E-10   9.69933E-10   8.11911E-10   0.000916591
Gaze_Recall        3.15662E-06   0.000781894   0.016622281   1.32874E-10
Gaze_F-Measure     1.19047E-09   1.42E-10      9.78254E-11   0.000278612
ASR_Word_Accuracy: 0.702333466, 0.323474945, 0.108137263

                   En-Hi         En-Sp         En-It         En-Da
Gaze_Precision     9.22715E-10   9.69933E-10   8.11911E-10   0.000916591
Gaze_Recall        3.15662E-06   0.000781894   0.016622281   1.32874E-10
Gaze_F-Measure     1.19047E-09   1.42E-10      9.78254E-11   0.000278612
ASR_Word_Accuracy: 0.003101235, 0.011298332, 0.209938743

                   En-Hi         En-Sp         En-It         En-Da
Gaze_Precision     9.22715E-10   9.69933E-10   8.11911E-10   0.000916591
Gaze_Recall        3.15662E-06   0.000781894   0.016622281   1.32874E-10
Gaze_F-Measure     1.19047E-09   1.42E-10      9.78254E-11   0.000278612
ASR_Word_Accuracy: 0.045589916, 0.181222117, 0.108137263

Panel labels (in extraction order): UGUA, WGUA, WGWA, UGWA
Mutual disambiguation between gaze and speech
Translation – Absolute % improvements
Mutual disambiguation between gaze and speech
Conclusions
Reading Task
• Significant improvement in both gaze F-measure and ASR accuracy after
integration
• Gaze recall fell significantly
• SRI also improved
• Improvement was much larger on the in-domain task than on the out-of-domain task
• Of the four experiments, UGWA performed best
Mutual disambiguation between gaze and speech
Conclusions
Translation Task
• Significant improvement only in gaze F-measure, for all languages
• ASR accuracy improved, but not significantly
• For Hindi and Danish, SRI decreased substantially
• Again UGWA was observed to be best (for 3 languages)
Mutual disambiguation between gaze and speech
Overview flowchart
Flowchart steps (as recoverable from the slide):
• Input from gaze: x, y taken from already-logged files -> fixation-word remapping algorithm
  EVALUATION: fixation-duration intersection between machine and manual annotations (3 manual and 1 machine)
• Static text reading experiment: eye gaze data captured (Translog), audio recorded at sentence level
• Gaze: word lattices, bag of words
  EVALUATION: comparison with BoW of the reference sentence (precision and recall)
• Audio: Watson server, 10-best hypotheses, word lattices
  EVALUATION: 1st-best compared with the reference text (edit distance) using SCLITE
• Eye gaze disambiguation and ASR disambiguation, with weighted and unweighted ASR lattices -> improved BoW and improved hypothesis, each evaluated (EVAL)
• Majority-based sentence identification also
LEARNING
Academic
Worked with Tobii T-60
Experiment Design
Python
LaTeX
Moses
Translog
PuTTY
Cygwin
Audacity
OpenFST
Got an idea of MT and ASR
Personal
Communication Skills
Project-Management
morning reporting
presentations
weekly targets and check
Kayaking
Two string kite
Bit of Cooking
Some photos
Thanks
Hoping the monkeys to be friends forever

Intern presentation

  • 1. Presentation on Internship Work Speech and Eye Tracking Enabled Computer Assisted Translation (SEECAT) Copenhagen Business School By: Himanshu Bansal
  • 2. BACKGROUND Michael Carl Associate Professor CBS Srinivas Bangalore Principal Member AT&T Research Labs
  • 4. BACKGROUND We need translation To convey our thoughts foreign language speaker To understand foreign language text and speech ------------------------------------------------- Training Data for automated system To prepare high quality manuscripts of same text in different language
  • 7. BACKGROUND CASMACAT – Computer Assisted Translation
  • 8. BACKGROUND SEECAT as an extension of CASMACAT Translator reads a source text on a computer screen and speaks out the translation in the target language, a process called sight translation. This sight translation process is supported by an Automatic Speech Recognition (ASR) and a Machine Translation (MT) system, which transcribe the spoken speech signal into the target text and which assist the translator with partial translation proposals, predictions and completions on the computer monitor. An eye-tracking device follows the translators gaze path on the screen, detects where he or she faces translation problems and triggers reactive assistance.
  • 9. STUCTURE of INTERSHIP 21 May- 7 Jun Lectures and hands on sessions (CBS) 8 Jun- 28 Jun Divided into teams, worked at summer house (Nykobing, Falster) 29 Jun- 21 July Integration (CBS) # Excursions planned for every weekend
  • 10. GAZE TEAM Himanshu, Kritika and Rucha Part -1 Word- Fixation Remapping Part-2 Mutual disambiguation between gaze and speech
  • 11. Word- Fixation Remapping Word- fixation mapping is useful for cognitive/linguistic research, usability studies and most importantly for providing interactivity into the system
  • 12. Word- Fixation Remapping Issues Identification of the Fixations in a stream of gaze samples. Mapping the Fixations to words/characters (Dealing with variable error) Evaluation scheme for the fixation mapping
  • 17. Word- Fixation Remapping Our Approach -> Evaluation ● Input: – Manually annotated fixation to word mapping (Gold Standard) – Machine computed fixation to word mapping ● Output: – The average character/word error. ● Method: – Compute the overlaps in the gaze fixation durations in the manual and machine annotations. – For the overlapping fixations, compute the absolute differences in the cursor positions of the two mappings.
  • 18. Mutual disambiguation between gaze and speech Motivation
  • 19. Mutual disambiguation between gaze and speech Motivation
  • 20. Mutual disambiguation between gaze and speech Motivation Ambiguity in gaze Variable Error -Midas Touch System Errors - Eye Tracker - Algorithm - Calibration Ambiguity in ASR Domain of training data Accent of speaker Morphology of language Speaking Style -Co-articulation
  • 21. Mutual disambiguation between gaze and speech Motivation Consider a simple example: ● User reads the text “where is the bat” ● ASR can help map gaze points to ● Gaze can help disambiguate ASR output Where is the bat. There it is, behind the door I can't find it! Where is it? Look properly! Its right there. Here is the mat Here is the bat where is the bat where is a pat ASR Hypothesis where it there theis Possible words being gazed Intersection
  • 22. Mutual disambiguation between gaze and speech Inspiration from literature research Meyer et.al. studied eye movements in an object naming task. It was shown that people consistently fixated objects prior to naming them. Griffin showed that when multiple objects were being named in a single utterance, Speech about one object was being produced while the next object was fixated and lexically processed.
  • 23. Mutual disambiguation between gaze and speech Experiments -> Reading Task • 5 participants read English Text • Eye Gaze and Speech Recorded • 6 sets of 11 sentences • 5 sets in domain and 1 out of domain
  • 24. Mutual disambiguation between gaze and speech Experiments -> Translation Task • 4 participants translated English Text • 4 sets of 10-10 very simple sentences • Target languages - Hi, Sp, Da, It • Eye gaze on source language words and speech in target languages recorded
  • 25. Mutual disambiguation between gaze and speech Approach ASR word lattice Reference sentence: Leaving next day in the morning
  • 26. Mutual disambiguation between gaze and speech Approach Gaze word lattice
  • 27. Mutual disambiguation between gaze and speech Approach Gaze bag of words
  • 28. Mutual disambiguation between gaze and speech Approach Composed word-lattice Reference sentence: Leaving next day in the morning
  • 29. Mutual disambiguation between gaze and speech System • Performed experiments on Translog • Speech hypothesis are provided by AT&T Watson server • Transformed these format to word-lattice format using python • Generate bag of words from x,y coordinates using our algo of part 1 using c# and python • In case of translation tasks, gaze bag of words are transformed into target language bag of words using lexicons (1 more level of ambiguity) • Composed these lattices using OpenFST
  • 30. Mutual disambiguation between gaze and speech System -> experiment with algo • Weights of gaze words : should consider or not • Weights of ASR words: should consider or not Then used Latin square -> • Unweighted ASR with Weighted Gaze Bag-of-words (WGUA) • Unweighted ASR with Unweighted Gaze Bag-of-words (UGUA) • Weighted ASR with Weighted Gaze Bag-of-words (WGWA) • Weighted ASR with Unweighted Gaze Bag-of-words (UGWA)
  • 31. ASR SCLITE was used to get the word accuracies of then-best hypotheses with respect to the reference sentence. Eye Gaze Precision: ((Wg) inters (Wr))/Wg Recall = ((Wg) inters (Wr))/Wr F-Measure (Harmonic mean of precision and recall) Sentence Recognition Error (SRI; 1 or 0 ) Wg =Unique words in the gazed words Wr =Unique words in the reference sentence Mutual disambiguation between gaze and speech Evaluation
  • 32. Mutual disambiguation between gaze and speech Research Design Reading Task Independent Variables Domain of test data Weights of ASR Weights of Eye Gaze Dependent Variable Gaze f-measure Gaze SRI ASR Word accuracy Translation Task Independent Variables Target language Weights of ASR Weights of Eye Gaze Dependent Variable Gaze f-measure Gaze SRI ASR Word accuracy
  • 33. Mutual disambiguation between gaze and speech Reading– Paired T-test In domain Out of Domain All Gaze_Precision 8.63075E-07 no improvement 9.2475E-07 Gaze_Recall 0.048194288 no improvement 0.048220416 Gaze_F-Measure 1.68557E-06 no improvement 1.7924E-06 ASR_WrdAccr 0.040033133 0.86206786 0.067110316 In domain Out of Domain All Gaze_Precision 8.63075E-07 no improvement 9.2475E-07 Gaze_Recall 0.048194288 no improvement 0.048220416 Gaze_F-Measure 1.68557E-06 no improvement 1.7924E-06 ASR_WrdAccr 0.007594247 0.86206786 0.017268861 In domain Out of Domain All Gaze_Precision 8.63075E-07 no improvement 9.2475E-07 Gaze_Recall 0.048194288 no improvement 0.048220416 Gaze_F-Measure 1.68557E-06 no improvement 1.7924E-06 ASR_WrdAccr 0.040033133 0.86206786 0.067110316 In domain Out of Domain All Gaze_Precision 8.63075E-07 no improvement 9.2475E-07 Gaze_Recall 0.048194288 no improvement 0.048220416 Gaze_F-Measure 1.68557E-06 no improvement 1.7924E-06 ASR_WrdAccr 0.040033133 0.363217468 0.099456245 WGUAUGWAUGUAWGWA
  • 34. Mutual disambiguation between gaze and speech: Reading – Absolute % improvements
  • 35. Mutual disambiguation between gaze and speech: Translation – Paired t-test (p-values). Four tables, one per configuration (labels on the slide: UGUA, WGUA, WGWA, UGWA). The gaze rows are identical in all four tables:
      Metric            En-Hi          En-Sp          En-It          En-Da
      Gaze_Precision    9.22715E-10    9.69933E-10    8.11911E-10    0.000916591
      Gaze_Recall       3.15662E-06    0.000781894    0.016622281    1.32874E-10
      Gaze_F-Measure    1.19047E-09    1.42E-10       9.78254E-11    0.000278612
    ASR_Word_Accuracy, per table in slide order (three values per table in the source):
      Table 1           0.001722134    0.002676057    0.108137263
      Table 2           0.702333466    0.323474945    0.108137263
      Table 3           0.003101235    0.011298332    0.209938743
      Table 4           0.045589916    0.181222117    0.108137263
  • 36. Mutual disambiguation between gaze and speech Translation – Absolute % improvements
  • 37. Mutual disambiguation between gaze and speech: Conclusions, Reading Task • Significant improvement in both gaze F-measure and ASR accuracy after integration • Gaze recall fell significantly • SRI also improved • Improvement on the in-domain task was much larger than on the out-of-domain task • Of the four experiments, UGWA performed best
  • 38. Mutual disambiguation between gaze and speech: Conclusions, Translation Task • Only gaze F-measure improved significantly, and it did so for all languages • ASR accuracy improved, but not significantly • For Hindi and Danish, SRI decreased a lot • Again, UGWA performed best (for 3 languages)
  • 39. Mutual disambiguation between gaze and speech: Overview flowchart. Input from gaze; fixation-word remapping algorithm (x, y taken from already logged files); EVALUATION: fixation-duration intersection between machine and manual mappings (3 manual and 1 machine). Static text reading experiment: eye gaze data captured (Translog); audio recorded at sentence level. Word lattices; bag of words; EVALUATION: comparison with the BoW of the reference sentence (precision and recall). 10-best hypotheses; Watson server; EVALUATION: 1st-best compared with the reference text (edit distance) using SCLITE. Word lattices; eye gaze disambiguation and ASR disambiguation, with weighted and unweighted ASR lattices; improved BoW and improved hypothesis, each evaluated. Also majority-based. Sentence identification.
  • 40. LEARNING. Academic: worked with Tobii T-60; experiment design; Python, LaTeX, Moses, Translog, PuTTY, Cygwin, Audacity, OpenFst; got an idea of MT and ASR. Personal: communication skills; project management (morning reporting, presentations, weekly targets and checks); kayaking; two-string kite; a bit of cooking.
  • 42. Thanks Hoping the monkeys to be friends forever