SlideShare uma empresa Scribd logo
1 de 34
Baixar para ler offline
Exploring Automatic Speech
Recognition with Tensorflow
Janna Escur Xavier Giró Marta Ruiz
1
February 5th, 2018
Outline
- Introduction
- Related work
- Methodology
- Contribution
- Experiments
- Conclusions and future work
2
Outline
- Introduction
- Related work
- Methodology
- Contribution
- Experiments
- Conclusions and future work
3
Introduction: motivation
4
Introduction: motivation
5
Introduction: problem definition
Automatic Speech
Recognition (ASR)
system
“Good morning”
6
Introduction: difficulties
- Variability: child, male, women…
- Circumstances: same speaker different speech signal
- Not native speaker
- Noisy conditions
- More than one speaker
- Crossing conversations
- ...
7
Introduction: context
Speech2Signs: Spoken to Sign Language Translation using Neural Networks
8
Introduction: context
Speech2Signs: Spoken to Sign Language Translation using Neural Networks
9
Outline
- Introduction
- Related work
- Methodology
- Contribution
- Experiments
- Conclusions and future work
10
Related work: Automatic Speech Recognition
history
- NO Deep Learning: Gaussian Mixture Models (GMM) - Hidden Markov Model (HMM)
- Deep Neural Networks (DNN) - HMM
- DNN without HMM: Connectionist Temporal Classification
- End-to-end trained
11
Related work: classic ASR system
12
Related work:
Gaussian Mixture Model (GMM) -
Hidden Markov Model (HMM)
13
Gales, Mark, and Steve Young. "The application of hidden Markov models in speech recognition." Foundations and Trends® in Signal Processing 1.3
(2008): 195-304.
Related work:
Deep Neural Networks - HMM
Recurrent Neural Networks - HMM
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." nature
521.7553 (2015): 436.
14
Hinton, Geoffrey, et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal
Processing Magazine 29.6 (2012): 82-97.
Related work: Connectionist Temporal
Classification
- Does not impose frame by frame synchronization between
input and output
- The system can choose the position of the output
characters
15
Hannun, Awni. "Sequence Modeling with CTC." Distill 2.11 (2017): e8.
https://distill.pub/2017/ctc/
Graves et al. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. ICML 2006
Related work: Connectionist Temporal
Classification
The Recurrent Neural Network (RNN) estimates the per
time-step probabilities
16
Hannun, Awni. "Sequence Modeling with CTC." Distill 2.11 (2017): e8.
https://distill.pub/2017/ctc/
Graves et al. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. ICML 2006
Related work: end-to-end ASR
Acoustic
model
Pronunciation
dictionary
Language
model
DNN
Big models & lots of data!
17
Outline
- Introduction
- Related work
- Methodology
- Contribution
- Experiments
- Conclusions and future work
18
Methodology: Listen, Attend and Spell (LAS)
LAS learns all the components of
a speech recognizer jointly
Sequence to sequence + attention
19
Chan, William, et al. "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition." Acoustics, Speech and Signal
Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016.
Methodology: Listen, Attend and Spell (LAS)
Listener (encoder)
20
Pyramidal BLSTM
Methodology: Listen, Attend and Spell (LAS)
21
Speller (decoder)
Outline
- Introduction
- Related work
- Methodology
- Contribution
- Experiments
- Conclusions and future work
22
Contribution
LAS implementation by Vincent Renkens (Phd at KULeuven): Nabu under development
Github code: https://github.com/vrenkens/nabu
23
Contribution
TOTAL of 10 issues.
We closed all of them!
24
Outline
- Introduction
- Related work
- Methodology
- Contribution
- Experiments
- Conclusions and future work
25
Experiments: TIMIT database
- Complete corpus: 6300 sentences, 10 sentences spoken by each of 630 speakers
- Train: 462 speakers
- Complete test set: 168 speakers, 8 sentences each.
- Core test set: 24 speakers, 8 sentences each.
- Validation set: 50 speakers, 8 sentences each.
26
Experiments: TIMIT database
- Complete corpus: 6300 sentences, 10 sentences spoken by each of 630 speakers
- Train: 462 speakers
- Complete test set: 168 speakers, 8 sentences each.
- Core test set: 24 speakers, 8 sentences each.
- Validation set: 50 speakers, 8 sentences each.
27
Evaluation metric
Phoneme Error Rate (PER)
PER =
28
S + D + I
N
Where,
S: number of substitutions
D: number of deletions
I: number of insertions
N: number of phonemes in the reference sentence.
Experiments: [1] NABU
Encoder
- Layers 2 + 1 non-pyramidal while training
- Units/layer 128 per direction
Decoder
- Layers 1
- Units/layer 128
PER = 31,94%
29
Steps
Steps
Validation loss
Training loss
Experiments: [2] LAS
Encoder
- Layers 3
- Units/layer 256 per direction
Decoder
- Layers 2
- Units/layer 512
PER = 27.29%
30
Validation loss
Training loss
Steps
Steps
Outline
- Introduction
- Related work
- Methodology
- Contribution
- Experiments
- Conclusions and future work
31
Conclusions and future work
- We have reached a better understanding of the techniques used to process speech in the
deep learning framework.
- We have faced with an under-development implementation
32
Conclusions and future work
- We finally have trained the first end-to-end model with sequence to sequence learning
improved with the attention mechanism in the research area of TALP.
33
- We tried it with different configurations in order to obtain the lowest Phoneme Error Rate.
- The system is ready to be used as the first module of Speech2Signs or as a baseline in further
research projects.
34Speech2Signs project

Mais conteúdo relacionado

Mais de Universitat Politècnica de Catalunya

Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Universitat Politècnica de Catalunya
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Universitat Politècnica de Catalunya
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Universitat Politècnica de Catalunya
 
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Universitat Politècnica de Catalunya
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaUniversitat Politècnica de Catalunya
 
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaDeep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaUniversitat Politècnica de Catalunya
 
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 

Mais de Universitat Politècnica de Catalunya (20)

Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
 
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
 
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaDeep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
 
Neural Architectures for Video Encoding
Neural Architectures for Video EncodingNeural Architectures for Video Encoding
Neural Architectures for Video Encoding
 
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
 

Último

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 

Último (20)

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 

Exploring Automatic Speech Recognition with Rensorflow

  • 1. Exploring Automatic Speech Recognition with Tensorflow Janna Escur Xavier Giró Marta Ruiz 1 February 5th, 2018
  • 2. Outline - Introduction - Related work - Methodology - Contribution - Experiments - Conclusions and future work 2
  • 3. Outline - Introduction - Related work - Methodology - Contribution - Experiments - Conclusions and future work 3
  • 6. Introduction: problem definition Automatic Speech Recognition (ASR) system “Good morning” 6
  • 7. Introduction: difficulties - Variability: child, male, women… - Circumstances: same speaker different speech signal - Not native speaker - Noisy conditions - More than one speaker - Crossing conversations - ... 7
  • 8. Introduction: context Speech2Signs: Spoken to Sign Language Translation using Neural Networks 8
  • 9. Introduction: context Speech2Signs: Spoken to Sign Language Translation using Neural Networks 9
  • 10. Outline - Introduction - Related work - Methodology - Contribution - Experiments - Conclusions and future work 10
  • 11. Related work: Automatic Speech Recognition history - NO Deep Learning: Gaussian Mixture Models (GMM) - Hidden Markov Model (HMM) - Deep Neural Networks (DNN) - HMM - DNN without HMM: Connectionist Temporal Classification - End-to-end trained 11
  • 12. Related work: classic ASR system 12
  • 13. Related work: Gaussian Mixture Model (GMM) - Hidden Markov Model (HMM) 13 Gales, Mark, and Steve Young. "The application of hidden Markov models in speech recognition." Foundations and Trends® in Signal Processing 1.3 (2008): 195-304.
  • 14. Related work: Deep Neural Networks - HMM Recurrent Neural Networks - HMM LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." nature 521.7553 (2015): 436. 14 Hinton, Geoffrey, et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal Processing Magazine 29.6 (2012): 82-97.
  • 15. Related work: Connectionist Temporal Classification - Does not impose frame by frame synchronization between input and output - The system can choose the position of the output characters 15 Hannun, Awni. "Sequence Modeling with CTC." Distill 2.11 (2017): e8. https://distill.pub/2017/ctc/ Graves et al. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. ICML 2006
  • 16. Related work: Connectionist Temporal Classification The Recurrent Neural Network (RNN) estimates the per time-step probabilities 16 Hannun, Awni. "Sequence Modeling with CTC." Distill 2.11 (2017): e8. https://distill.pub/2017/ctc/ Graves et al. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. ICML 2006
  • 17. Related work: end-to-end ASR Acoustic model Pronunciation dictionary Language model DNN Big models & lots of data! 17
  • 18. Outline - Introduction - Related work - Methodology - Contribution - Experiments - Conclusions and future work 18
  • 19. Methodology: Listen, Attend and Spell (LAS) LAS learns all the components of a speech recognizer jointly Sequence to sequence + attention 19 Chan, William, et al. "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition." Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016.
  • 20. Methodology: Listen, Attend and Spell (LAS) Listener (encoder) 20 Pyramidal BLSTM
  • 21. Methodology: Listen, Attend and Spell (LAS) 21 Speller (decoder)
  • 22. Outline - Introduction - Related work - Methodology - Contribution - Experiments - Conclusions and future work 22
  • 23. Contribution LAS implementation by Vincent Renkens (Phd at KULeuven): Nabu under development Github code: https://github.com/vrenkens/nabu 23
  • 24. Contribution TOTAL of 10 issues. We closed all of them! 24
  • 25. Outline - Introduction - Related work - Methodology - Contribution - Experiments - Conclusions and future work 25
  • 26. Experiments: TIMIT database - Complete corpus: 6300 sentences, 10 sentences spoken by each of 630 speakers - Train: 462 speakers - Complete test set: 168 speakers, 8 sentences each. - Core test set: 24 speakers, 8 sentences each. - Validation set: 50 speakers, 8 sentences each. 26
  • 27. Experiments: TIMIT database - Complete corpus: 6300 sentences, 10 sentences spoken by each of 630 speakers - Train: 462 speakers - Complete test set: 168 speakers, 8 sentences each. - Core test set: 24 speakers, 8 sentences each. - Validation set: 50 speakers, 8 sentences each. 27
  • 28. Evaluation metric Phoneme Error Rate (PER) PER = 28 S + D + I N Where, S: number of substitutions D: number of deletions I: number of insertions N: number of phonemes in the reference sentence.
  • 29. Experiments: [1] NABU Encoder - Layers 2 + 1 non-pyramidal while training - Units/layer 128 per direction Decoder - Layers 1 - Units/layer 128 PER = 31,94% 29 Steps Steps Validation loss Training loss
  • 30. Experiments: [2] LAS Encoder - Layers 3 - Units/layer 256 per direction Decoder - Layers 2 - Units/layer 512 PER = 27.29% 30 Validation loss Training loss Steps Steps
  • 31. Outline - Introduction - Related work - Methodology - Contribution - Experiments - Conclusions and future work 31
  • 32. Conclusions and future work - We have reached a better understanding of the techniques used to process speech in the deep learning framework. - We have faced with an under-development implementation 32
  • 33. Conclusions and future work - We finally have trained the first end-to-end model with sequence to sequence learning improved with the attention mechanism in the research area of TALP. 33 - We tried it with different configurations in order to obtain the lowest Phoneme Error Rate. - The system is ready to be used as the first module of Speech2Signs or as a baseline in further research projects.