Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities
Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari (The University of Tokyo)
INTERSPEECH Tue-O-4-10-1, Stockholm, Sweden, Aug. 22, 2017
INTERSPEECH 2017 @Stockholm Aug. 22, 2017 1/15
Outline of This Talk
Issue:
• Voice conversion needs parallel data of source and target speakers.
Conventional method:
• Voice conversion using context posterior probabilities (CPPs) [Sun et al., 2016]
  1. Recognition: source speech feats. → source CPPs
  2. Synthesis: copied source CPPs → target speech feats.
  Pros: non-parallel voice conversion
  Cons: difficulty of converting speaker individuality included in CPPs
Proposed:
• Sequence-to-sequence (Seq2Seq) conversion from source CPPs to target CPPs
• Joint training of recognition and synthesis to increase conversion performance
Results:
• Seq2Seq learning achieved variable-length voice conversion.
• Joint training improved speaker similarity and quality of converted speech.
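Conventional Voice Conversion Algorithm: Copied Context Posterior Probability [Sun et al., 2016]
Training (separated training)
1. Recognition: an LSTM $\boldsymbol{R}(\cdot)$ maps source speech feats. $\boldsymbol{x}$ to CPPs $\hat{\boldsymbol{p}}_x$ and is trained against context labels $\boldsymbol{l}_x$ with the recognition error (softmax cross entropy) $L_C(\hat{\boldsymbol{p}}_x, \boldsymbol{l}_x)$.
2. Synthesis: an LSTM $\boldsymbol{G}(\cdot)$ maps target CPPs $\hat{\boldsymbol{p}}_y$ to target speech feats. $\boldsymbol{y}$ and is trained with the synthesis error (mean squared error) $L_G(\boldsymbol{G}(\hat{\boldsymbol{p}}_y), \boldsymbol{y})$.
[Figure: CPPs plotted over time for phonemes such as "a", "i", "u".]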
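The two separately trained losses can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes each frame has a single context-label index and that both losses are averaged over frames, neither of which the slide specifies. All function names and the toy numbers are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one frame of recognizer outputs."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def recognition_loss(logits_seq, label_seq):
    """Softmax cross entropy L_C, averaged over frames (averaging is an assumption)."""
    nll = [-math.log(softmax(z)[k]) for z, k in zip(logits_seq, label_seq)]
    return sum(nll) / len(nll)

def synthesis_loss(pred_seq, target_seq):
    """Mean squared error L_G between G(p_y) and y, averaged over all values."""
    sq = [(p - t) ** 2
          for pf, tf in zip(pred_seq, target_seq)
          for p, t in zip(pf, tf)]
    return sum(sq) / len(sq)

# Toy data: 2 frames, 3 context classes, 2-dimensional speech features.
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
labels = [0, 2]
pred = [[0.5, 1.0], [1.5, 2.0]]
target = [[0.0, 1.0], [1.0, 2.0]]

l_c = recognition_loss(logits, labels)  # would update R(.) only
l_g = synthesis_loss(pred, target)      # would update G(.) only
```

In separated training the two values are never summed: each one drives the parameters of its own model.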
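Conventional Voice Conversion Algorithm: Copied Context Posterior Probability [Sun et al., 2016]
Conversion (conventional)
1. Recognition: $\boldsymbol{R}(\cdot)$ maps source speech feats. $\boldsymbol{x}$ to source CPPs $\hat{\boldsymbol{p}}_x$.
2. Synthesis: the source CPPs are copied and used as the target CPPs, and $\boldsymbol{G}(\cdot)$ maps them to predicted speech feats. $\hat{\boldsymbol{y}}$.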
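A rough sketch of the copy-based pipeline, assuming the recognizer and synthesizer each return one output frame per input frame. The two toy models are hypothetical stand-ins for the trained LSTMs, used only to show that copying keeps the source frame count.

```python
def convert_by_copy(source_feats, recognize, synthesize):
    """x -> R(x) = source CPPs -> copied as target CPPs -> G(.) -> predicted feats."""
    source_cpps = recognize(source_feats)                 # 1. Recognition
    copied_cpps = [list(frame) for frame in source_cpps]  # COPY, frame by frame
    return synthesize(copied_cpps)                        # 2. Synthesis

def toy_recognize(feats):
    """Hypothetical stand-in for R(.): one two-class posterior per input frame."""
    return [[1.0, 0.0] if frame[0] < 0.0 else [0.0, 1.0] for frame in feats]

def toy_synthesize(cpps):
    """Hypothetical stand-in for G(.): one scalar feature per CPP frame."""
    return [[-1.0 * p[0] + 1.0 * p[1]] for p in cpps]

source = [[-0.3], [-0.1], [0.2], [0.4], [0.6]]
predicted = convert_by_copy(source, toy_recognize, toy_synthesize)
```

Because the CPPs are copied frame by frame, `predicted` always has as many frames as `source`, so phoneme durations stay those of the source speaker.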
Issues of Conventional Voice Conversion
1. CPPs' shapes and lengths differ significantly between speakers.
   • Shapes are different.
   • Lengths of each phoneme are different.
2. Improving recognition accuracy ≠ improving synthesis accuracy
   • The conventional method trains speech recognition and synthesis separately.
Proposed Algorithms
1. Sequence-to-Sequence Conversion from Source CPP to Target CPP
2. Joint Training of Recognition and Synthesis (like auto-encoding)
Sequence-to-Sequence Learning [Sutskever et al., 2014]
• Sequence-to-Sequence learning: variable-length conversion
[Figure: Japanese-to-English translation using Seq2Seq learning. Input sequence (encoder): "雨 が 降る"; output sequence (decoder): "It rains".]
• Problems of Seq2Seq conversion of CPPs
・Determining duration is difficult.
・Conversion failures propagate if the number of frames to be generated is large.
[Weng et al., 2016]
Constraints
• Phoneme duration is given.
• Conversion is done phoneme by phoneme.
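The two constraints can be sketched as follows: the CPP sequence is cut at phoneme boundaries, and each segment is converted to a given number of output frames. This sketch assumes the given durations are per-phoneme frame counts for both source and target. `resample_segment` is only a hypothetical placeholder (linear interpolation) for the learned encoder-decoder; all names are illustrative.

```python
def split_by_phoneme(cpps, durations):
    """Cut a frame sequence into consecutive phoneme segments."""
    segments, start = [], 0
    for d in durations:
        segments.append(cpps[start:start + d])
        start += d
    if start != len(cpps):
        raise ValueError("durations must sum to the number of frames")
    return segments

def resample_segment(segment, n_out):
    """Placeholder for the learned converter C(.): linear interpolation to n_out frames."""
    n_in = len(segment)
    out = []
    for j in range(n_out):
        pos = 0.0 if n_out == 1 else j * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(segment[lo], segment[hi])])
    return out

def convert_phoneme_by_phoneme(src_cpps, src_durations, tgt_durations,
                               convert=resample_segment):
    """Convert each phoneme segment independently to its given target length."""
    converted = []
    segments = split_by_phoneme(src_cpps, src_durations)
    for segment, n_out in zip(segments, tgt_durations):
        converted.extend(convert(segment, n_out))
    return converted

# Two phonemes: 3 + 2 source frames become 2 + 4 target frames.
src = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 2
out = convert_phoneme_by_phoneme(src, [3, 2], [2, 4])
```

Converting one phoneme at a time keeps each generated sequence short, which limits how far a conversion failure can propagate.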
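Sequence-to-Sequence Conversion of CPPs
Conversion
1. Recognition: $\boldsymbol{R}(\cdot)$ maps source speech feats. $\boldsymbol{x}$ to source CPPs $\hat{\boldsymbol{p}}_x$.
   Seq2Seq conversion: $\boldsymbol{C}(\cdot)$ converts the source CPPs to target CPPs $\boldsymbol{C}(\hat{\boldsymbol{p}}_x)$.
2. Synthesis: $\boldsymbol{G}(\cdot)$ maps the converted CPPs to predicted speech feats. $\hat{\boldsymbol{y}} = \boldsymbol{G}(\boldsymbol{C}(\hat{\boldsymbol{p}}_x))$.
Loss function: $L_G(\boldsymbol{C}(\hat{\boldsymbol{p}}_x), \hat{\boldsymbol{p}}_y) + L_C(\boldsymbol{C}(\hat{\boldsymbol{p}}_x), \boldsymbol{l}_y)$
• First term (mean squared error): minimizes conversion error.
• Second term (softmax cross entropy between predicted CPPs and target labels): alleviates recognition error included in target CPPs.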
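A small numeric sketch of the two-term loss, assuming the converted CPPs are already normalized posteriors, each frame has one target label index, and both terms are averaged over frames. The slide shows an unweighted sum, which is what is used here. Function names and values are hypothetical.

```python
import math

def mean_squared_error(pred_seq, target_seq):
    """L_G: squared error between converted CPPs and target CPPs."""
    sq = [(p - t) ** 2
          for pf, tf in zip(pred_seq, target_seq)
          for p, t in zip(pf, tf)]
    return sum(sq) / len(sq)

def cross_entropy(prob_seq, label_seq, eps=1e-12):
    """L_C: negative log posterior of the target label, averaged over frames."""
    nll = [-math.log(max(frame[k], eps)) for frame, k in zip(prob_seq, label_seq)]
    return sum(nll) / len(nll)

def seq2seq_cpp_loss(converted_cpps, target_cpps, target_labels):
    """Unweighted sum L_G(C(p_x), p_y) + L_C(C(p_x), l_y)."""
    l_g = mean_squared_error(converted_cpps, target_cpps)
    l_c = cross_entropy(converted_cpps, target_labels)
    return l_g + l_c, l_g, l_c

converted = [[0.5, 0.5], [0.25, 0.75]]    # C(p_x), toy values
target_cpps = [[0.9, 0.1], [0.2, 0.8]]    # p_y, may contain recognition errors
target_labels = [0, 1]                    # l_y, ground-truth context labels

total, l_g, l_c = seq2seq_cpp_loss(converted, target_cpps, target_labels)
```

The label term pulls the converted CPPs toward the true context even where the target CPPs themselves are slightly wrong.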
Effect of the Proposed Algorithm
• Variable-length voice conversion
[Figure: source CPP, target CPP, and CPP after Seq2Seq conversion, plotted over frames (time) with values from 0 to 1.]
Variable-length conversion of CPPs is achieved!
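Joint Training of Speech Recognition and Synthesis
Training (joint training)
1. Recognition: $\boldsymbol{R}(\cdot)$ maps source speech feats. $\boldsymbol{x}$ to source CPPs $\hat{\boldsymbol{p}}_x = \boldsymbol{R}(\boldsymbol{x})$, trained against context labels $\boldsymbol{l}_x$.
2. Synthesis: $\boldsymbol{G}(\cdot)$ maps the predicted source CPPs back to the source speech feats. $\boldsymbol{x}$.
Loss function: $L_C(\boldsymbol{R}(\boldsymbol{x}), \boldsymbol{l}_x) + L_G(\boldsymbol{G}(\hat{\boldsymbol{p}}_x), \boldsymbol{x})$
• First term: recognition error (conventional term).
• Second term: synthesis error using predicted CPPs.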
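What makes this training "joint" is that the synthesis error depends on the recognizer's parameters through the predicted CPPs. The toy below, with a one-weight recognizer and a linear synthesizer (both hypothetical, not the LSTMs from the talk), checks this with a finite difference.

```python
import math

def recognize(x, w):
    """Toy R(.): two-class posterior from a scalar feature via a sigmoid of w * x."""
    p0 = 1.0 / (1.0 + math.exp(-w * x))
    return [p0, 1.0 - p0]

def synthesize(p, g):
    """Toy G(.): linear map from the posterior back to a scalar feature."""
    return g[0] * p[0] + g[1] * p[1]

def joint_loss(x, label, w, g):
    """Return (L_C + L_G, L_C, L_G) for one frame, auto-encoding x."""
    p = recognize(x, w)
    l_c = -math.log(p[label])            # L_C(R(x), l_x)
    l_g = (synthesize(p, g) - x) ** 2    # L_G(G(p_x), x)
    return l_c + l_g, l_c, l_g

def d_synthesis_term_dw(x, label, w, g, h=1e-6):
    """Central finite difference of the synthesis term w.r.t. the recognizer weight."""
    up = joint_loss(x, label, w + h, g)[2]
    down = joint_loss(x, label, w - h, g)[2]
    return (up - down) / (2 * h)

x, label, w, g = 1.0, 0, 0.5, [2.0, -1.0]
total, l_c, l_g = joint_loss(x, label, w, g)
grad = d_synthesis_term_dw(x, label, w, g)  # nonzero: L_G also shapes R(.)
```

In separated training this gradient path does not exist, because the synthesizer is trained on its own and the recognizer never sees the synthesis error.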
Experimental Evaluations
Experimental Setup
Dataset: ATR Japanese speech database (503 phonetically balanced sentences)
Training / test: 450 sentences / 53 sentences (16 kHz sampling)
Linguistic feats.: 224-dimensional vectors (phonemes)
Speech feats.: Mel-cepstrum (1st through 24th) + delta
Optimization algorithm: AdaGrad (learning rate = 0.01) [Duchi et al., 2011]
Recognition / synthesis model: Bidirectional LSTM (256 units)
Encoder / decoder: Bidirectional LSTM / LSTM (256 units each)
Number of speakers: 8, including the source and target speakers
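As a rough illustration of the "mel-cepstrum + delta" speech features, the sketch below appends first-order deltas to 24-dimensional static frames, giving 48 dimensions per frame. The delta window 0.5 * (c[t+1] - c[t-1]) with repeated edge frames is a common choice but an assumption here; the slide does not state which window was used.

```python
def add_delta(static_seq):
    """Append deltas 0.5 * (c[t+1] - c[t-1]) to each frame, repeating edge frames."""
    n = len(static_seq)
    out = []
    for t in range(n):
        prev = static_seq[max(t - 1, 0)]
        nxt = static_seq[min(t + 1, n - 1)]
        delta = [0.5 * (b - a) for a, b in zip(prev, nxt)]
        out.append(list(static_seq[t]) + delta)
    return out

# Four toy frames of 24 static coefficients (1st through 24th mel-cepstrum).
static = [[float(t)] * 24 for t in range(4)]
feats = add_delta(static)  # each frame now has 24 static + 24 delta values
```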
Objective and Subjective Evaluations of Seq2Seq Learning
[Figure: objective evaluation and subjective evaluation results. Plot annotations: "Better!", "Better!", "Worse". Error bars denote 95% confidence intervals.]
Voice samples (source, target) are available online.
Objective Evaluation of Joint Training
[Figure: mel-cepstral distortion results.]
Joint training achieved a better mel-cepstral distortion score.
Auto-encoding case: the reconstruction error is calculated after recognition and synthesis.
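Mel-cepstral distortion is commonly computed per frame as (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2) in dB and then averaged over frames. The sketch below uses that standard definition and assumes the two sequences are already time-aligned with equal length; the talk does not say how alignment was handled. In the auto-encoding case the reference is the input itself, so no alignment is needed.

```python
import math

def mel_cepstral_distortion(ref_seq, conv_seq):
    """Average mel-cepstral distortion in dB over time-aligned frames."""
    if len(ref_seq) != len(conv_seq):
        raise ValueError("sequences must be time-aligned to the same length")
    const = 10.0 / math.log(10.0)
    per_frame = [
        const * math.sqrt(2.0 * sum((r - c) ** 2 for r, c in zip(rf, cf)))
        for rf, cf in zip(ref_seq, conv_seq)
    ]
    return sum(per_frame) / len(per_frame)

# Auto-encoding style check: compare input frames with their reconstruction.
original = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
reconstructed = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
mcd_same = mel_cepstral_distortion(original, reconstructed)
```

Lower values are better, and identical sequences give 0 dB.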
Subjective Evaluation of Joint Training
[Figure: subjective evaluation results.]
Joint training improved both speaker similarity and speech quality.
Conclusion
Issue:
• Difficulty of converting speaker individuality included in CPPs.
• Improving recognition accuracy ≠ improving synthesis accuracy.
Proposed:
• Sequence-to-sequence (Seq2Seq) conversion from source CPPs to target CPPs.
• Joint training of recognition and synthesis.
Results:
• Seq2Seq learning achieved variable-length voice conversion.
• Joint training improved speaker similarity and quality of converted speech.

Mais conteúdo relacionado

Mais procurados

Attention Mechanism in Language Understanding and its Applications
Attention Mechanism in Language Understanding and its ApplicationsAttention Mechanism in Language Understanding and its Applications
Attention Mechanism in Language Understanding and its ApplicationsArtifacia
 
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memoriesRIILP
 
Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable...
 Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable... Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable...
Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable...Tomoki Koriyama
 
2010 INTERSPEECH
2010 INTERSPEECH 2010 INTERSPEECH
2010 INTERSPEECH WarNik Chow
 
Machine Translation Introduction
Machine Translation IntroductionMachine Translation Introduction
Machine Translation Introductionnlab_utokyo
 
Class9
 Class9 Class9
Class9issbp
 
Neural machine translation by jointly learning to align and translate
Neural machine translation by jointly learning to align and translateNeural machine translation by jointly learning to align and translate
Neural machine translation by jointly learning to align and translatesotanemoto
 
13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for TranslationRIILP
 
Semi-Supervised Keyword Spotting in Arabic Speech Using Self-Training Ensembles
Semi-Supervised Keyword Spotting in Arabic Speech Using Self-Training EnsemblesSemi-Supervised Keyword Spotting in Arabic Speech Using Self-Training Ensembles
Semi-Supervised Keyword Spotting in Arabic Speech Using Self-Training EnsemblesMohamed El-Geish
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and TransformerArvind Devaraj
 
Translating phrases in neural machine translation
Translating phrases in neural machine translationTranslating phrases in neural machine translation
Translating phrases in neural machine translation sekizawayuuki
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingMinh Pham
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentationSurya Sg
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Fwdays
 
Parts of speech tagger
Parts of speech taggerParts of speech tagger
Parts of speech taggersadakpramodh
 

Mais procurados (20)

BERT
BERTBERT
BERT
 
Attention Mechanism in Language Understanding and its Applications
Attention Mechanism in Language Understanding and its ApplicationsAttention Mechanism in Language Understanding and its Applications
Attention Mechanism in Language Understanding and its Applications
 
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
 
Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable...
 Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable... Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable...
Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable...
 
2010 INTERSPEECH
2010 INTERSPEECH 2010 INTERSPEECH
2010 INTERSPEECH
 
Machine Translation Introduction
Machine Translation IntroductionMachine Translation Introduction
Machine Translation Introduction
 
Deep Learning for Machine Translation
Deep Learning for Machine TranslationDeep Learning for Machine Translation
Deep Learning for Machine Translation
 
Class9
 Class9 Class9
Class9
 
Neural machine translation by jointly learning to align and translate
Neural machine translation by jointly learning to align and translateNeural machine translation by jointly learning to align and translate
Neural machine translation by jointly learning to align and translate
 
Understanding GloVe
Understanding GloVeUnderstanding GloVe
Understanding GloVe
 
Why Ruby
Why RubyWhy Ruby
Why Ruby
 
13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation
 
Semi-Supervised Keyword Spotting in Arabic Speech Using Self-Training Ensembles
Semi-Supervised Keyword Spotting in Arabic Speech Using Self-Training EnsemblesSemi-Supervised Keyword Spotting in Arabic Speech Using Self-Training Ensembles
Semi-Supervised Keyword Spotting in Arabic Speech Using Self-Training Ensembles
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and Transformer
 
Translating phrases in neural machine translation
Translating phrases in neural machine translationTranslating phrases in neural machine translation
Translating phrases in neural machine translation
 
Bert
BertBert
Bert
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentation
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"
 
Parts of speech tagger
Parts of speech taggerParts of speech tagger
Parts of speech tagger
 

Semelhante a Sequence-to-Sequence Voice Conversion Using Context Posterior Probabilities

Direct Punjabi to English Speech Translation using Discrete Units
Direct Punjabi to English Speech Translation using Discrete UnitsDirect Punjabi to English Speech Translation using Discrete Units
Direct Punjabi to English Speech Translation using Discrete UnitsIJCI JOURNAL
 
Improvement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A ReviewImprovement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A Reviewinscit2006
 
Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsTae Hwan Jung
 
APPROACH FOR THICKENING SENTENCE SCORE FOR AUTOMATIC TEXT SUMMARIZATION
APPROACH FOR THICKENING SENTENCE SCORE FOR AUTOMATIC TEXT SUMMARIZATIONAPPROACH FOR THICKENING SENTENCE SCORE FOR AUTOMATIC TEXT SUMMARIZATION
APPROACH FOR THICKENING SENTENCE SCORE FOR AUTOMATIC TEXT SUMMARIZATIONIJDKP
 
Personalising speech to-speech translation
Personalising speech to-speech translationPersonalising speech to-speech translation
Personalising speech to-speech translationbehzad66
 
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURESMULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURESmlaij
 
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFEnd-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFJayavardhan Reddy Peddamail
 
LPC Models and Different Speech Enhancement Techniques- A Review
LPC Models and Different Speech Enhancement Techniques- A ReviewLPC Models and Different Speech Enhancement Techniques- A Review
LPC Models and Different Speech Enhancement Techniques- A Reviewijiert bestjournal
 
07-Effect-Of-Machine-Translation-In-Interlingual-Conversation.pdf
07-Effect-Of-Machine-Translation-In-Interlingual-Conversation.pdf07-Effect-Of-Machine-Translation-In-Interlingual-Conversation.pdf
07-Effect-Of-Machine-Translation-In-Interlingual-Conversation.pdfsimonp16
 
Effect of Machine Translation in Interlingual Conversation: Lessons from a Fo...
Effect of Machine Translation in Interlingual Conversation: Lessons from a Fo...Effect of Machine Translation in Interlingual Conversation: Lessons from a Fo...
Effect of Machine Translation in Interlingual Conversation: Lessons from a Fo...Kotaro Hara
 
Jérémy Ferrero - 2017 - Using Word Embedding for Cross-Language Plagiarism ...
Jérémy Ferrero - 2017 - Using Word Embedding for Cross-Language Plagiarism ...Jérémy Ferrero - 2017 - Using Word Embedding for Cross-Language Plagiarism ...
Jérémy Ferrero - 2017 - Using Word Embedding for Cross-Language Plagiarism ...Association for Computational Linguistics
 
Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...karthik annam
 
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...cscpconf
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.Lifeng (Aaron) Han
 
IMPROVE THE QUALITY OF IMPORTANT SENTENCES FOR AUTOMATIC TEXT SUMMARIZATION
IMPROVE THE QUALITY OF IMPORTANT SENTENCES FOR AUTOMATIC TEXT SUMMARIZATIONIMPROVE THE QUALITY OF IMPORTANT SENTENCES FOR AUTOMATIC TEXT SUMMARIZATION
IMPROVE THE QUALITY OF IMPORTANT SENTENCES FOR AUTOMATIC TEXT SUMMARIZATIONcsandit
 
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...ijnlc
 
SiddhantSancheti_MediumShortStory.pptx
SiddhantSancheti_MediumShortStory.pptxSiddhantSancheti_MediumShortStory.pptx
SiddhantSancheti_MediumShortStory.pptxSiddhantSancheti1
 

Semelhante a Sequence-to-Sequence Voice Conversion Using Context Posterior Probabilities (20)

Parafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdfParafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdf
 
Direct Punjabi to English Speech Translation using Discrete Units
Direct Punjabi to English Speech Translation using Discrete UnitsDirect Punjabi to English Speech Translation using Discrete Units
Direct Punjabi to English Speech Translation using Discrete Units
 
Improvement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A ReviewImprovement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A Review
 
Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword units
 
APPROACH FOR THICKENING SENTENCE SCORE FOR AUTOMATIC TEXT SUMMARIZATION
APPROACH FOR THICKENING SENTENCE SCORE FOR AUTOMATIC TEXT SUMMARIZATIONAPPROACH FOR THICKENING SENTENCE SCORE FOR AUTOMATIC TEXT SUMMARIZATION
APPROACH FOR THICKENING SENTENCE SCORE FOR AUTOMATIC TEXT SUMMARIZATION
 
Personalising speech to-speech translation
Personalising speech to-speech translationPersonalising speech to-speech translation
Personalising speech to-speech translation
 
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURESMULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
 
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFEnd-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
 
LPC Models and Different Speech Enhancement Techniques- A Review
LPC Models and Different Speech Enhancement Techniques- A ReviewLPC Models and Different Speech Enhancement Techniques- A Review
LPC Models and Different Speech Enhancement Techniques- A Review
 
07-Effect-Of-Machine-Translation-In-Interlingual-Conversation.pdf
07-Effect-Of-Machine-Translation-In-Interlingual-Conversation.pdf07-Effect-Of-Machine-Translation-In-Interlingual-Conversation.pdf
07-Effect-Of-Machine-Translation-In-Interlingual-Conversation.pdf
 
Effect of Machine Translation in Interlingual Conversation: Lessons from a Fo...
Effect of Machine Translation in Interlingual Conversation: Lessons from a Fo...Effect of Machine Translation in Interlingual Conversation: Lessons from a Fo...
Effect of Machine Translation in Interlingual Conversation: Lessons from a Fo...
 
Jérémy Ferrero - 2017 - Using Word Embedding for Cross-Language Plagiarism ...
Jérémy Ferrero - 2017 - Using Word Embedding for Cross-Language Plagiarism ...Jérémy Ferrero - 2017 - Using Word Embedding for Cross-Language Plagiarism ...
Jérémy Ferrero - 2017 - Using Word Embedding for Cross-Language Plagiarism ...
 
Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...
 
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
IMPROVE THE QUALITY OF IMPORTANT SENTENCES FOR AUTOMATIC TEXT SUMMARIZATION
IMPROVE THE QUALITY OF IMPORTANT SENTENCES FOR AUTOMATIC TEXT SUMMARIZATIONIMPROVE THE QUALITY OF IMPORTANT SENTENCES FOR AUTOMATIC TEXT SUMMARIZATION
IMPROVE THE QUALITY OF IMPORTANT SENTENCES FOR AUTOMATIC TEXT SUMMARIZATION
 
team10.ppt.pptx
team10.ppt.pptxteam10.ppt.pptx
team10.ppt.pptx
 
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
 
FYPReport
FYPReportFYPReport
FYPReport
 
SiddhantSancheti_MediumShortStory.pptx
SiddhantSancheti_MediumShortStory.pptxSiddhantSancheti_MediumShortStory.pptx
SiddhantSancheti_MediumShortStory.pptx
 

Último

Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 

Último (20)

Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts

Sequence-to-Sequence Voice Conversion Using Context Posterior Probabilities

  • 1. Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities
      Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari (The University of Tokyo)
      INTERSPEECH 2017, Tue-O-4-10-1, Stockholm, Sweden, Aug. 22, 2017
  • 2. Outline of This Talk
      Issue: Voice conversion needs parallel data of source and target speakers.
      Conventional method: Voice conversion using context posterior probabilities (CPPs) [Sun et al., 2016].
        1. Recognition: source speech features → source CPPs.
        2. Synthesis: copied source CPPs → target speech features.
        Pros: non-parallel voice conversion.
        Cons: difficulty of converting the speaker individuality included in CPPs.
      Proposed:
        Sequence-to-sequence (Seq2Seq) conversion from source CPPs to target CPPs.
        Joint training of recognition and synthesis to increase conversion performance.
      Results:
        Seq2Seq learning achieved variable-length voice conversion.
        Joint training improved speaker similarity and quality of converted speech.
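  • 3. Conventional Voice Conversion Algorithm: Copied Context Posterior Probability [Sun et al., 2016] (Training)
      [Figure: 1. Recognition: an LSTM $\boldsymbol{R}(\cdot)$ maps source speech features $\boldsymbol{x}$ to CPPs $\hat{\boldsymbol{p}}_x$, compared against context labels $\boldsymbol{l}_x$ (e.g. a, i, u). 2. Synthesis: an LSTM $\boldsymbol{G}(\cdot)$ maps target CPPs $\hat{\boldsymbol{p}}_y$ to target speech features $\boldsymbol{y}$. Horizontal axis: time.]
      Recognition error (softmax cross entropy): $L_C(\hat{\boldsymbol{p}}_x, \boldsymbol{l}_x)$
      Synthesis error (mean squared error): $L_G(\boldsymbol{G}(\hat{\boldsymbol{p}}_y), \boldsymbol{y})$
      Recognition and synthesis are trained separately.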
  • 4. Conventional Voice Conversion Algorithm: Copied Context Posterior Probability [Sun et al., 2016] (Conversion)
      [Figure: 1. Recognition: the LSTM $\boldsymbol{R}(\cdot)$ maps source speech features $\boldsymbol{x}$ to the source CPP $\hat{\boldsymbol{p}}_x$. The source CPP is copied and used in place of the target CPP. 2. Synthesis: the LSTM $\boldsymbol{G}(\cdot)$ outputs the predicted speech features $\hat{\boldsymbol{y}}$. Horizontal axis: time.]
  • 5. Issues of Conventional Voice Conversion
      1. The shapes and lengths of CPPs differ significantly between speakers.
           Shapes are different.
           Lengths of each phoneme are different.
      2. Improving recognition accuracy ≠ improving synthesis accuracy.
           The conventional method trains speech recognition and synthesis separately.
  • 6. Proposed Algorithms
      1. Sequence-to-sequence conversion from source CPP to target CPP.
      2. Joint training of recognition and synthesis (like auto-encoding).
  • 7. Sequence-to-Sequence Learning [Sutskever et al., 2014]
      Sequence-to-sequence learning enables variable-length conversion.
      [Figure: Japanese-to-English translation using Seq2Seq learning. Input sequence (encoder): "雨 が 降る". Output sequence (decoder): "It rains".]
      Problems of Seq2Seq conversion of CPPs:
        Determining duration is difficult.
        Conversion failures propagate if the number of frames to be generated is large [Weng et al., 2016].
      Constraints:
        Phoneme duration is given.
        Conversion is done phoneme by phoneme.
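The two constraints on this slide (given phoneme durations, phoneme-by-phoneme conversion) can be illustrated with a small NumPy sketch. Everything here is hypothetical: `convert_segment` is only a stand-in for the Seq2Seq converter and uses linear interpolation instead of an encoder-decoder, and the duration lists are made-up values. The point is only the control flow: each phoneme segment is converted on its own, to a length given in advance.

```python
import numpy as np


def convert_segment(segment, target_len):
    """Stand-in for the Seq2Seq converter (hypothetical).

    Resamples one phoneme's CPP frames, shape (src_len, dim), to target_len
    frames by linear interpolation along time. A real system would run an
    encoder-decoder here instead.
    """
    src_len, dim = segment.shape
    src_pos = np.linspace(0.0, 1.0, src_len)
    tgt_pos = np.linspace(0.0, 1.0, target_len)
    out = np.stack(
        [np.interp(tgt_pos, src_pos, segment[:, d]) for d in range(dim)], axis=1
    )
    # Renormalise so every frame is still a probability distribution.
    return out / out.sum(axis=1, keepdims=True)


def convert_utterance(cpp, src_durations, tgt_durations):
    """Convert CPPs phoneme by phoneme, with target durations given."""
    segments, start = [], 0
    for src_len, tgt_len in zip(src_durations, tgt_durations):
        segments.append(convert_segment(cpp[start:start + src_len], tgt_len))
        start += src_len
    return np.concatenate(segments, axis=0)


# Toy source CPPs: 7 frames, 4 context classes, two phonemes of 3 and 4 frames.
rng = np.random.default_rng(0)
logits = rng.normal(size=(7, 4))
source_cpp = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

converted = convert_utterance(source_cpp, [3, 4], [5, 4])
print(converted.shape)  # (9, 4): the output length differs from the input length
```

Because each segment is short, an error in one phoneme cannot spread across the whole utterance, which is the motivation the slide gives for converting phoneme by phoneme.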
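  • 8. Sequence-to-Sequence Conversion of CPPs
      [Figure: 1. Recognition: the LSTM $\boldsymbol{R}(\cdot)$ maps source speech features $\boldsymbol{x}$ to the CPP $\hat{\boldsymbol{p}}_x$. Conversion: the Seq2Seq converter $\boldsymbol{C}(\cdot)$ maps it to $\boldsymbol{C}(\hat{\boldsymbol{p}}_x)$. 2. Synthesis: the LSTM $\boldsymbol{G}(\cdot)$ outputs $\hat{\boldsymbol{y}} = \boldsymbol{G}(\boldsymbol{C}(\hat{\boldsymbol{p}}_x))$. Horizontal axis: time.]
      Loss function: $L_G(\boldsymbol{C}(\hat{\boldsymbol{p}}_x), \hat{\boldsymbol{p}}_y) + L_C(\boldsymbol{C}(\hat{\boldsymbol{p}}_x), \boldsymbol{l}_y)$
        First term: mean squared error. Minimizes the conversion error.
        Second term: softmax cross entropy between predicted CPPs and target labels. Alleviates the recognition error included in target CPPs.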
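A minimal NumPy sketch of this two-term loss follows. It assumes that the converted and target CPPs have the same number of frames (the durations are given), that each target label is a single class index per frame, and that the two terms are summed with equal weight as written on the slide. The function names are illustrative, not taken from the authors' code.

```python
import numpy as np


def mean_squared_error(pred, target):
    """L_G: mean squared error between two arrays of equal shape."""
    return float(np.mean((pred - target) ** 2))


def cross_entropy(pred_probs, labels, eps=1e-12):
    """L_C: cross entropy between per-frame posteriors and integer labels.

    pred_probs has shape (frames, classes) and is assumed to be the output
    of a softmax already; eps guards against log(0).
    """
    picked = pred_probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + eps)))


def seq2seq_cpp_loss(converted_cpp, target_cpp, target_labels):
    """Sum of the conversion error and the label error (equal weights assumed)."""
    return (mean_squared_error(converted_cpp, target_cpp)
            + cross_entropy(converted_cpp, target_labels))


# Toy example: 2 frames, 2 classes.
converted = np.array([[0.5, 0.5], [0.5, 0.5]])
target = np.array([[0.5, 0.5], [0.5, 0.5]])
labels = np.array([0, 1])
loss = seq2seq_cpp_loss(converted, target, labels)
```

In this toy case the mean squared error is zero, so the loss reduces to the cross-entropy term alone. That shows why the second term matters: matching the target CPPs exactly is not enough if those CPPs themselves contain recognition errors.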
  • 9. Effect of the Proposed Algorithm
      Variable-length conversion of CPPs is achieved.
      [Figure: source CPP, target CPP, and CPP after Seq2Seq conversion, plotted over time (frames) with values between 0 and 1.]
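  • 10. Joint Training of Speech Recognition and Synthesis (Training)
      [Figure: 1. Recognition: the LSTM $\boldsymbol{R}(\cdot)$ maps source speech features $\boldsymbol{x}$ to the source CPP $\hat{\boldsymbol{p}}_x$, compared against labels $\boldsymbol{l}_x$. 2. Synthesis: the LSTM $\boldsymbol{G}(\cdot)$ maps $\hat{\boldsymbol{p}}_x$ back to source speech features, $\boldsymbol{G}(\hat{\boldsymbol{p}}_x)$. Horizontal axis: time.]
      Joint training loss: $L_C(\boldsymbol{R}(\boldsymbol{x}), \boldsymbol{l}_x) + L_G(\boldsymbol{G}(\hat{\boldsymbol{p}}_x), \boldsymbol{x})$
        First term: recognition error (conventional term).
        Second term: synthesis error using the predicted CPP.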
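Written side by side, the difference between separated and joint training can be sketched as below. This assumes that joint training minimizes the summed loss over the parameters of both $\boldsymbol{R}$ and $\boldsymbol{G}$, with $\hat{\boldsymbol{p}}_x = \boldsymbol{R}(\boldsymbol{x})$. The slides give the loss terms but do not spell out the minimization, so the $\min$ operators are an interpretation.

```latex
% Separated training (conventional): each model has its own objective.
\min_{\boldsymbol{R}} \; L_C\bigl(\boldsymbol{R}(\boldsymbol{x}),\, \boldsymbol{l}_x\bigr),
\qquad
\min_{\boldsymbol{G}} \; L_G\bigl(\boldsymbol{G}(\hat{\boldsymbol{p}}_y),\, \boldsymbol{y}\bigr)

% Joint training (auto-encoding style): one objective, with
% \hat{\boldsymbol{p}}_x = \boldsymbol{R}(\boldsymbol{x}).
\min_{\boldsymbol{R},\, \boldsymbol{G}} \;
  L_C\bigl(\boldsymbol{R}(\boldsymbol{x}),\, \boldsymbol{l}_x\bigr)
  + L_G\bigl(\boldsymbol{G}(\boldsymbol{R}(\boldsymbol{x})),\, \boldsymbol{x}\bigr)
```

Under this reading, the synthesis error also depends on $\boldsymbol{R}$, so the recognizer is trained toward CPPs that are useful for synthesis and not only toward label accuracy.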
  • 12. Experimental Setup
      Dataset: ATR Japanese speech database (503 phonetically balanced sentences)
      Training / test: 450 sentences / 53 sentences (16 kHz sampling)
      Linguistic features: 224-dimensional vectors (phonemes)
      Speech features: mel-cepstrum (1st through 24th) + delta
      Optimization algorithm: AdaGrad (learning rate = 0.01) [Duchi et al., 2011]
      Recognition / synthesis model: bidirectional LSTM (256 units)
      Encoder / decoder: bidirectional LSTM / LSTM (256 units each)
      Number of speakers: 8, including the source and target speakers
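For readers unfamiliar with the optimizer listed above, here is a minimal NumPy sketch of the standard AdaGrad update with the learning rate from the setup (0.01). The `eps` constant and the class interface are assumptions for illustration; the slides do not state which implementation or which `eps` was used.

```python
import numpy as np


class AdaGrad:
    """Standard AdaGrad: per-parameter step sizes from accumulated squared gradients."""

    def __init__(self, learning_rate=0.01, eps=1e-8):
        self.learning_rate = learning_rate
        self.eps = eps  # assumed value; avoids division by zero
        self.accum = None

    def step(self, params, grads):
        """Return updated parameters for one gradient step."""
        if self.accum is None:
            self.accum = np.zeros_like(params, dtype=float)
        self.accum += grads ** 2
        return params - self.learning_rate * grads / (np.sqrt(self.accum) + self.eps)


# Toy objective f(w) = sum(w ** 2), whose gradient is 2 * w.
optimizer = AdaGrad(learning_rate=0.01)
w = np.array([1.0, -2.0])
w_next = optimizer.step(w, 2.0 * w)
```

On the first step the accumulated term equals the squared gradient, so every parameter moves by almost exactly the learning rate regardless of gradient magnitude. Later steps shrink as the accumulator grows.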
  • 13. Objective and Subjective Evaluations of Seq2Seq Learning
      [Figure: objective evaluation and subjective evaluation results, annotated "Better!", "Better!", and "Worse". Error bars denote 95% confidence intervals.]
      Voice samples (source, target) are available online.
  • 14. Objective Evaluation of Joint Training
      Joint training achieved a better (lower) mel-cepstral distortion score.
      Auto-encoding case: the reconstruction error is calculated after recognition and synthesis.
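Mel-cepstral distortion can be computed as in the NumPy sketch below. It uses the commonly cited definition in dB and assumes that the two sequences are already time-aligned and that the 0th (energy) coefficient has been excluded, which matches the 1st-through-24th features in the setup. The slides do not give the exact formula the authors used, so treat this as one conventional choice.

```python
import numpy as np


def mel_cepstral_distortion(mcep_ref, mcep_conv):
    """Average mel-cepstral distortion in dB over frames.

    Both inputs have shape (frames, dims), e.g. dims = 24 for the
    1st-through-24th mel-cepstral coefficients, and must be time-aligned.
    """
    ref = np.asarray(mcep_ref, dtype=float)
    conv = np.asarray(mcep_conv, dtype=float)
    if ref.shape != conv.shape:
        raise ValueError("sequences must be time-aligned and of equal shape")
    squared_dist = np.sum((ref - conv) ** 2, axis=1)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * squared_dist)
    return float(np.mean(per_frame))


# Toy example: 3 frames of 24 coefficients, differing by 1.0 in one coefficient.
reference = np.zeros((3, 24))
converted = reference.copy()
converted[:, 0] = 1.0
mcd = mel_cepstral_distortion(reference, converted)
```

Lower values mean the converted features are closer to the reference, so "better" on this slide corresponds to a smaller number.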
  • 15. Subjective Evaluation of Joint Training
      Joint training improved both speaker similarity and speech quality.
  • 16. Conclusion
      Issue:
        Difficulty of converting the speaker individuality included in CPPs.
        Improving recognition accuracy ≠ improving synthesis accuracy.
      Proposed:
        Sequence-to-sequence (Seq2Seq) conversion from source CPPs to target CPPs.
        Joint training of recognition and synthesis.
      Results:
        Seq2Seq learning achieved variable-length voice conversion.
        Joint training improved speaker similarity and quality of converted speech.