Progress on
Bangla Text-To-Speech System
Presented By:
Dr. M. Shahidur Rahman
Professor, Dept. of Computer Science & Engg.
Shahjalal University of Science & Technology
rahmanms@sust.edu
Outline
• Introduction to TTS
• How TTS works
• Present Bangla TTS systems
• Problems of the present Bangla TTS
• Directions to improve the performance of
Bangla TTS
• Discussion…
2
What is a TTS?
• The goal of text-to-speech (TTS) synthesis is to convert an
arbitrary input text into intelligible, natural-sounding
speech
– TTS is not a “cut-and-paste” approach that strings together
isolated words
– Instead, TTS employs linguistic analysis to infer correct
pronunciation and prosody (i.e., NLP) and acoustic
representations of speech to generate waveforms (i.e.,
DSP)
3
TTS Applications
Applications:
• Services for the visually impaired community
• Services for illiterate people and people with reading difficulties
• Enabling the use of computers and IT services:
– Reading email aloud
– Using a word processor
– Using the Internet
Existing TTS Systems:
• Festival (open source, developed at the University of Edinburgh)
• Bell Labs TTS
4
How TTS Works
5
Different TTS Systems
Phoneme-Based TTS System
• Phonemes are:
– The minimal distinctive phonetic units
– Relatively small in number (roughly 40 in English; the CMU Pronouncing Dictionary uses 39)
• Disadvantage
– Concatenating isolated phonemes ignores the transitional sounds between them (coarticulation)
6
Different TTS Systems (cont’d)
Diphone-Based TTS System:
• Diphones are:
– Units spanning two phonemes, from the middle of one phoneme to the middle of the next
– Incorporate the transitional sound
– Produce better-sounding speech
– Ex. কক = ক + কঅ + অক + ক (silence→ক, ক→অ, অ→ক, ক→silence)
Disadvantage:
• Far more units are needed: over 1,500 diphones for English
7
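The "over 1,500" figure follows from simple counting. Assuming a round inventory of about 40 English phonemes (a figure chosen here for illustration), the number of ordered phoneme pairs, and hence an upper bound on the number of diphones, is:

```latex
N_{\text{diphones}} \le N_{\text{phonemes}}^{2} \approx 40^{2} = 1600
```

Not every pair occurs in real speech, while units to and from silence (such as the first and last units of কক) add a few more, so a practical inventory lands in the same range.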
Text Pre-Processing
• Convert raw text, which may include numbers, abbreviations,
etc., into the equivalent of written-out words
8
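A minimal sketch of this step for Bangla input, assuming a hypothetical one-entry abbreviation table (`ABBREVIATIONS`) and reading numbers digit by digit, as one would read a phone number. Reading full numbers in Bangla needs a lookup table, because the names of 1 to 99 are largely irregular. The names `normalize` and `spell_digits` are illustrative, not taken from any existing system.

```python
import re

# Hypothetical abbreviation table; a real system would load a curated list.
ABBREVIATIONS = {"ড.": "ডক্টর"}

# Bangla digit names; ASCII digits are mapped to the same words below.
DIGIT_WORDS = {
    "০": "শূন্য", "১": "এক", "২": "দুই", "৩": "তিন", "৪": "চার",
    "৫": "পাঁচ", "৬": "ছয়", "৭": "সাত", "৮": "আট", "৯": "নয়",
}
for ascii_digit, bangla_digit in zip("0123456789", "০১২৩৪৫৬৭৮৯"):
    DIGIT_WORDS[ascii_digit] = DIGIT_WORDS[bangla_digit]


def spell_digits(match):
    # Digit-by-digit reading only; full number names need a 0-99 table.
    return " ".join(DIGIT_WORDS[d] for d in match.group())


def normalize(text):
    # Expand whole-token abbreviations, then spell out every digit run.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return re.sub(r"[0-9০-৯]+", spell_digits, " ".join(tokens))
```

For example, `normalize("ড. রহমান ১২৩")` expands the title and spells out the digits, giving "ডক্টর রহমান এক দুই তিন".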
Word to Diphone Converter
(Phonetization)
• Purpose
– Translate words into their diphone representations
(Ex. রাজা -> Diphones: {র + রআ + আজ + জআ})
– Segment the text into prosodic units such as phrases,
clauses and sentences
• Resource
– Dictionary of words and their diphones
9
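A minimal sketch of the dictionary-driven conversion, assuming a hypothetical lexicon that maps each word to its phoneme sequence and using `_` for silence. It pads both ends with silence, as the কক example does, so রাজা also receives a final আ-to-silence unit that the slide's example does not list. `LEXICON` and `to_diphones` are illustrative names.

```python
# Hypothetical pronunciation lexicon: word -> phoneme sequence.
LEXICON = {
    "রাজা": ["র", "আ", "জ", "আ"],
    "কক": ["ক", "অ", "ক"],
}
SILENCE = "_"


def to_diphones(word, lexicon=LEXICON):
    # Pad with silence so the word's onset and offset become units too.
    phones = [SILENCE] + lexicon[word] + [SILENCE]
    # Each diphone joins one phoneme to the next.
    return [phones[i] + phones[i + 1] for i in range(len(phones) - 1)]
```

With this lexicon, `to_diphones("কক")` returns `["_ক", "কঅ", "অক", "ক_"]`, which matches ক + কঅ + অক + ক once the silence marks are dropped.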
Prosody
Diphone
Retrieval
ConcatenationAcoustic
Manipulation
Diphone
Database
Prosody
Param.
10
Properties of Speech
[Figure: waveform of cat.wav, with its periodic and non-periodic regions marked]
11
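Because pitch modification works period by period, the synthesizer needs to know which stretches of a signal are periodic. One common way to check, sketched here as an assumption rather than the method used in these systems, is normalized autocorrelation: a voiced frame correlates strongly with itself shifted by one pitch period, while noise-like frames do not. The 60 to 400 Hz search range and the 0.5 threshold are illustrative choices.

```python
import numpy as np


def periodicity(frame, fs, f0_min=60.0, f0_max=400.0, threshold=0.5):
    """Return (is_periodic, best_lag_in_samples, best_correlation) for one frame."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    min_lag = int(fs / f0_max)
    max_lag = min(int(fs / f0_min), len(x) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = x[:-lag], x[lag:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom == 0:
            continue
        r = np.dot(a, b) / denom          # normalized correlation at this lag
        if r > best_r:
            best_lag, best_r = lag, r
    return best_r >= threshold, best_lag, best_r
```

For a 100 Hz sine sampled at 8 kHz, the best lag is 80 samples, one pitch period; a frame of white noise stays well below the threshold.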
Altering Pitch/Duration/Amplitude
• For smooth concatenation, the pitch, duration and amplitude
of the units must be adjusted at each concatenation
point.
12
Altering Pitch
[Figure: a pitch period extracted from the original diphone is multiplied (×) by a Hanning window to give the Hanned pitch period]
13
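The windowing step can be sketched as follows. As is usual in time-domain PSOLA, this assumes the extracted segment spans two pitch periods centred on a pitch mark, so that neighbouring Hanned segments can later overlap; the slide itself speaks of a single "pitch period", and the function name is illustrative.

```python
import numpy as np


def hanned_pitch_period(signal, mark, period):
    """Cut a two-period segment centred on `mark` and taper it with a Hanning window."""
    x = np.asarray(signal, dtype=float)
    if mark - period < 0 or mark + period > len(x):
        raise ValueError("segment around the pitch mark runs past the signal edges")
    segment = x[mark - period: mark + period]
    return segment * np.hanning(2 * period)   # tapers smoothly to zero at both ends
```

The taper is what allows segments to be overlap-added without clicks: every segment fades in from zero and out to zero.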
PSOLA – Pitch Synchronous Overlap and Add
• Overlap-adding Hanned pitch periods with 50% overlap reproduces the original pitch
• Pitch up: overlap > 50%
• Pitch down: overlap < 50%
14
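The overlap rule above can be sketched for the simplest case: a steady voiced segment with a known, constant pitch period and pitch marks at every multiple of that period (both assumptions; real diphones need per-period pitch marking). Two-period Hanned segments placed one period apart overlap by 50% and rebuild the signal; placing them closer together raises the pitch, and placing them further apart lowers it. `psola_pitch_shift` is an illustrative name.

```python
import numpy as np


def psola_pitch_shift(x, period, factor):
    """Change the pitch of a steady voiced segment by `factor` (>1 raises it).

    Assumes a constant pitch period (in samples) with pitch marks at
    period, 2*period, ...; a real system detects marks period by period.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < 2 * period:
        raise ValueError("segment must contain at least two pitch periods")
    window = np.hanning(2 * period)                       # two-period Hanning window
    analysis_marks = np.arange(period, len(x) - period + 1, period)
    new_period = period / factor                          # spacing of output segments
    y = np.zeros(len(x))
    t = float(period)
    while t <= len(x) - period:
        # Reuse the analysis segment nearest this output time, so the
        # duration is unchanged while the segment spacing changes.
        mark = analysis_marks[np.argmin(np.abs(analysis_marks - t))]
        grain = x[mark - period: mark + period] * window
        start = int(round(t)) - period
        y[start: start + 2 * period] += grain
        t += new_period
    return y
```

With `factor=1.0` the segments overlap by exactly 50% and the interior of the output closely matches the input; with `factor=2.0` they overlap by 75% and the perceived pitch doubles.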
Altering Duration
• Increase number of PSOLA iterations
(overlaps) to increase duration
• Decrease number of PSOLA iterations
(overlaps) to decrease duration
15
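Under the same assumptions as a steady segment with a constant pitch period, duration can be changed by keeping the output spacing at one period (so the pitch stays put) while varying how many Hanned segments are overlap-added: segments are repeated to lengthen the sound and skipped to shorten it. This is a sketch; the function name and the returned iteration count are illustrative.

```python
import numpy as np


def psola_stretch(x, period, stretch):
    """Scale the duration of a steady voiced segment by `stretch`, keeping its pitch."""
    x = np.asarray(x, dtype=float)
    if len(x) < 2 * period:
        raise ValueError("segment must contain at least two pitch periods")
    window = np.hanning(2 * period)
    analysis_marks = np.arange(period, len(x) - period + 1, period)
    out_len = int(round(len(x) * stretch))
    y = np.zeros(out_len)
    iterations = 0
    t = period
    while t <= out_len - period:
        # Map the output time back onto the input and take the nearest segment.
        mark = analysis_marks[np.argmin(np.abs(analysis_marks - t / stretch))]
        y[t - period: t + period] += x[mark - period: mark + period] * window
        iterations += 1
        t += period                       # constant spacing keeps the pitch unchanged
    return y, iterations
```

For a 1,000-sample segment with a 100-sample period, doubling the duration raises the number of overlap-add iterations from 9 to 19, and halving it lowers the count to 4.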
Altering Amplitude
• Multiply the signal by a constant
• If the constant > 1, the amplitude increases
• If the constant < 1, the amplitude decreases
16
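The slide does not say how the constant is chosen. One plausible choice, shown here as an assumption, is to pick it so that the loudness (RMS) at the start of the next unit matches the loudness at the end of the previous one. `scale` and `join_gain` are illustrative names, and the 64-sample window is arbitrary.

```python
import numpy as np


def scale(signal, constant):
    """Multiply every sample by a constant: >1 louder, <1 quieter."""
    return np.asarray(signal, dtype=float) * constant


def join_gain(left, right, n=64):
    """Gain that makes the RMS of the first n samples of `right` match the last n of `left`."""
    tail = np.asarray(left, dtype=float)[-n:]
    head = np.asarray(right, dtype=float)[:n]
    rms_tail = np.sqrt(np.mean(tail ** 2))
    rms_head = np.sqrt(np.mean(head ** 2))
    return 1.0 if rms_head == 0 else rms_tail / rms_head
```

If the previous unit ends at twice the level at which the next one starts, the gain is 2, so the next unit is scaled up before joining.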
Concatenation
Diphones → Word
• PSOLA is used at the joining ends
• This ensures a smooth transition
Words → Sentence
• Straight joining at the end points, since words are
separated by pauses
17
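The two joining strategies can be contrasted in a short sketch. For diphones, a linear cross-fade over a few samples stands in for PSOLA at the joint; this is a simplification, not the slide's method. Words are simply appended, since pauses separate them. Both function names are illustrative.

```python
import numpy as np


def join_diphones(units, overlap):
    """Join diphone waveforms with a linear cross-fade of `overlap` samples at each joint."""
    if overlap <= 0:
        raise ValueError("overlap must be positive")
    fade_in = np.linspace(0.0, 1.0, overlap)
    fade_out = 1.0 - fade_in                       # the two fades always sum to 1
    out = np.asarray(units[0], dtype=float)
    for unit in units[1:]:
        unit = np.asarray(unit, dtype=float)
        mixed = out[-overlap:] * fade_out + unit[:overlap] * fade_in
        out = np.concatenate([out[:-overlap], mixed, unit[overlap:]])
    return out


def join_words(words):
    """Words are separated by pauses, so they are appended end to end."""
    return np.concatenate([np.asarray(w, dtype=float) for w in words])
```

Each cross-faded joint shortens the result by `overlap` samples, whereas straight joining preserves the total length.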
Putting All Together
[Figure: TTS system block diagram with Text, Pre-processing, Prosody and Concatenation blocks and a "words" label]
18
Types of Concatenative Speech Synthesis
• Concatenative synthesis with a fixed inventory
– contains one sample of each unit and performs
prosodic modification to match the required
prosody
• Unit-selection-based synthesis
– stores several instances of each unit, improving
the chances of finding a well-matched unit
19
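Unit selection is usually framed as a search: each candidate has a target cost (how far it is from the required prosody), each pair of neighbours has a join cost (how badly they meet), and dynamic programming (Viterbi) finds the cheapest sequence. The sketch below makes this concrete under hypothetical choices: a unit is described only by its pitch at each edge, the target cost compares the unit's mean pitch with a target pitch, and the join cost is the pitch jump at the boundary. `Unit`, `target_cost`, `join_cost` and `select_units` are illustrative names, not part of Festival or any other system mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    start_f0: float   # pitch (Hz) at the unit's left edge
    end_f0: float     # pitch (Hz) at the unit's right edge


def target_cost(unit, target_f0):
    return abs((unit.start_f0 + unit.end_f0) / 2 - target_f0)


def join_cost(left, right):
    return abs(left.end_f0 - right.start_f0)


def select_units(candidates, targets):
    """Viterbi search: candidates[i] lists the stored instances for position i."""
    # best[i][j] = (cheapest cost ending in candidate j at position i, backpointer)
    best = [[(target_cost(u, targets[0]), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            costs = [best[i - 1][k][0] + join_cost(p, u)
                     for k, p in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append((costs[k_best] + target_cost(u, targets[i]), k_best))
        best.append(row)
    # Trace the cheapest path back from the last position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1], total
```

Keeping several instances per unit gives this search something to choose from; with a fixed inventory there is a single candidate per position and all mismatch must be fixed by prosodic modification instead.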
Progress of Bangla TTS
• KATHA
– Developed at BRAC University
– Unit-based system using the Festival framework
– 4,355 diphones
– Takes 2 s to generate a 10 s utterance
• BANGLA VAANI
– Syllable-based synthesis system
– Developed in Kolkata
• SUBACHAN
– Developed at SUST
– Diphone-based synthesis system
– 527 diphones
– Takes 45 ms to generate a 10 s utterance
20
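The timing figures above can be compared as real-time factors (synthesis time divided by the duration of the generated speech), using the numbers as reported on the slide; the hardware used for each measurement is not stated, so the comparison is indicative only:

```latex
\mathrm{RTF} = \frac{t_{\text{synthesis}}}{t_{\text{utterance}}}, \qquad
\mathrm{RTF}_{\text{Katha}} = \frac{2\ \text{s}}{10\ \text{s}} = 0.2, \qquad
\mathrm{RTF}_{\text{Subachan}} = \frac{0.045\ \text{s}}{10\ \text{s}} = 0.0045
```

Both are well below 1 (faster than real time), and by these figures Subachan is roughly 44 times faster, while also drawing on a much smaller inventory (527 versus 4,355 diphones).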
Speech Signals from Katha and Subachan
• (Voice of Katha) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন
• (Voice of Subachan) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন
("Although primarily a poet, he wrote and published quite a few essays.")
• (Voice of Katha) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি
• (Voice of Subachan) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি
("Jibanananda Das is one of the foremost modern Bengali poets of the twentieth century.")
21
Problems: Homograph Ambiguity
• Homographs are words that share the same spelling
but differ in meaning and pronunciation
22
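Solution: Homograph Disambiguation
• Collect all possible homographs
• Determine the part-of-speech (POS) tag of each homograph in context
Ex. ছেলেরা মাঠে বল (bol) খেলছে। ("The boys are playing ball in the field": বল is a noun)
তুমি যাবে কি না বল (bolo)। ("Tell me whether you will go or not": বল is a verb)
• Bayes' theorem can also be applied to estimate the most likely
pronunciation of a homograph given its context.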
23
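The Bayes' theorem approach can be sketched as a naive Bayes classifier: it picks the pronunciation that maximizes P(pronunciation) multiplied by the product of P(context word | pronunciation), with add-one smoothing for unseen words. The tiny training set below is hypothetical and only illustrates the mechanics for বল; a usable model needs a large, annotated corpus. `train` and `predict` are illustrative names.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (context_words, pronunciation) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, label in examples:
        class_counts[label] += 1
        word_counts[label].update(context)
        vocab.update(context)
    return class_counts, word_counts, vocab


def predict(model, context):
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        # log P(label) + sum of log P(word | label), with add-one smoothing
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in context:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: context words around বল and its pronunciation.
EXAMPLES = [
    (["ছেলেরা", "মাঠে", "খেলছে"], "bol"),
    (["মাঠে", "খেলা"], "bol"),
    (["তুমি", "কি", "না"], "bolo"),
    (["তুমি", "সত্যি"], "bolo"),
]
MODEL = train(EXAMPLES)
```

With this data, `predict(MODEL, ["মাঠে", "খেলছে"])` returns "bol", and `predict(MODEL, ["তুমি", "যাবে", "কি", "না"])` returns "bolo".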
Problems: Improper Concatenation
[Figure: signal from the utterance of রাশেদ, with a joint that is not concatenated properly]
24
Solution: Improper Concatenation
• PSOLA
• Reducing the number of concatenation points
– Ex 1. Sentence -> কামাল ভাল ছেলে। ("Kamal is a good boy.")
Units -> কা + আমা + আল, ভা + আল, ছে + এলে
instead of ক + কআ + আম + মআ + আল + ল …
– Ex 2. পৃথিবী -> পৃ + ইথি + ইবী
• Vowel sounds are periodic, so they are suitable points for
smooth concatenation
• Use the 1,000 most frequently spoken words
25
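The unit boundaries in these examples fall inside vowels. A minimal sketch of that segmentation, assuming phonemes are written with independent vowel letters (so কা appears as "কআ") and that পৃ is pronounced প + র + ই: the word is cut at every vowel, each unit runs from one vowel to the next, and a leading or trailing consonant stays attached to its neighbouring vowel. `VOWELS` and `vowel_cut_units` are illustrative names.

```python
# Independent vowel letters used to mark vowel phonemes in this sketch.
VOWELS = set("অআইঈউঊঋএঐওঔ")


def vowel_cut_units(phones):
    """Split a phoneme sequence into units that start and end inside vowels."""
    vowel_at = [i for i, p in enumerate(phones) if p in VOWELS]
    if not vowel_at:
        return ["".join(phones)]
    units = []
    if vowel_at[0] > 0:
        units.append("".join(phones[: vowel_at[0] + 1]))    # initial C...V
    for a, b in zip(vowel_at, vowel_at[1:]):
        units.append("".join(phones[a: b + 1]))             # V...V
    if vowel_at[-1] < len(phones) - 1:
        units.append("".join(phones[vowel_at[-1]:]))        # final V...C
    return units or ["".join(phones)]
```

For কামাল this yields three units (কআ, আমআ, আল) and therefore two joints, compared with six diphone units and five joints in ক + কআ + আম + মআ + আল + ল.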
Duration Modeling
26
Duration Modeling
27
Thank you all!
Suggestions??
28
Sound Synthesized by Katha
• Katha
29
Sound Synthesized by Subachan
• Subachan
30
