A Framework for Bangla Text to Speech Synthesis
1. A Framework for Bangla Text to Speech Synthesis
Authors
K. M. Azharul Hasan, Muhammad Hozaifa, Sanjoy Dutta, Rafsan Zani Rabbi
Presented By
Sanjoy Dutta
Department of Computer Science & Engineering
Khulna University of Engineering and Technology, Khulna, Bangladesh.
2. Contents
• Problem Statement
• Factors for Speech Synthesis in Bangla
• Proposed Framework
• Rules and Structure Development
• Syllable Parser Development
• Audio File Selection and Normalization
• Experimental Analysis & Results
• Conclusion
5. Factors for Speech Synthesis in Bangla
• Sequential flow of diphones
A diphone is a pair of adjacent phonemes in which the transition between
the two phonemes is modelled, usually from the middle of the first phoneme to
the middle of the second phoneme.
A phoneme is a sound, or a group of different sounds, perceived to have the
same function by speakers of the language or dialect in question. For example,
in English the /k/ phoneme is written as K or C: skill, school.
• Position vs. Pronunciation
Consonants and vowels occur in three kinds of position:
Consonant-Vowel (CV)
Vowel-Consonant (VC)
Vowel-Consonant-Vowel (VCV)
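The two factors above can be illustrated with a minimal sketch. It assumes a word has already been split into a list of phoneme symbols; the romanized vowel set and the helper names `diphones` and `cv_pattern` are hypothetical placeholders, not part of the presented framework.

```python
# Assumption: phonemes are romanized placeholder symbols, not real Bangla units.
VOWELS = {"a", "e", "i", "o", "u"}

def diphones(phonemes):
    """Return each pair of adjacent phonemes, in sequence."""
    return list(zip(phonemes, phonemes[1:]))

def cv_pattern(phonemes):
    """Label each phoneme as consonant (C) or vowel (V)."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)

pairs = diphones(["k", "a", "l"])    # adjacent pairs: k-a, a-l
pattern = cv_pattern(["a", "k", "a"])  # vowel, consonant, vowel
```

Each pair returned by `diphones` corresponds to one transition that a diphone recording would have to cover, and `cv_pattern` yields strings such as CV, VC, or VCV that a rule set could match against.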
7. Proposed Framework Structure and Rules
• Text Normalization:
Transforming text into a single standard form.
Used when converting text to speech to handle numbers, dates,
acronyms, and abbreviations.
Text normalization is also applied for Position vs. Pronunciation.
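As a rough illustration of text normalization, the sketch below expands abbreviations and spells out digits before synthesis. The lookup tables, the romanized digit words, and the function name `normalize` are assumptions made for this example; the slides do not describe the actual rules, and reading numbers digit by digit is a simplification (years and dates would need their own rules).

```python
import re

# Hypothetical lookup tables, for illustration only.
DIGIT_WORDS = {
    "0": "shunno", "1": "ek", "2": "dui", "3": "tin", "4": "char",
    "5": "panch", "6": "chhoy", "7": "sat", "8": "at", "9": "noy",
}
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister"}

def normalize(text):
    """Rewrite text into a single spoken-style form."""
    # Expand known abbreviations first.
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)

    # Spell out every run of ASCII digits, one digit at a time.
    def spell(match):
        return " ".join(DIGIT_WORDS[d] for d in match.group())

    text = re.sub(r"[0-9]+", spell, text)
    # Collapse repeated whitespace into single spaces.
    return " ".join(text.split())

result = normalize("Dr.  Rahman 42")
```

The pattern `[0-9]+` is used instead of `\d+` because `\d` also matches non-ASCII digits in Python 3, which this small table does not cover.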
13. Audio File Selection and Normalization
Bangla has 39 consonants and 11 vowels in total.
After reduction:
28 independent consonants
8 vowels (one vowel is the exception)
14. Audio File Selection and Normalization
Finally:
224 (28*8) audio files for the syllables.
28 consonants against 5 vowels generate 140 (28*5) diphones.
In summary, 401 audio files need to be created: 9 vowels, 28 consonants,
224 syllables, and 140 diphones.
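The arithmetic behind the total can be checked with a few lines. The variable names are illustrative; the counts are the ones stated on the slides.

```python
consonants = 28         # independent consonants after reduction
syllable_vowels = 8     # vowels combined with consonants into syllables
diphone_vowels = 5      # vowels combined with consonants into diphones
standalone_vowels = 9   # vowel recordings counted in the summary

syllables = consonants * syllable_vowels      # 28 * 8
diphone_count = consonants * diphone_vowels   # 28 * 5

total = standalone_vowels + consonants + syllables + diphone_count
print(syllables, diphone_count, total)
```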
17. Experiment Result
Listening Factors:
• Duration synchronization and merging
• Numerical values such as years
Constraints in Sample 1:
[Bangla words lost in extraction]
Constraints in Sample 2:
[Bangla words lost in extraction]
18. Limitations and Future Works
Detect noun and adjective words, namely
(…) noun and
(…) adjective.
Both words should follow rule 3(a), but they do not, and their
pronunciations differ.
19. Conclusion
We believe the proposed framework can be useful for Bangla TTS
development, detecting Bangla words with a minimal audio file
requirement.