A framework for bangla text to speech synthesis

Sanjoy Dutta

My conference presentation slides for my paper at the 16th ICCIT conference, 2013.

A Framework for Bangla Text to Speech
Synthesis
Authors
K. M. Azharul Hasan, Muhammad Hozaifa, Sanjoy Dutta, Rafsan Zani Rabbi
Presented By
Sanjoy Dutta
Department of Computer Science & Engineering
Khulna University of Engineering and Technology, Khulna, Bangladesh.

Contents
• Problem Statement
• Factors for Speech Synthesis in Bangla
• Proposed Framework
• Rules and Structure Development
• Syllable Parser Development
• Audio File Selection and Normalization
• Experimental Analysis & Results
• Conclusion

Problem Statement
• Develop a framework for Bangla text-to-speech synthesis.

Factors for Speech Synthesis in Bangla
• Sequential flow of diphones
A diphone is a pair of adjacent phonemes in which the transition between the
two is modelled, usually from the middle of the first phoneme to the middle of
the second.
A phoneme is a sound, or a group of different sounds, perceived to have the
same function by speakers of the language or dialect in question. For example,
the /k/ phoneme in English appears in "skill" and "school".
• Position vs. Pronunciation
Consonants and vowels occur in three kinds of position:
Consonant-Vowel (CV)
Vowel-Consonant (VC)
Vowel-Consonant-Vowel (VCV)
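
The two factors above can be illustrated with a minimal Python sketch. It assumes romanized phoneme symbols and a hypothetical vowel set (VOWELS); neither comes from the slides, which work on Bangla script. The helper diphones pairs adjacent phonemes, and position_pattern labels a sequence as CV, VC, or VCV.

```python
# Hypothetical romanized vowel set, used only for illustration;
# the framework itself works on Bangla characters.
VOWELS = {"a", "e", "i", "o", "u"}


def diphones(phonemes):
    """Return each pair of adjacent phonemes, in order."""
    return list(zip(phonemes, phonemes[1:]))


def position_pattern(phonemes):
    """Label each phoneme as V (vowel) or C (consonant)."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)


print(diphones(["s", "k", "i", "l"]))      # [('s', 'k'), ('k', 'i'), ('i', 'l')]
print(position_pattern(["k", "a"]))        # CV
print(position_pattern(["a", "k"]))        # VC
print(position_pattern(["a", "k", "a"]))   # VCV
```

A real implementation would replace VOWELS with the Bangla vowel inventory and would need to handle dependent vowel signs, which this sketch ignores.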

Proposed Framework Structure and Rules
• Text Normalization:
Transforming text into a single standard form.
Applied when converting text to speech, to handle numbers, dates,
acronyms, and abbreviations.
Text Normalization for Position vs. Pronunciation.
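
As a rough illustration of text normalization, the sketch below expands abbreviations and spells out digits before synthesis. The lookup tables (ABBREVIATIONS, DIGITS) and the English sample sentence are assumptions made for readability; the framework's actual Bangla normalization rules are not given in the transcribed slides.

```python
import re

# Hypothetical lookup tables for illustration only; the framework's
# actual Bangla normalization rules are not given in the slides.
ABBREVIATIONS = {"Dr.": "Doctor", "Prof.": "Professor"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def spell_digits(match):
    """Replace a run of ASCII digits with the digits spelled out one by one."""
    return " ".join(DIGITS[int(d)] for d in match.group())


def normalize(text):
    """Expand known abbreviations, then spell out every digit run."""
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    return re.sub(r"[0-9]+", spell_digits, text)


print(normalize("Dr. Rahman arrived in 2013."))
# Doctor Rahman arrived in two zero one three.
```

Reading a year digit by digit is only the simplest choice; a fuller normalizer would convert numbers and dates into their spoken word forms.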

Audio File Selection and Normalization
Bangla has 39 consonants and 11 vowels in total.
After reduction:
28 independent consonants
8 vowels (the vowel ’ ‘ is the exception)

Finally:
224 (28 × 8) audio files for the syllables.
28 consonants against 5 vowels generate 140 (28 × 5) diphones.
In summary, 401 audio files need to be created: 9 vowels, 28 consonants,
224 syllables, and 140 diphones.
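
The file counts above can be checked with a few lines of Python. The variable names are illustrative; the numbers are the ones stated on the slide.

```python
consonants = 28       # independent consonants after reduction
syllable_vowels = 8   # vowels combined with consonants into syllables
diphone_vowels = 5    # vowels combined with consonants into diphones
vowel_files = 9       # standalone vowel recordings listed in the summary

syllables = consonants * syllable_vowels
diphone_count = consonants * diphone_vowels
total = vowel_files + consonants + syllables + diphone_count

print(syllables, diphone_count, total)  # 224 140 401
```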

Experimental Analysis and Results
Strategy of Analysis:
Sample Input Test: Various News Articles from News Portals
Listener Selection: Anonymous Persons Chosen Randomly
Accuracy Analysis:
$$\text{Accuracy} = \frac{\text{Words listeners were able to hear clearly on 1st attempt} \times 100}{\text{Total no. of words in every sample}}$$
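
The accuracy measure can be written as a small function. This is a sketch only: the function name and the sample counts (45 of 50 words) are made up for illustration and are not results from the paper.

```python
def accuracy(clear_words, total_words):
    """Percentage of words heard clearly on the first attempt."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return clear_words * 100 / total_words


# Hypothetical counts, not figures from the experiment.
print(accuracy(45, 50))  # 90.0
```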

Experiment Result
Listening Factors:
• Duration Synchronization and Merging
• Numerical values such as years
Constraints in Sample 1:
‌ , , ,
, , ,
Constraints in Sample 2:
, , , , ,
,

Limitations and Future Work
Detecting whether a word is a noun or an adjective, namely
( ) Noun and
( ) Adjective.
Both words should follow rule 3(a),
but they do not, and their pronunciations differ.

Conclusion
We believe the proposed framework can be useful for Bangla TTS
development to detect the Bangla words with minimum audio file
requirement.

8. Normalization rules for ‘ ’

9. Normalization rules for ‘ - - - ’

10. Syllable Parser Development

11. Syllable Parser In Action
