Maithili Text-to-Speech System
Amit Kumar Jha
Pankaj Dwivedi
Piyush Pratap Singh
July 2019
IEEE CONECCT-2019
JULY 26-27, 2019
Abstract
• The paper discusses the development of a Maithili TTS system.
• Method - concatenative unit selection
• Basic unit - Syllable
• Speech corpus – 8 hours (5 hours borrowed from LDCIL, CIIL Mysore, and 3 hours collected
from native speakers in a studio environment).
• 1055 most frequently occurring words have been recorded and stored.
• Interface - C#.Net
• 930 syllable units (C*V), built from 300 syllables (30*10) and 10 independent vowels.
• Evaluation is performed by 10 native speakers using the Mean Opinion Score (MOS).
• The quality of synthesized speech is approximately 84%.
07/29/19 2
Roadmap
• Introduction
• Maithili Phonology and Database Creation
• Proposed Algorithm
• Results and Discussion
• Conclusion
Introduction
• A Text-to-Speech (TTS) system converts raw text into human speech sounds.
• Applications: communication aid for visually impaired people, telecommunication, industrial
and educational applications, and many more.
• The Government of India (GOI) initiated development of TTS systems for Indian languages through
the TTS consortium project under MeitY (Ministry of Electronics and Information Technology).
• 13 Indian languages, namely Hindi, Gujarati, Telugu, Marathi, Tamil, Odia, Malayalam,
Bengali, Assamese, Kannada, Bodo, Manipuri, and Rajasthani.
• Methodologies: articulatory synthesis, formant synthesis, unit selection synthesis (USS),
HMM-based speech synthesis (HTS), etc.
Maithili Language
• Maithili (EGIDS 0-4) - spoken in Bihar, India and Eastern Nepal (approximately 30 million speakers).
• Script – Devanagari, Kaithi, Mithilakshar (also known as Tirhuta), and Newari.
• 16 phonologically distinctive vowel segments - 8 oral vowels [i e æ a ə ɔ o u] and 8
corresponding nasal vowels.
• 2 distinctive oral diphthongs /əi/ and /əu/.
• 33 consonantal segments (including m, n, ŋ, ʃ, ɳ, ʂ, w (v), j, r, ɽ, l) are used in written form,
but only 30 segments are realized in speech.
• The consonants [ʃ, ɳ, ʂ] are replaced by [s, n, s], respectively, in speech:
<ʃaːm> → [saːm] ‘evening’, <baːɳ> → [baːn] ‘arrow’, and <kəʂʈ> → [kəsʈ] ‘pain’.
• The sound [ɽ] is also realized as [r] in intervocalic and syllable-boundary positions:
<ɡʰoɽaː> → [ɡʰoraː] ‘horse’ and <kəɽ.ək> → [kər.ək] ‘strict’.
• Native Maithili exhibits four types of syllabic structure V, CV, VC and CVC.
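The written-to-spoken realizations listed on this slide can be sketched as a small rewrite function. This is a minimal illustration in Python, not the authors' implementation: the names `spoken_form`, `SUBSTITUTIONS` and `VOWELS` are hypothetical, and the intervocalic rule is approximated with a regular expression that only knows the oral vowels listed on the slide.

```python
import re

# Written-form consonants that, per the slide, are not realized in speech.
SUBSTITUTIONS = {"ʃ": "s", "ɳ": "n", "ʂ": "s"}

# Oral vowels from the slide, used only to detect a vowel context (assumption).
VOWELS = "ieæaəɔou"


def spoken_form(written: str) -> str:
    """Map a broad written-form transcription to its spoken realization."""
    out = written
    for source, target in SUBSTITUTIONS.items():
        out = out.replace(source, target)
    # [ɽ] -> [r] between vowels; an optional '.' marks a syllable boundary.
    pattern = rf"(?<=[{VOWELS}ː])(\.?)ɽ(\.?)(?=[{VOWELS}])"
    return re.sub(pattern, r"\1r\2", out)


for word in ["ʃaːm", "baːɳ", "kəʂʈ", "ɡʰoɽaː", "kəɽ.ək"]:
    print(word, "->", spoken_form(word))
```

A table-driven mapping keeps the three unconditional replacements separate from the context-dependent [ɽ] rule, so either part can be revised independently.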
Database Creation
• Phonetically balanced text data (corpus) is collected and recorded in a studio environment at a
sampling frequency of 16 kHz with 16-bit samples.
• Domains covered: children's stories, literature, science, tourism, politics, history, daily affairs,
drama, poetry, etc.
• Sources: published books, newspapers, local periodicals and magazines, web pages and blogs,
and dictionaries (Kalyani shabdkosh).
• 120 oral and folk narratives (stories and legends) were audio-transliterated and then recorded in
a studio environment.
• PRAAT software is used to segment the recordings at sentence, word, and syllable levels.
• For example: [lətam] <guava>.
• The speech database consists of 930 syllable units (C*V): [(300*3) + (10*3)] = 930
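The inventory arithmetic on this slide can be checked with a few lines of Python. This is only a worked calculation under stated assumptions: the variable names are hypothetical, the slides do not say what the factor of 3 represents, and the storage estimate assumes uncompressed mono PCM, which the slides do not state.

```python
# Counts taken from the slides.
CONSONANTS = 30             # consonant segments realized in speech
VOWELS_PER_CONSONANT = 10   # second factor of the slide's 30*10 product
INDEPENDENT_VOWELS = 10
FACTOR = 3                  # multiplier in [(300*3) + (10*3)]; meaning not stated on the slides

cv_syllables = CONSONANTS * VOWELS_PER_CONSONANT
syllable_units = cv_syllables * FACTOR + INDEPENDENT_VOWELS * FACTOR

# Assumption: uncompressed mono PCM at the stated 16 kHz / 16 bits for 8 hours.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CORPUS_HOURS = 8
corpus_bytes = CORPUS_HOURS * 3600 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE

print(cv_syllables, syllable_units, corpus_bytes)  # 300 930 921600000
```

Under these assumptions the 8-hour corpus occupies roughly 922 MB before any compression.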
Proposed Algorithm
• Concatenative unit selection synthesis (USS) has been used because it applies only a small amount
of digital signal processing (DSP) to the recorded speech.
• DSP makes speech less natural; even so, some systems apply a small amount of signal processing
at the point of concatenation to smooth the waveform.
• Input text is tokenized into words based on white space and special symbols such as purn viram
(full stop), semicolon, comma, colon, question mark, exclamation mark, etc.
• Identify the non-standard word (NSW) tokens such as abbreviations, acronyms, numbers, fractions,
ratios, symbols, dates, times, etc.
• Classify the tokens as abbreviation, acronym, number, symbol, date, URL, etc.
• Convert non-standard words to standard words using the corresponding expansion rules and the
developed lexicon.
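The tokenize, identify, classify and expand steps above can be sketched as follows. This is a hedged illustration in Python rather than the system's C#/SQL modules: the function names, the regular expressions in `NSW_PATTERNS` and the caller-supplied `lexicon` are all assumptions, and tokens are split on white space with separator marks trimmed from their edges so that forms such as times and dates stay intact.

```python
import re

# Separators named on the slide: purn viram (danda), semicolon, comma, colon,
# question mark, exclamation mark.
PUNCTUATION = "।;,:?!"

# Illustrative patterns only; order matters because the first match wins.
NSW_PATTERNS = [
    ("url", re.compile(r"^(https?://|www\.)\S+$")),
    ("date", re.compile(r"^\d{1,2}[/-]\d{1,2}[/-]\d{2,4}$")),
    ("time", re.compile(r"^\d{1,2}:\d{2}$")),
    ("fraction", re.compile(r"^\d+/\d+$")),
    ("number", re.compile(r"^\d+([.,]\d+)*$")),
    ("acronym", re.compile(r"^[A-Z]{2,}$")),
    ("abbreviation", re.compile(r"^\w+\.$")),
]


def tokenize(text):
    """Split on white space, then trim separator marks from token edges."""
    tokens = (raw.strip(PUNCTUATION) for raw in text.split())
    return [tok for tok in tokens if tok]


def classify(token):
    """Return the NSW class of a token, or 'standard' if no pattern matches."""
    for label, pattern in NSW_PATTERNS:
        if pattern.match(token):
            return label
    return "standard"


def normalize(tokens, lexicon):
    """Expand NSW tokens found in the caller-supplied lexicon; keep the rest."""
    return [
        tok if classify(tok) == "standard" else lexicon.get(tok, tok)
        for tok in tokens
    ]
```

For instance, `classify("26/07/2019")` yields `"date"` under these patterns. A production system would need expansion rules per class (for example, spelling out numbers in Maithili), which this sketch leaves to the lexicon.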
Flowchart
• Input Maithili text using UTF-16.
• Text is normalized using three algorithmic
modules written in C# and SQL.
• The input text is segmented at sentence
and word level.
• A word-level search is done in the database; if
the word is found, the corresponding speech file
is added to the playlist. Otherwise, the word is
broken into its syllables, and the corresponding
syllable files are searched and added to the
playlist.
• The speech units found are concatenated in the
playlist using digital signal processing.
• Play the sound of the playlist.
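The lookup-with-fallback flow above can be sketched in Python. This is a minimal sketch under assumptions, not the authors' C# code: `build_playlist`, `crossfade_concat`, the dictionary-based databases and the caller-supplied `syllabify` function are hypothetical, and the linear crossfade stands in for the unspecified signal processing at the concatenation points.

```python
def build_playlist(words, word_db, syllable_db, syllabify):
    """Word-level lookup first; fall back to syllable units for unknown words."""
    playlist, missing = [], []
    for word in words:
        if word in word_db:
            playlist.append(word_db[word])
            continue
        for syllable in syllabify(word):
            if syllable in syllable_db:
                playlist.append(syllable_db[syllable])
            else:
                # The slides do not say how missing units are handled.
                missing.append(syllable)
    return playlist, missing


def crossfade_concat(units, overlap):
    """Join lists of samples, linearly crossfading `overlap` samples per joint."""
    out = list(units[0]) if units else []
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            weight = (i + 1) / (n + 1)  # ramps towards the incoming unit
            out[start + i] = (1 - weight) * out[start + i] + weight * unit[i]
        out.extend(unit[n:])
    return out
```

Returning the missing syllables alongside the playlist makes coverage gaps in the unit database visible to the caller instead of silently dropping them.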
Results and Discussion
MOS Score
MOS Chart for Quality analysis
Conclusion
• The Mean Opinion Score (MOS) was calculated
on test data from 10 users (5 male and 5 female).
• 10 sample sentences covering different
domains were given to the evaluators.
• The quality of the TTS system for the Maithili
language is rated at an MOS of 4.2 out of 5,
i.e. 84 percent.
• Prosodic differences in some speech samples
were found due to intra-dialectal and
inter-dialectal differences.
• The present TTS system can be made more
robust by implementing prosodic features.
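The relation between the MOS of 4.2 and the 84 percent figure can be made explicit with a short Python sketch. The function names are hypothetical, and the assumption is the standard five-point opinion scale, on which 4.2 / 5 = 0.84; the individual ratings are not given on the slides, so none are reproduced here.

```python
from statistics import mean

MOS_MIN, MOS_MAX = 1, 5  # standard five-point opinion scale (assumed)


def mean_opinion_score(ratings):
    """Average listener ratings after checking they lie on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not MOS_MIN <= r <= MOS_MAX for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


def mos_to_percent(mos):
    """Express a MOS value as a percentage of the maximum score."""
    return mos / MOS_MAX * 100


print(round(mos_to_percent(4.2), 1))
```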