BENGALI
SPEECH
RECOGNITION
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
LEADING UNIVERSITY, SYLHET
1st January
2013
Bengali Speech Recognition
2
BENGALI SPEECH RECOGNITION
1st JANUARY, 2013
This project report is submitted to the Department of Computer Science and Engineering, Leading University, in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering.
Supervised By
Mrs. Arpita Chakraborty
Assistant Professor
Department of Computer Science and Engineering
Leading University, Sylhet
&
Mrinal Kanti Dhar
Lecturer
Department of Electrical & Electronic Engineering
Leading University, Sylhet
Conducted By
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
LEADING UNIVERSITY, SYLHET, BANGLADESH
Shimul Dey
B.Sc (Hon’s) Final Semester
Examination-2013
ID: 0901020032
Session: 2009-2013
Sanjoy Ranjan Das
B.Sc (Hon’s) Final Semester
Examination-2013
ID: 0901020016
Session: 2009-2013
Md. Badrul Alom Chowdhury
B.Sc (Hon’s) Final Semester
Examination-2013
ID: 0901020004
Session: 2009-2013
To
The Head
Department of Computer Science and Engineering
Leading University, Sylhet, Bangladesh.
Sub: Proposal for Project.
Respected Sir,
We would like to inform you that we, students of your department, would like to carry out a project on “BENGALI SPEECH RECOGNITION”.
We would be grateful if you would kindly allow us to proceed with the project on the above-mentioned topic in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering.
Thanking you.
Yours Sincerely
Name ID
Md. Badrul Alom Chowdhury 0901020004
Sanjoy Ranjan Das 0901020016
Shimul Dey 0901020032
DECLARATION
We hereby declare that the project work entitled “Bengali Speech Recognition”, submitted to Leading University, is a record of original work done by us under the guidance of Arpita Chakraborty, Assistant Professor in the Department of Computer Science and Engineering, Leading University, and that this project work is submitted in fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering. The results of this project have not been submitted to any other university or institute for the award of any degree or diploma. Material drawn from the work of other researchers is acknowledged by reference.
Signature of Supervisor & Co-supervisor
Name of Supervisor Signature
Mrs. Arpita Chakraborty
Assistant Professor
Name of Co-supervisor Signature
Mrinal Kanti Dhar
Lecturer
Signature of Authors
Name of Authors Signature
Md. Badrul Alom Chowdhury
Sanjoy Ranjan Das
Shimul Dey
ACKNOWLEDGEMENT
We would like to thank our honorable supervisor Arpita Chakraborty and co-supervisor Mrinal Kanti Dhar for their guidance throughout the process. They exposed us to the real professional research world with their precious experience. We truly cherish the time spent working with them on such an interesting topic. We would also like to thank the students of our university for letting us record their voices for the experiments, and our Computer Science & Engineering Department for giving us the authority and facilities to complete the project. Last but not least, thanks to the Almighty for helping us in every step of this project work.
Table of Contents
Declaration
Acknowledgement
List of Figures
List of Charts
List of Tables
List of Abbreviations & Symbols
Abstract
Literature Survey
Chapter 1: Introduction
    1.1 Introduction
    1.2 History of Speech Recognition
    1.3 Types of Speech Recognition
        1.3.1 Isolated Words
        1.3.2 Connected Words
        1.3.3 Continuous Speech
        1.3.4 Spontaneous Speech
        1.3.5 Speaker Dependent
        1.3.6 Speaker Independent
        1.3.7 Overview of Speech Recognition System
    1.4 Terms and Concepts
        1.4.1 Utterance
        1.4.2 Pronunciation
        1.4.3 Grammars
        1.4.4 Vocabularies
        1.4.5 Training
        1.4.6 Accuracy
        1.4.7 Language Dictionary
        1.4.8 Filler Dictionary
        1.4.9 Phone
        1.4.10 HMM
        1.4.11 Language Model
    1.5 Overview of the Full System
Chapter 2: Methodology
    2.1 Data Preparation
        2.1.1 Corpus
        2.1.2 Audio Files
        2.1.3 Dictionary Files
        2.1.4 Phone File
        2.1.5 Language Model File .lm Format
        2.1.6 Language Model File .DMP Format
        2.1.7 Transcription File
        2.1.8 Fileids File
        2.1.9 Filler File
    2.2 Setting up the System Environment
        2.2.1 Software Requirements
        2.2.2 Trainer Setup
        2.2.3 Project Folder Setup
        2.2.4 Training the Acoustic Model
        2.2.5 Testing Part
            2.2.5.1 Testing with Pocket Sphinx
            2.2.5.2 Testing with Sphinx4
Chapter 3: Testing and Performance Evaluation
    3.1 Testing & Performance Evaluation
    3.2 Test Results with Pocket Sphinx
    3.3 Test Results with Sphinx4
        3.3.1 Input Type Microphone
        3.3.2 Input Type Audio
Chapter 4: Applications & Developing
    4.1 Review of Some Developed Recognized Application
        4.1.1 Dictation Application
        4.1.2 Phonetic Translator
        4.1.3 Training File Creator
        4.1.4 Training File Creator
Chapter 5: Limitation & Future Work
    5.1 Limitation
    5.2 Future Work
Chapter 6: Conclusion & References
    6.1 Conclusion
    6.2 References
List of Figures
Fig No.     Name of figure
1.3.7       Overview of Speech Recognition System
1.4.10      Applying Hidden Markov Model on Speech Recognition
1.5         Overview of the Full System Model
2.1.2       Audio File Recording Format
2.2.5.1     Testing with Pocket Sphinx
2.2.5.2     Testing with Sphinx4
4.1.2       Dictionary files with phonetic translation
4.1.3.1     Fileids files with phonetic translation
4.1.4.2     Transcription File

List of Charts
Chart No.   Name of chart
3.2.2       Experiment Results with Pocket Sphinx
3.3.1.2     Experimental Details with Results for Sphinx 4 Live
3.3.2.2     Experimental Details with Results for Sphinx 4 Audio
List of Tables
Table No.   Name of table
1.2         History of Speech Recognition
2.2.3       Configuration of Sphinx-train.cfg
3.2.1       Experimental details with Results for Pocket Sphinx
3.3.1.1     Test results with Sphinx4 Input Type: Microphone
3.3.2.1     Test results with Sphinx4 Input Type: Audio
7.1         Speaker Profiles
7.2         Unicode to IPA Chart
7.3         Corpus About University Admission Information
List of Abbreviations & Symbols:
ASR Automatic Speech Recognition
BSD Berkeley Software Distribution
CMU Carnegie Mellon University
HMM Hidden Markov Model
IPA International Phonetic Alphabet
PCA Principal Component Analysis
ASCII American Standard Code for Information Interchange
MERL Mitsubishi Electric Research Labs
CRBLP Center for Research Bangla Language Processing
D2P Dictionary to pronunciation
SBSBSR Sanjoy Bappy Shimul Bengali Speech Recognition
IDE Integrated Development Environment
ABI Allied Business Intelligence
ABSTRACT
This report presents an overview of Automatic Speech Recognition (ASR) for our mother tongue, Bangla. It begins with an introduction to speech recognition technology and then explains how such systems work and the level of accuracy that can be expected. The purpose of human speech is not just to convey words from one person to another but also to make the other person understand the depth of the spoken words. Speech recognition systems have made dramatic performance leaps in the recent past. The aim of this project is to develop software that recognizes human speech with the help of the CMU Sphinx speech recognition API.
Literature Survey
Today speech technology plays an important role in many applications and has moved from research to commercial use. Many human-machine interfaces have been invented and are applied today in telephone food ordering systems, airport information systems, ticketing systems, restaurant reservation systems, etc. For this reason we selected this important field for our project. Moreover, most major languages have a speech recognition system, but our mother tongue Bangla has no proper one; this is the main reason for selecting this topic. In the early era most research was done using Artificial Neural Networks (ANN), but as we use an HMM-based technique, some HMM-based and related research is mentioned below.
Implementation of Speech Recognition System for Bangla (Shammur Absar Chowdhury, August 2010). We studied this thesis report over one week and acquired a lot of knowledge about speech recognition. We are very thankful to Shammur Absar Chowdhury for writing the thesis in such a clear and accessible way; it is very helpful for new students who want to work in this field. [9]
Speech Recognition by Machine: A Review (M. A. Anusuya and S. K. Katti, Department of Computer Science and Engineering, Sri Jaya Chamarajendra College of Engineering, Mysore, India). From this review we learned a lot about the types of speech recognition, approaches to speech recognition, etc. [1]
Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective (Md. Abul Hasnat, Jabir Mowla and Mumit Khan, BRAC). We studied the past works and, to the best of our knowledge, this work is the first reported attempt to recognize Bangla speech using the HMM technique, so we took most of our guidance on the steps for building a speech recognition system from this publication. From it we learned how to improve the quality of the input audio signal through noise elimination and an end detection algorithm, how the features of a sound are extracted and which parameters are stored in the feature files, and the algorithm for creating HMM models. [8]
Bengali Segmented Automated Speech Recognition (Department of Computer Science and Engineering, BRAC University). From this thesis report we learned about vowel and consonant phonemes, vowel and consonant phoneme clusters, voiced and non-voiced stops, and the Hidden Markov Model. [6]
Recognition of Spoken Letters in Bangla (Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, SUST), Extraction of Bangla Vowel and Representation in the Vowel Space (Syed Akhter Hossain, East West; M Lutfar Rahman, DU; and Farruk Ahmed, NSU), and Acoustic Analysis of Bangla Consonants (Firoj Alam, S. M. Murtoza Habib and Mumit Khan). From these we learned the techniques used to recognize letters, vowels and consonants; essentially, we identified the basic and common steps needed to build a fully functioning recognizer. [7]
Chapter 1
INTRODUCTION
• INTRODUCTION
• HISTORY OF SPEECH RECOGNITION
• TYPES OF SPEECH RECOGNITION
• TERMS AND CONCEPTS
1.1 Introduction
Automatic Speech Recognition (ASR), also known as computer speech recognition, is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. In its broadest sense the term means a computer understanding almost anybody's speech and performing the required task. Put simply, speech recognition is the process of converting spoken input to text, which is why it is sometimes referred to as speech-to-text. It is also referred to as voice recognition: a software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as “mouse up”, control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying “Music”. For this reason many scientists and researchers are working on speech recognition. Most major languages of the world have speech recognizers of their own, but our mother tongue Bengali does not yet have a mature one. Some small research works have been carried out on Bengali speech recognition, but without great outcomes. Implementing a continuous speech recognizer for Bengali is our main goal throughout the project work. However, developing a full-blown continuous speech recognizer is a huge task for a short span of time, so we selected a domain-based continuous speech recognizer covering a conversation on the university admission process. Throughout the work we tried to learn about different tools, and we chose CMU Sphinx4 as the speech recognition API because it is open source software, released under a BSD-style license, and offers good accuracy. Many high-quality and widely used commercial packages are available for this work, but they are costly and require paid licenses. The ultimate goal of ASR research is to allow a computer to recognize in real time, with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics or accent. [9]
1.2 History of Speech Recognition
While AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. Thus, in the 1960s, researchers turned their focus towards a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that would use discrete speech, that is, verbal stimuli punctuated by small pauses. In the 1970s, work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to speech recognition solutions for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.
Speech recognition is used in a wide range of applications because it satisfies the needs of consumers and businesses by simplifying customer interaction, increasing efficiency, and reducing operating costs. According to figures from Allied Business Intelligence (ABI), the increased popularity of speech recognition was expected to push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, more and more companies can be expected to move toward speech recognition, and the industry may take its place as a leader in the technology sector. [1]
Year  Event
1936  AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970  The HMM approach to speech and voice recognition was invented by Lenny Baum of Princeton University.
1971  DARPA established the Speech Understanding Research (SUR) program.
1982  Dragon Systems was founded.
1984  SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995  Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech and voice recognition technology was available to consumers.
1997  Dragon introduced "NaturallySpeaking", the first "continuous speech" dictation software available.
1998  Microsoft invested $45 million in speech and voice recognition technology for use in its systems.
2000  Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003  ScanSoft shipped Dragon NaturallySpeaking 7 Medical, aimed at lowering healthcare costs through highly accurate speech recognition.
Table 1.2: History of Speech Recognition
1.3 Types of Speech Recognition
Speech recognition systems can be separated into different classes according to the types of utterances they are able to recognize. These classes are as follows: [1]
1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (lack of an audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have "Listen/Not-Listen" states, where they require the speaker to wait between utterances (usually doing processing during the pauses). Isolated Utterance might be a better name for this class. Put simply, isolated words are single words such as "me", "you", "go", etc.
1.3.2 Connected Words: Connected word systems (or, more correctly, 'connected utterances') are similar to isolated word systems, but allow separate utterances to be 'run together' with a minimal pause between them, for example "I eat rice".
1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create because they must use special methods to determine utterance boundaries.
1.3.4 Spontaneous Speech: At a basic level, spontaneous speech can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features such as words being run together, "ums" and "ahs", and even slight stutters.
Based on the speaker, there are two types of speech recognition:
1. Speaker–dependent
2. Speaker–independent
1.3.5 Speaker–dependent: Speech recognition systems that require a user to train the system to his/her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his/her voice. Speaker–dependent software works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first "train" the software by speaking to it, so the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.
1.3.6 Speaker–independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker–independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This makes it the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker–independent software is generally less accurate than speaker–dependent software.
1.3.7 Overview of Speech Recognition System
Fig 1.3.7: Overview of Speech Recognition System
1.4 Terms and Concepts
The following basic terms and concepts are fundamental to speech recognition, and it is important to have a good understanding of them. [9][10]
1.4.1 Utterances
An utterance is something you say. It can be one word or a series of words. For example, “Word”, “Microsoft Word”, or “I’d like to run Microsoft Word” are all examples of possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence, in speech recognition, is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine is "listening" for speech input. When the engine detects audio input (in other words, a lack of silence), the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
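The start and end rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the audio is already split into frames with one energy value each, and the names `find_utterances`, `threshold` and `min_silence_frames` are hypothetical. It is not how CMU Sphinx itself detects utterance boundaries.

```python
def find_utterances(frame_energies, threshold, min_silence_frames):
    """Return (start, end) frame index pairs; end is exclusive.

    A frame counts as speech when its energy exceeds `threshold`.
    An utterance ends once `min_silence_frames` silent frames follow it.
    """
    utterances = []
    start = None        # index of the first speech frame of the open utterance
    silence_run = 0     # consecutive silent frames seen since the last speech frame
    for i, energy in enumerate(frame_energies):
        if energy > threshold:
            if start is None:
                start = i               # lack of silence: utterance begins
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # enough trailing silence: close the utterance at its last speech frame
                utterances.append((start, i - silence_run + 1))
                start = None
                silence_run = 0
    if start is not None:
        # audio ended while an utterance was still open
        utterances.append((start, len(frame_energies) - silence_run))
    return utterances


# Hypothetical frame energies: two bursts of speech separated by silence.
energies = [0, 0, 5, 6, 7, 0, 0, 0, 4, 4, 0]
segments = find_utterances(energies, threshold=1, min_silence_frames=2)
```

A larger `min_silence_frames` makes the detector tolerate short pauses inside an utterance, at the cost of reacting later when the speaker has finished.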
1.4.2 Pronunciation
The word pronunciation is familiar from learning any language. What is pronunciation, and what are its fundamental aspects? In any language, pronunciation pertains to the sounds that are produced to make meaning. There are aspects of speech beyond the individual sounds that make a language unique: phrasing, stress, intonation, timing, and rhythm. Your voice is then projected to communicate what you want to say. Add to that cultural nuances, gestures and local expressions, and the way you speak immediately tells the people around you something about yourself. When you are just learning a new language it would be easy to avoid speaking in public, but that is not the best choice because it leads to social isolation. It does not seem fair, but people can be judged by the way they speak and be seen as uneducated, incompetent or lacking knowledge, all because the listener is reacting to the pronunciation and not to what the speaker is trying to communicate. The speech recognition engine uses all sorts of data, statistical models, and algorithms to convert spoken input into text. One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word “the” has at least two pronunciations in U.S. English: “thee” and “thuh”.
1.4.3 Grammars
Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. For this reason, speech engines usually have a very large grammar.
1.4.4 Vocabularies
Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the speech recognition (SR) system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word; entries can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. “Wake Up”), while very large vocabularies can have a hundred thousand or more.
1.4.5 Training
Some speech recognizers have the ability to adapt to a speaker. When the system has this ability,
it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained
by having the speaker repeat standard or common phrases and adjusting its comparison
algorithms to match that particular speaker. Training a recognizer usually improves its accuracy.
Training can also be used by speakers that have difficulty speaking, or pronouncing
certain words. As long as the speaker can consistently repeat an utterance, ASR systems with
training should be able to adapt.
1.4.6 Accuracy
The ability of a recognizer can be examined by measuring its accuracy − or how well it
recognizes utterances. The performance of a speech recognition system is measurable. Perhaps
the most widely used measurement is accuracy. It is typically a quantitative measurement and
can be calculated in several ways. This measurement is useful in validating application design.
For example, if the user said "yes," the engine returned "yes," and the "YES" action was
executed, it is clear that the desired result was achieved. But what happens if the engine
returns text that does not exactly match the utterance? For example, what if the user
said "nope," the engine returned "no," yet the "NO" action was executed? Should that be
considered a successful dialog? The answer to that question is yes because the desired result was
achieved.
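The two ways of counting success discussed above can be made concrete with a short sketch. It assumes one common definition of word accuracy (one minus the word-level edit distance divided by the reference length) and a simpler action-level accuracy for dialog applications. The function names are hypothetical and the report does not state which formula its later experiments use.

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """1 - (errors / reference length); can go below 0 with many insertions."""
    ref = reference.split()
    hyp = hypothesis.split()
    return 1.0 - word_errors(ref, hyp) / len(ref)


def action_accuracy(results):
    """Fraction of (expected_action, executed_action) pairs that agree."""
    correct = sum(1 for expected, executed in results if expected == executed)
    return correct / len(results)


# The "nope" example: the text differs, but the executed action is the expected one.
text_score = word_accuracy("nope", "no")
dialog_score = action_accuracy([("NO", "NO")])
```

Under these definitions the "nope" utterance scores 0 on word accuracy but 1 on action accuracy, which is why the choice of measure should follow the purpose of the application.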
1.4.7 Language Dictionary
Accepted words in the language are mapped to sequences of sound units representing their pronunciation; entries sometimes also include syllabification and stress.
1.4.8 Filler Dictionary
Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.
1.4.9 Phone
A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system written in ASCII letters, whereas Bangla uses Unicode letters.
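The relationship between a language dictionary, a filler dictionary and phones can be sketched as follows. The layout assumed here is one entry per line, a word followed by its phone sequence, in the style of CMU Sphinx dictionary files. The Bangla entries and their phone symbols are hypothetical placeholders and are not taken from the Unicode to IPA chart of this report.

```python
def parse_dictionary(text):
    """Parse 'WORD PHONE PHONE ...' lines into {word: [phone sequence, ...]}."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, phones = parts[0], parts[1:]
        # a word listed more than once keeps every pronunciation
        entries.setdefault(word, []).append(phones)
    return entries


def phones_for(sentence, dictionary):
    """Concatenate the first pronunciation of each word in the sentence."""
    sequence = []
    for word in sentence.split():
        sequence.extend(dictionary[word][0])  # raises KeyError for unknown words
    return sequence


# Hypothetical language dictionary: Unicode words mapped to ASCII phone symbols.
LANGUAGE_DICT = parse_dictionary("""
আমি a m i
ভাত bh a t
খাই kh a i
""")

# Filler dictionary: non-speech events mapped to a silence unit.
FILLER_DICT = parse_dictionary("""
<s> SIL
</s> SIL
<sil> SIL
""")

phone_sequence = phones_for("আমি ভাত খাই", LANGUAGE_DICT)
```

Keeping the two dictionaries separate lets a recognizer treat silence and noise as fillers between words without adding them to the vocabulary.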
1.4.10 HMM
Hidden Markov Models can be seen as finite state machines in which each observation in a sequence triggers a state transition and each state emits an output symbol. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov Model.
Put differently, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In the speech recognition process, after the voice is recorded, it is divided into many frames that must be processed to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word to be recognized. In a database known as the linguist model, we store the reference values of states, phonemes, and words in order to compare them with the observed data (voice). By applying an HMM, we construct a statistical model for each phone whose states are assigned specific probabilities in comparison with the reference values. The probability of each state depends on itself and the previous one. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is very complicated, we do not go into much detail about it here; more can be found in Appendix A.
Fig 1.4.10: Applying Hidden Markov Model on Speech Recognition.
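The search for the most probable state sequence described in this section is commonly done with the Viterbi algorithm. The sketch below is only an illustration on a toy two-state model, not the decoder that Sphinx uses: the state names, the "quiet"/"loud" observation symbols and all probabilities are assumptions chosen for the example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, path) of the most probable state sequence."""
    # best[t][s] = (log-probability of the best path ending in s at time t, previous state)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that gives the highest score for reaching s.
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return best[-1][last][0], path


# Toy model (all values are assumptions for illustration only).
STATES = ["SIL", "AA"]
START = {"SIL": 0.8, "AA": 0.2}
TRANS = {"SIL": {"SIL": 0.6, "AA": 0.4}, "AA": {"SIL": 0.3, "AA": 0.7}}
EMIT = {"SIL": {"quiet": 0.9, "loud": 0.1}, "AA": {"quiet": 0.2, "loud": 0.8}}

log_prob, path = viterbi(["quiet", "loud", "loud"], STATES, START, TRANS, EMIT)
print(path, math.exp(log_prob))
```

Working in log-probabilities avoids numerical underflow on long frame sequences; the toy model has no zero probabilities, so `math.log` is always defined here.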
1.4.11 Language Model
The language model describes the likelihood, probability, or penalty assigned when a sequence or
collection of words is seen. A language model is used to restrict the word search. It defines which
words can follow previously recognized words and significantly restricts the matching
process by stripping out words that are not probable. The most common language models are
n-gram language models, which contain statistics of word sequences, and finite state language
models, which define speech sequences by finite state automata, sometimes with weights.
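As a rough illustration of the word-sequence statistics an n-gram model holds, the sketch below estimates bigram probabilities by simple counting. It is a maximum-likelihood estimate without smoothing or back-off, so it is simpler than the models a real toolkit produces; the function name is hypothetical, and the two sentences are taken from the corpus in the appendix.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(next word | word) by counting, with <s> and </s> as sentence markers."""
    contexts = Counter()
    bigrams = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    return {pair: count / contexts[pair[0]] for pair, count in bigrams.items()}


probs = bigram_probabilities(["এ বিভাগে ভর্তি চলছে", "এ বিভাগে কি কি সুবিধা আছে"])
print(probs[("এ", "বিভাগে")], probs[("বিভাগে", "ভর্তি")])
```

A word pair that never occurs in the training sentences gets no entry at all, which is how such a model prunes improbable continuations during the search.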
1.5 Overview of the Full System
Figure 1.5 Overview of the Full System Model
Chapter 2
METHODOLOGY
• Data Preparation
• Setting up the System Environment
2.1 Data Preparation
We have to prepare several files that are required for training and for testing.
As already mentioned, our project is a domain-based recognition application.
Domain-based means a particular topic containing a small amount of data. We selected fifty
sentences for our recognition application, and the files below are created from this data. The
required files are:
• Corpus
• Audio files
• Dictionary file
• Phone file
• Language Model file in .lm format
• Language Model file in .DMP format
• Transcription file
• Fileids file
• Filler file
2.1.1 Corpus
The corpus is the list of sentences used to train the language model; put simply, it is the
collection of sentences that we want our machine to recognize. For our
project we collected sentences that are important to our domain. Some
sentences of our project are the following…
….
2.1.2 Audio files
After collecting the corpus, the next step is to record audio for it in .wav or
.sph format. During the recording sessions the following parameters of the wave files were
kept constant:
• Sampling rate of the audio: 16 kHz
• Bit depth (bits per sample): 16
• Channel: mono (single channel)
Fig 2.1.2: Audio File Recording Format
A 16 kHz sample rate was chosen for this work because it captures high-frequency
information more accurately than lower rates, and 16 bits per sample quantizes each sample to one
of 65,536 possible values. After recording, the audio was split into one file per sentence
manually using recording software; for our project we used the WavePad sound editor and saved
the sound files in .wav format. Each wav file is named using the speaker ID and the sentence ID.
For example, an audio file of our project is
01_01.wav, which stands for
Speaker Id: 01 Sentence Id: 01
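The recording parameters and the naming scheme above can be checked automatically before training. The sketch below uses Python's standard `wave` module; the function names `check_recording` and `parse_recording_name` are hypothetical helpers introduced for this example and are not part of the project's software.

```python
import wave


def check_recording(path):
    """Report whether a .wav file matches the 16 kHz, 16-bit, mono recording format."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_ok": wav.getframerate() == 16000,  # 16 kHz
            "sample_width_ok": wav.getsampwidth() == 2,     # 2 bytes = 16 bits per sample
            "mono_ok": wav.getnchannels() == 1,             # single channel
        }


def parse_recording_name(filename):
    """Split a name such as '01_01.wav' into (speaker_id, sentence_id)."""
    stem = filename.rsplit(".", 1)[0]
    speaker_id, sentence_id = stem.split("_")
    return speaker_id, sentence_id


print(parse_recording_name("01_01.wav"))
```

Running `check_recording` over every file in the audio folder before feature extraction would flag recordings that were saved in stereo or at a different sampling rate.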
When we collected audio from a speaker, we saved personal information about
that speaker:
• Name
• Age
• Gender
• Environment in which the audio was collected
Some other information was also noted down:
• Environmental conditions of the recording (for example: classroom conditions, number of
students present, sources of noise such as a fan or generator)
• Technical details of the devices (PC, microphone)
• Date and time of the recording
2.1.3 Dictionary file
Put simply, the dictionary file is the list of words taken from our corpus file, each followed by
its pronunciation, for example AA P N AA K E. For this work we need
software that produces a pronunciation file from a word list. A grapheme-to-phoneme (G2P) tool
developed by CRBLP gives pronunciations in the Unicode IPA
system, but we need ASCII format. We therefore developed our own software, D2P
(Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our
dictionary file contains 128 words. The dictionary file has the extension .dic. The name of our
dictionary file is sbsbsr.dic, and some of its contents are:
AA CH E
AA P N AA K E
AA P N I
AA M I
I CCHA U K
….
Note:
• All phonemes are in capital letters, such as AA P N I
• The file format is .dic
• The file encoding is UTF-8 without BOM
• A word cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)
2.1.4 Phone file
The phone file is the list of phonemes that occur within the words. For example, “AA P N I”
contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the
training set. The file has one phone per line, and no duplicates are allowed. This file can be
generated using a small program written for this project, which takes the *.dic file as input and
gives the *.phone file as output.
For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:
A
AA
B
BH
C
CCHA
CH
….
Note:
• All phones are in capital letters
• The file format is .phone
• The file encoding is UTF-8 without BOM
• A phone cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)
• The silence phoneme “SIL” is also included in the phone file
• All phonemes in the dic file are present in the phone file without repetition
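The conversion from a dictionary file to a phone file can be sketched as follows. This is not the project's own program, only a minimal illustration that assumes each dictionary line holds a word followed by its space-separated phones; the function name and the two sample entries (with assumed Bengali headwords) are hypothetical.

```python
def phones_from_dictionary(dic_lines):
    """Collect the unique phones used in dictionary lines of the form 'WORD PH1 PH2 ...'."""
    phones = {"SIL"}  # the silence phone is always part of the phone file
    for line in dic_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and lines without a pronunciation
        phones.update(fields[1:])  # everything after the word is a phone
    return sorted(phones)


sample_dic = ["আমি AA M I", "আপনি AA P N I", ""]
print("\n".join(phones_from_dictionary(sample_dic)))
```

Using a set guarantees that no phone is repeated, and writing the sorted result one phone per line, followed by a final blank line, would match the notes above.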
2.1.5 Language Model file .lm format
A language model assigns a probability to a piece of unseen text, based on some training data.
The language model file is plain text. Its format is the commonly used "arpa" format, which is
standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood
(the first field) and a back-off factor (the third field). To build this file, the CMU lmtool is used.
lmtool is a web-based tool that allows users to quickly compile the text-based components needed
for using an ASR decoder. To do this, a corpus is needed, which in this case means a set
of sentences (or more precisely, utterances) that the recognition system is expected
to handle. The corpus needs to be an ASCII text file, although newer versions also
support Unicode text files, with one sentence per line. Upload this file and click the
compile button. This gives a set of lexical (pronunciation dictionary) and language
modeling files. Only the LM file is used here, because the pronunciation dictionary is built as
described above. The tool works best for small domains. For our project the file name is sbsbsr.lm
and the file format is .lm. Some contents of the language model file for our project are:
-2.0719 -0.2626
-2.0719 -0.2861
-2.0719 -0.2973
-1.7709 -0.2936
-2.0719 -0.2626
-1.7709 এ -0.2861
-2.0719 -0.2626
-2.0719 ও -0.2973
….
2.1.6 Language Model file .DMP format
We also need the language model file in .DMP format for use with Sphinx4. We used a Linux
environment to produce the .DMP file from the language model file in
.lm format, with the following command in the Linux terminal:
sphinx_lm_convert -i model.lm -o model.DMP
Here model.lm is the name of the language model file in .lm format and model.DMP is the name
of the language model file in .DMP format. For our project the command is
sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP
2.1.7 Transcription file
A transcript is needed to represent what the speakers are saying in the audio files. The
speaker's words are written in a file exactly as they were recorded, enclosed in
silence tags (start tag <s>, end tag </s>) and followed by the file ID that identifies the
utterance. This file is known as the transcription file, and there are two of them:
one is used to train the system and the other to test it. The files are named
after the project; here the project name is “sbsbsr”, so the training file is
sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of
the transcription file for our project are:
<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)
….
Note:
• The file format is .transcription
• The file encoding is UTF-8 without BOM
• A sentence cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)
2.1.8 Fileids file
The fileids files contain the names of all audio files without the .wav or .sph extension. There are
two fileids files, one for training and one for testing. The training file for our project is
sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.
For example:
sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03
…….
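Because each fileids entry and each transcription line are derived from the same speaker ID and sentence number, both lists can be generated together. The sketch below assumes the path layout and the `<s> … </s> (id)` line format shown above; the function name and its parameters are hypothetical and do not describe the project's own tool.

```python
def build_training_lists(root, speaker_dir, speaker_id, sentences):
    """Return (fileids, transcripts) for one speaker reading the sentences in order."""
    fileids, transcripts = [], []
    for number, sentence in enumerate(sentences, start=1):
        utterance_id = f"{speaker_id}_{number:02d}"  # e.g. 01_01
        fileids.append(f"{root}/{speaker_dir}/{utterance_id}")
        transcripts.append(f"<s> {sentence} </s> ({utterance_id})")
    return fileids, transcripts


fileids, transcripts = build_training_lists(
    "sbsbsr_train", "sanjoy/sanjoy1", "01", ["শুভ সকাল", "ধন্যবাদ"]
)
print(fileids[0])
print(transcripts[0])
```

Generating both lists in one pass keeps the order of the fileids entries and the transcription lines aligned, so each audio path stays paired with its sentence.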
2.1.9 Filler file
The filler file contains the user's definitions of any background noise that occurs in the recording
database. It is a dictionary in which non-speech sounds are mapped to corresponding
non-speech sound units. For our project this file is named sbsbsr.filler.
For example:
<s> SIL
<sil> SIL
</s> SIL
Note that the words <s>, </s> and <sil> are treated as special words and are required to be
present in the filler dictionary. At least one of them must be mapped to a phone called "SIL".
• <s> symbolizes “beginning of speech”
• </s> symbolizes “end of speech”
• <sil> symbolizes “silence in speech”
2.2 Setting up the System Environment
2.2.1 Software Requirements
We did the training on the Linux operating system. To train the recognition engine from
CMU Sphinx we need two software packages: SphinxBase and SphinxTrain. We downloaded them
from the CMU Sphinx web site. Before installing these two packages we first need to install their
dependencies, such as Perl and a C compiler (gcc), on the Ubuntu distribution of Linux. [14]
We installed these two dependencies with the following commands in the Linux terminal:
Perl
sudo apt-get install perl
GCC
sudo apt-get install gcc
2.2.2 Trainer Setup
As noted above, setting up the trainer requires two software packages, SphinxBase and
SphinxTrain. After downloading them we decompressed them into a folder in Linux;
any folder will do, and we used the Linux root folder. After decompressing the packages we can
install them with the following commands in the terminal [14]:
Sphinxbase
cd sphinxbase
sudo ./configure
sudo make
sudo make install
Sphinxtrain
cd Sphinxtrain
sudo ./configure
sudo make
sudo make install
2.2.3 Project Folder Setup
Next we created the system environment for training: a project folder
in which SphinxTrain will create the trained files, i.e. the acoustic model. First we go to the
root directory where the installed sphinxbase and sphinxtrain folders are placed. There we
created a folder named “sbsbsr”. After creating the folder we open a
terminal and change to the sbsbsr folder. Then we create a project task for
SphinxTrain with the following command in the terminal:
../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr
Executing this command creates various folders in sbsbsr, such as “etc”,
“wav” and “model_parameters”.
Now it is time to copy the files that we created in the data preparation part. We copy the
dic, filler, phone, transcription, fileids, lm and lm.DMP files into the “etc” folder and our
collected audio into the “wav” folder. We then need to change some training parameters in the
sphinx_train.cfg file, which is created automatically in the “etc” folder when the project task is
created. The parameters we changed in sphinx_train.cfg are listed below:
Parameter                   Value Before     Value After
$CFG_WAVFILE_EXTENSION      sph              wav
$CFG_WAVFILE_TYPE           nist/mswav/raw   raw
$CFG_FEATURE                2s_c_d_dd        1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES    8                1
$CFG_STATESPERHMM           6                3
$CFG_N_TIED_STATES          100              100
Table 2.2.3: Configuration of Sphinx-train.cfg
By these changes we have finished the project folder setup.
2.2.4 Training the Acoustic Model
In the training process we first have to convert our collected raw speech audio data into mfc files.
To do this we again opened the project directory in the Linux terminal and ran the feature
extraction command. The command we executed for this task is [14]:
perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids
Executing this command converts all *.wav files into *.mfc files in the “feat” directory under the
project folder “sbsbsr”.
Now we are ready to execute the main training command, which creates the
acoustic model. For this task we stay in the project directory in the terminal and execute the
following command:
perl scripts_pl/RunAll.pl
Executing this command trains the acoustic model. The resulting model files are placed in the
“model_parameters/” directory under the project folder
“sbsbsr”.
2.2.5 Testing part
We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4.
Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.
2.2.5.1 Testing with Pocketsphinx
First we download this tool from the CMU Sphinx site. After downloading it we
go to the root directory where we installed sphinxbase and sphinxtrain and extract the
downloaded Pocketsphinx archive there. After extracting, we install the software with the
following commands in the Linux terminal:
./configure
make
After installing the software we go into our project folder and execute a command from the
terminal to set up the folder structure for decoding. The command for this task is
../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr
Then we put some test audio in the “wav” directory, because Pocketsphinx recognizes audio files
given as input. We also copy the test fileids and transcription files into the “etc” folder. To decode
or test our model on audio with Pocketsphinx, we first need feature files for the audio files. We
create them with the following command in the Linux terminal:
perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids
After that we execute the main command for decoding or testing in the Linux terminal:
perl scripts_pl/decode/slave.pl
Executing this command decodes the speech in the input audio with the help of
our trained acoustic model. The test results can be found in the “result” folder under the project
folder. The result for our model trained on fifty sentences is shown below.
Figure 2.2.5.1: Testing with Pocketsphinx
2.2.5.2 Testing with Sphinx4
Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library
to test our trained model on the Windows 7 operating system. Two pieces of software are needed
to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the
Eclipse IDE on Windows we download Sphinx4 from the CMU Sphinx site and
extract the zip file anywhere on the Windows machine. Then we create a new Java project in
Eclipse and write a Java file with the help of the demo application from CMU Sphinx. After that
we add the files listed below from our training project to the Eclipse project [11]:
“sbsbsr.cd_cont_100” folder from sbsbsr/model_parameters/
*.dic
*.lm.DMP
*.filler
We create a config.xml file in the Eclipse project to configure the recognizer and tell it where
the required model files are placed. This config.xml can be created with the help of the config file
in the Sphinx4 demo application. We need to add four Java jar files, js.jar, jsapi.jar, sphinx4.jar
and tags.jar, from the sphinx4/lib directory to our project. Now our Java project is ready to build
and run. After building the project we run it and can test with live voice input from a microphone.
The result for our model trained on ten sentences is
Figure 2.2.5.2: Testing with Sphinx4
Various applications can be built with Sphinx4 using the Java language. We built
some applications using Sphinx4, which are discussed later.
Chapter 3
TESTING &
PERFORMANCE
EVALUATION
3.1 Testing and Performance Evaluation
We tested our model in various environments such as an open room, a closed room, a
university lab room and a common room. We completed our testing using audio input from six
test speakers. For the live testing we used a microphone in different environments. [9] We
completed our tests using two different decoders:
1. Pocket Sphinx
2. Sphinx4
3.2 Test Results with Pocket Sphinx:
Experiment No  Data set          Speakers  Male  Female  Total Words  Correct  Errors  Percent correct  Error   Accuracy
Experiment 01  Trained Data Set  5         4     1       1025         975      98      95.12%           9.56%   90.44%
Experiment 02  Trained Data Set  3         3     0       615          591      39      96.10%           6.34%   93.66%
Experiment 03  Trained Data Set  3         3     0       615          601      22      97.72%           3.58%   96.42%
Experiment 04  Trained Data Set  5         2     3       1025         938      184     91.51%           17.95%  82.05%
Experiment 05  Trained Data Set  3         0     3       615          545      125     88.62%           20.33%  79.67%
Table 3.2.1 Experimental details with Results for Pocket Sphinx
Chart 3.2.2 Experiment Results with Pocket Sphinx
Average Accuracy = 88.44%
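The figures in Table 3.2.1 are consistent with the following relations: percent correct is the number of correct words divided by the total, the error rate is the number of errors divided by the total, and accuracy is 100% minus the error rate. The sketch below reproduces the reported values for Experiment 01 under that assumption; the function name `score` is hypothetical. The error count can exceed the number of words that are not correct, presumably because inserted words are also counted as errors.

```python
def score(total_words, correct, errors):
    """Return (percent correct, error rate, accuracy) in percent, rounded to two places."""
    percent_correct = 100.0 * correct / total_words
    error_rate = 100.0 * errors / total_words
    accuracy = 100.0 - error_rate
    return round(percent_correct, 2), round(error_rate, 2), round(accuracy, 2)


# Experiment 01 of Table 3.2.1: 1025 words, 975 correct, 98 errors.
print(score(1025, 975, 98))
```

Applying the same function to the other four Pocket Sphinx experiments gives the percentages listed in the table.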
3.3 Test Results with Sphinx4:
3.3.1 Input Type: Microphone
Experiment No  User Type  Speakers  Environment        Speaker Type  Input Device  Words  Correct  Errors  Percent correct  Errors  Accuracy
Experiment 01  Trained    2         Closed Room        Male          Microphone    156    154      2       98%              2%      98%
Experiment 02  Untrained  3         Lab Room           Male          Microphone    130    120      10      90%              10%     90%
Experiment 03  Untrained  3         University Campus  Male          Microphone    140    122      18      82%              18%     82%
Experiment 04  Untrained  2         University Campus  Female        Microphone    108    95       13      87%              13%     87%
Experiment 05  Trained    3         University floor   Female        Microphone    126    115      11      89%              11%     89%
Experiment 06  Trained    2         Closed Room        Male          Microphone    128    127      1       99%              1%      99%
Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live
Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input
Average Accuracy = 90.83%
3.3.2 Input Type: Audio
Experiment No  Data set          Speakers  Male  Female  Total Words  Correct  Errors  Percent correct  Error   Accuracy
Experiment 01  Trained Data Set  3         3     0       210          194      16      85.71%           15.71%  85.71%
Experiment 02  Trained Data Set  3         2     1       210          188      22      77.14%           22%     77.14%
Experiment 03  Trained Data Set  3         2     1       210          182      28      72.38%           28%     72.38%
Experiment 04  Trained Data Set  3         2     1       210          194      16      84.44%           16%     84.44%
Experiment 05  Trained Data Set  3         1     2       210          184      26      74.76%           26%     74.76%
Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio
Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input
Average Accuracy = 78.88%
Chapter 4
APPLICATION &
DEVELOPING
• Review of Developed Application
4.1 Review of Some Developed Recognition Application
We developed four applications: a dictation application, a phonetic translator, a
training file creator and a desktop command application.
4.1.1 Dictation Application
We wrote this application with the help of the Sphinx4 demo application. Its purpose
is to recognize sentences, which is also the main objective of this project.
4.1.2 Phonetic Translator
To train the acoustic model we need a file called "*.dic", which holds all training words and
their pronunciations. We wrote these pronunciations by hand several times while training
acoustic models experimentally. Over time we began to think about an automatic pronunciation,
or phonetic translation, generator, and this software is the implementation of that idea. First we
built a database in which all phonemes and their corresponding letters are stored. We drew on
the IPA chart and various thesis papers [1] [9] [10] to build this database. However, the various
sources define phonemes in different ways, and some letters have no phoneme defined at all.
We therefore defined phonemes ourselves for some consonants and conjunct letters. After
building this database we wrote the phonetic translator, which gives the phonetic translation of
Bengali words with the help of the database.
Fig 4.1.2: Dictionary files with phonetic translation.
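The table-lookup idea behind the phonetic translator can be sketched as below. This is not the project's software: the table is a small, partial stand-in for the project's phoneme database, the consonant codes are taken from the chart in the appendix, and the vowel and vowel-sign codes are assumptions chosen so that the output matches the dictionary examples in section 2.1.3. Conjuncts and the inherent vowel are not handled.

```python
# Partial, hypothetical letter-to-phone table.
LETTER_TO_PHONE = {
    "ক": "K", "ন": "N", "প": "P", "ম": "M",   # consonants, as in the appendix chart
    "আ": "AA", "া": "AA", "ি": "I", "ে": "E",  # vowel and vowel signs (assumed codes)
}


def to_phones(word):
    """Translate a Bengali word letter by letter into space-separated phone codes."""
    phones = []
    for letter in word:
        if letter not in LETTER_TO_PHONE:
            # Undefined letters are reported rather than guessed.
            raise KeyError(f"no phone defined for {letter!r}")
        phones.append(LETTER_TO_PHONE[letter])
    return " ".join(phones)


print(to_phones("আমি"))
print(to_phones("আপনাকে"))
```

Raising an error for an undefined letter makes gaps in the table visible, which matters here because the sources disagree on the Bengali phoneme inventory.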
4.1.3 Training File Creator
To train the acoustic model we also need the fileids and transcription files. These two files
contain the training audio file paths and their corresponding sentences. Before we wrote
this program we had to make these two files manually; with it, we can
generate both large files automatically within moments. For example, with 8000
audio files we would have to write 8000 audio file paths in the fileids file and 8000 audio file
names with their corresponding sentences in the transcription file. Now we can create these two
files automatically by giving the software the root folder name of the audio files and the sentence
corpus file as input.
Fig 4.1.3.1: Fileids files with phonetic translation.
Fig 4.1.3.2: Transcription File.
4.1.4 Command Application
We built a simple voice command application. Using Bengali words as voice commands, this
application performs some common tasks such as opening My Computer, left click and right click.
Chapter 5
LIMITATION &
FUTURE WORK
• LIMITATION
• FUTURE WORK
Limitation
Our project has limitations in some specific tasks. The system
was built on a small data set because of time constraints. We selected the domain of university
admission information for new students, with 185 sentences. However, it was difficult to collect
a large amount of audio from 16 speakers in a short span of time, so we selected 50
of the 185 sentences for training. More speakers are needed to obtain more
accurate results. We also faced problems in creating the dictionary file, because the
Bengali phoneme list is not defined precisely and we do not know the exact number of phonemes
in the Bengali language; different researchers report different numbers of phonemes. The
performance of the system depends on the speaker's pronunciation, the environment and the
microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and
it sometimes fails to recognize sentences accurately when the speaker speaks slowly or has
pronunciation problems. We wrote a program to generate the pronunciation of a
Bengali word automatically, but this software does not work properly because of an encoding
problem. Accurate phonemes are a prerequisite for good pronunciations, so with an accurate
phoneme inventory we can expect good output from this software.
Future Works
We have implemented Bengali speech recognition for a small data set. In
future we will increase the size of our data to create a complete model, and we plan to
improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also
developed software for making the dictionary, fileids and transcription files. We want to
make a user-friendly stand-alone GUI application for writing in Bengali. We also
intend to develop a complete desktop command application for Bengali. Training and
creating a model still depend on a developer, so we plan to make an automatic trainer
that can be used by an ordinary user. With this automatic trainer, a user will be able to train any
sentence with the corresponding audio. We also want to integrate this system into various
document applications so that Bengali sentences can be written just by speaking them. We want
to make a voice response application, in which a user asks the software a question
and the software gives the predefined answer to that question. A good
recognition application needs a lot of audio, so we want to develop a website for
collecting audio from people. With the audio collected through this website
we will enrich our recognition application.
Chapter 6
CONCLUSION &
REFERENCES
• CONCLUSION
• REFERENCES
Conclusion
Speech is the primary and most convenient means of communication between people.
A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic,
Japanese and other languages, but work on our mother tongue, Bengali, is still at a beginner level
in this field. We therefore tried to learn about this field and to develop some tools to recognize
the Bengali language. Throughout this report we have discussed our objectives, the tools we used
and the process of speech recognition. Our tools are still at a preliminary level. A
good and complete recognition application requires many improvements: a
large training database, many speakers, audio with low noise and so on. No speech
recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and
train our system more accurately, the results of the system will be good enough to achieve our
goal.
References
[1]. M.A.Anusuya & S.K.Katti, “Speech Recognition by Machine: A Review” (IJCSIS)
International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.
[2]. Morched Derbali, MU Tasem Jarrah, Mohd Taib Wahid “A Review of Speech Recognition
with Sphinx Engine in Language Detection” Journal of Theoretical and Applied Information
Technology, Vol. 40 No.2, 2005 - 2012.
[3]. L. Rabiner & B. Juang, “Fundamentals of Speech Recognition”, Prentice Hall, 1993.
[4]. Daniel Jurafsky and James H.Martin, “An Introduction to Natural Language Processing,
Computational Linguistics and Speech Recognition”, Prentice Hall, 2000.
[5]. L.R. Rabiner and R.W. Schafer, “Digital Processing of Speech Signal”, Prentice Hall, 1978.
[6]. A K M Mahmudul Hoque, “Bengali Segmented Automatic Speech Recognition”,
BRACU, 2006.
[7]. Abul Hasanat Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, “Recognition
of Spoken Letters in Bangla”, banglacomputing.net, 2002.
[8]. Md. Abul Hasnat, Jabir Mowla, Mumit Khan, “Isolated and Continuous Bangla Speech
Recognition: Implementation, Performance and Application Perspective”, BRACU, 2007.
[9]. Shammur Absar Chowdhury, “Implementation of Speech Recognition System for Bangla”,
BRACU, August 2010.
[10]. Qqbal S/O Shahzad, “Speech Recognition System”, Iqra University, March 2009.
[11]. Tran Viet Khai, “Sphinx4 Adaptation to Vietnamese Language, Vietnamese Automatic
Digit Recognition”, Bo Xuan Tu, Ho Chi Minh City, Vietnam, 2008.
[12]. Sadaoki Furui, “Speech-to-Text and Speech-to-Speech Summarization of Spontaneous
Speech”, IEEE, 2004.
[13]. M. S. Islam, “Research on Bangla Language Processing in Bangladesh: Progress and
Challenges”, BUET, 2009.
[14]. P. Foster, T. Schalk, “Speech Recognition: The Complete Practical Reference Guide”,
1993. ISBN: 0936648392.
[15]. H. Satori, M. Harti and N. Chenfour, “Introduction to Arabic Speech Recognition Using
CMUS Sphinx System”, 2007.
APPENDICES
Speaker Profile
Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  M.C. College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt. College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College
Fig 7.1: Speaker Profiles
Unicode to IPA Chart
Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA
ক K
খ KH
গ G
ঘ GH
ঙ NG
চ C
ছ CH
জ J
ঝ JH
ঞ NIO
ট T
ঠ TH
ড D
ঢ DH
ণ N
ত TA
থ TO
দ DA
ধ DO
ন N
প P
ফ PH
ব B
ভ BH
ম M
য Z
র R
ল L
শ SH
ষ SH
স S
হ H
ড় RA
ঢ় RH
য় Y
ৎ T``
NG
:
^
Bangla Phoneme IPA
AA
I
II
U
UU
RI
E
OI
O
OU
Bangla Phoneme ( ) IPA
ব- W
য- Y
র- R
ম- M
RR
Bangla Phoneme (নাম্বার) IPA
শুন্য 0
এক 1
দুই 2
তিন 3
চার 4
পাঁচ 5
ছয় 6
সাত 7
আট 8
নয় 9
Bangla Phoneme (যুক্তবর্ণ) IPA
KK
KT
KT
KTR
KW
KM
KY
KR
KL
KKH
KKHW
KKHN
KKHM
KKHY
KS
KHY
KHR
GN
GDH
NGM
CC
CCH
CCHW
CCHR
CNG
CY
GN
GNY
GW
GM
GY
GR
GL
KR
KL
KKH
KKHW
KKHN
KKHM
KKHY
KS
KHY
KHR
GN
GDH
NGM
CC
CCH
CCHW
CCHR
CNG
CY
JJ
JJW
JJH
GG
JW
JY
JR
NC
NCH
NJ
NJH
TT
TT
TTW
TTH
TN
TW
TM
TMY
TY
TR
THW
THY
THR
DG
DGH
DD
DDW
DDH
DW
DV
DM
DY
DR
NM
NY
NS
PT
PT
PN
PP
PY
PR
PL
PS
FR
FL
BJ
BD
BDH
BB
BY
BR
BL
LT
LD
LDH
LP
LB
LV
LM
LY
LL
SHC
SHCH
SHT
SHN
SHW
SHM
SHY
SHR
SHL
SHK
SHKR
SHT
SF
SW
SM
SY
SR
SL
SKL
HN
HN
HW
HM
HY
HR
HL
HRRI
GHN
GHY
GHR
NK
NKY
NGKKH
NGKH
NGG
NGGY
NGGH
NGGHY
NGGHR
TW
TM
TY
TR
DD
DY
DR
DHY
DHR
NT
NTH
ND
NDY
NDR
NDH
NN
NW
NM
NY
DHN
DHW
DHM
DHY
DHR
NT
NTH
ND
NT
NTW
NTY
NTR
NTH
ND
NDY
NDW
NDR
NDH
NDHY
NDHR
NN
NW
VY
VR
VL
MTH
MN
MP
MPR
MF
MB
MV
MVR
MM
MY
MR
ML
ZY
, , RRK
, RRKY
LK
LG
SHTY
SHTR
SHTH
SHTHY
SHN
SHP
SHPR
SPHY
SHW
SHM
SK
SKR
ST
STR
SKH
ST
STW
STY
STH
STHY
SN
SP
Fig 7.2: Unicode to IPA Chart
Corpus
The corpus contains 185 sentences in total, but because of limited time only 50 of them were used for recognition.
No Sentence
1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দুএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রুমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যা
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট
51 ইউ
52
53 কত
54
55 কত
56 উপর
57
58 এক
59
60 আর
61 কত
62 এক
63
64
65 আর
67 ও
68
69 কত
70 এক
71 কত
72
73 রকম
74
75 আর
76 ও
77 সব একশত
78
79 উপর
80
81 আর কত
82 এ আর এ
83 এ
84
85 এক
86
87 আর সবসময়
88
89 ও এ
90
91 এ আর এ
92 এ আর এ
93
94 আরকম
95 আর এ
96
97 আর
98
99
100
101 আর
102
103
104
105
106
107 আরও
108
109 সব
110
111
112
113 ও সব
114
115 হয়
116
117
118 আর
119 -ই
হয়
120
121 হয়
122
123 ও হয়
124
125 -
126 হয়
127
128
129 এ
হয়
130 সব
131
132
133
134
135
136 ওহ
137
হয়
138 হয়
139 বছর হয়
140
141
142
143
144 হয়
145 এখনও
146
147
148
149 ,
150
151
152
153 একর
154
155
156
157
158
159 ভবন
160
161
162
163
164
165 একবৎসর
166
167
168
169 সময়
170
171 এখন
172 পর
173
174 পর
175
176 পর
177 এক পর
178
179 ও
180
181
182
183
184
185
Fig 7.3: Corpus about University Admission Information.
CODE OF OUR PROJECT:
package sbs.BSR.training.files.creator;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
public class FileidsCreator {
static List<String> dirTreeLevel1= new ArrayList<String>();
static List<String> dirTreeLevel2= new ArrayList<String>();
static List<String> dirTreeLevel3= new ArrayList<String>();
static List<String> tempList = new ArrayList<String>();
public static void main(String[] args) {
SortArrayList sortObj = new SortArrayList();
int lineCounts = 0;
String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
String root = getRootFromPath(dirTreeRootName);
listdir(dirTreeRootName,1);
Collections.sort(dirTreeLevel1);
int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
int i = 0,j=0,k=0;
while(sizeOfdirTreeLevel1>i)
{
String path2 = dirTreeRootName+dirTreeLevel1.get(i);
dirTreeLevel2.clear();
listdir(path2,2);
dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
String path3="";
while(sizeOfdirTreeLevel2>j)
{
path3 = path2+"/"+dirTreeLevel2.get(j)+"/";
listdir(path3,3);
while(dirTreeLevel3.size()>k)
{
String targetPath =
root+"/"+dirTreeLevel1.get(i)+"/"+dirTreeLevel2.get(j)+"/"+dirTreeLevel3.get(k);
targetPath = targetPath.replace(".wav", "");
writIntoFile(targetPath,"C:/trainnign_file/sbs_asr_train/"+"sbs_asr_train.fileids");
lineCounts++;
k++;
}
j++;
}
//file listing end
j=0;
i++;
}
transcriptCreator tcObj = new transcriptCreator();
try {
tcObj.readCorpus("C:/trainnign_file/corpus.txt");
tcObj.CreateTransCriptFile(dirTreeLevel3,
"C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
} catch (FileNotFoundException e) {
e.printStackTrace();
}
int si=0;
}
public static void listdir(String path,int Level)
{
File folder = new File(path);
File[] listOfFiles = folder.listFiles();
int numofL_I = 0;
int numOfL = listOfFiles.length;
while(numOfL>numofL_I)
{
if (listOfFiles[numofL_I].isDirectory())
{
if(Level == 1)
{
dirTreeLevel1.add(listOfFiles[numofL_I].getName());
}
else if(Level == 2)
{
dirTreeLevel2.add(listOfFiles[numofL_I].getName());
}
}
else
{
if (listOfFiles[numofL_I].isFile())
{
dirTreeLevel3.add(listOfFiles[numofL_I].getName());
}
}
numofL_I++;
}
}
public static String getRootFromPath(String UserDir)
{
String root = null;
int count = 0;
int[] indexes = new int[2];
int i = 0;
i = indexes[1] = UserDir.lastIndexOf('/');
i=i-1;
while(i>0)
{
if(UserDir.charAt(i)=='/')
{
indexes[0] = i;
break;
}
i--;
}
root = UserDir.substring(indexes[0]+1, indexes[1]);
return root;
}
public static void writIntoFile(String data,String path)
{
try{
FileWriter fstream = new FileWriter(path,true);
BufferedWriter out = new BufferedWriter(fstream);
out.write(data+'\n');
out.close();
}catch(IOException e){
e.printStackTrace();
}
}
}
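The nested loops in FileidsCreator walk a fixed three-level tree and write one extension-free path per recording. As a minimal sketch of the same idea, assuming the recordings sit somewhere below a single root folder, java.nio.file can produce the list in one pass. The class and method names here are hypothetical and not part of the project. The ids are sorted lexicographically, which differs from the numeric ordering that SortArrayList applies in the original.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the original project.
final class FileIdsSketch {

    /** Turns a relative path such as "a/b/c.wav" into the extension-free id "a/b/c". */
    static String toFileId(String relativePath) {
        String unix = relativePath.replace('\\', '/');
        return unix.endsWith(".wav") ? unix.substring(0, unix.length() - 4) : unix;
    }

    /** Lists every .wav file below root as a sorted id prefixed with the root folder name. */
    static List<String> listFileIds(Path root) throws IOException {
        String rootName = root.getFileName().toString();
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".wav"))
                    .map(p -> rootName + "/" + toFileId(root.relativize(p).toString()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

The try-with-resources block closes the directory stream even when an exception is thrown, which the hand-written loops do not have to handle because they use File.listFiles().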
……………………………………………………………………………………………
ARRAY:
……………………………………………………………………………………………
package sbs.BSR.training.files.creator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
public class SortArrayList{
public List<String> sortList(List<String> unsortList){
List<String> mysortList = new ArrayList<String>();
int i = 0;
while(unsortList.size()>i)
{
String str = unsortList.get(i);
str = str.replaceAll("[^\\d]", "");
mysortList.add(str);
i++;
}
int[] sortint = new int[mysortList.size()];
i = 0;
while(unsortList.size()>i)
{
sortint[i] = Integer.valueOf(mysortList.get(i));
i++;
}
Arrays.sort(sortint);
String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]","");
mysortList.clear();
i = 0;
while(unsortList.size()>i)
{
String requiredString =
folNameWONum+String.valueOf(sortint[i]);
mysortList.add(requiredString);
i++;
}
return mysortList;
}
}
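SortArrayList orders folder names by the number embedded in them, so that a folder numbered 10 comes after the one numbered 2. A shorter way to get the same ordering, sketched here under the assumption that every name carries at most one run of digits, is to sort the original names with a comparator instead of rebuilding them from a prefix. The names NumericSuffixSort, numberIn and sortByNumber are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the original project.
final class NumericSuffixSort {

    /** Extracts the digits of a name as an int, e.g. "session12" -> 12; names without digits count as 0. */
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a copy of the names ordered by their embedded number. */
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberIn));
        return sorted;
    }
}
```

Because the original strings are kept, folders whose names use different prefixes or separators survive the sort unchanged.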
………………………………………………………………………………………………
TRANSCRIPT CREATOR
package sbs.BSR.training.files.creator;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
public class transcriptCreator {
static ArrayList<Object> inputLines=new ArrayList<Object>();
public void readCorpus(String path) throws FileNotFoundException {
FileInputStream fstream = new FileInputStream(path);
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
try{
while ((strLine = br.readLine()) != null) {
inputLines.add(strLine);
}
in.close();
}catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
int inputLineSize = inputLines.size();
int i = 0;
while(inputLineSize>i)
{
i++;
}
}
public static void CreateTransCriptFile(List<String> data,String path)
{
try{
FileWriter fstream = new FileWriter(path,true);
int numOfwriting = data.size();
int i = 0;
int lineStart = 0;
int lineEnd = inputLines.size();
BufferedWriter out = new BufferedWriter(fstream);
while(numOfwriting>i)
{
if(lineStart == lineEnd) lineStart = 0;
String wavRemove = data.get(i).toString();
wavRemove = wavRemove.replace(".wav", "");
String leadTrailspaceRemoved =
inputLines.get(lineStart).toString();
leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
String pattern = "<S> "+leadTrailspaceRemoved+" </S> ("+wavRemove+")";
out.write(pattern+'\n');
lineStart++;
i++;
}
//Close the output stream
out.close();
}catch(IOException e){
e.printStackTrace();
}
}
}
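CreateTransCriptFile pairs each recording with a corpus sentence and starts again from the first sentence once the corpus is exhausted. The sketch below isolates that pairing so it can be checked without touching the file system. It keeps the sentence markers exactly as the listing writes them; the class and method names are hypothetical, and the sentence list is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original project.
final class TranscriptSketch {

    /** Formats one transcript line: markers around the trimmed sentence, then the file id in parentheses. */
    static String transcriptLine(String sentence, String wavName) {
        String id = wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - 4)
                : wavName;
        return "<S> " + sentence.trim() + " </S> (" + id + ")";
    }

    /** Pairs recordings with sentences, cycling through the sentences when there are more recordings. */
    static List<String> buildTranscript(List<String> sentences, List<String> wavNames) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavNames.size(); i++) {
            String sentence = sentences.get(i % sentences.size());
            lines.add(transcriptLine(sentence, wavNames.get(i)));
        }
        return lines;
    }
}
```

The modulo index plays the role of the lineStart reset in the original loop.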
…………………………………………………………………………………………….
FILE OPERATOR
package ptpack;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;
public class FileOperator {
@SuppressWarnings("null")
public ArrayList<Object> getStrings()
{
ArrayList<Object> allInputStrings=new ArrayList<Object>();
int aisI = 0;
try {
FileInputStream fstream = new
FileInputStream("C:/trainnign_file/testInput.dic");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String str;
while ((str = br.readLine()) != null) {
str = str.trim();
allInputStrings.add(str);
System.out.println(str.trim()+" "+str.length());
}
in.close();
} catch (Exception e) {
System.err.println(e);
}
return allInputStrings;
}
public void createFile(String finalData)
{
try {
BufferedWriter out = new BufferedWriter(new
FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
out.write(finalData);
out.close();
} catch (Exception e) {
System.err.println(e);
}
}
}
…………………………………………………………………………………………..
PHONETIC TRANSLATION
package ptpack;
import java.util.ArrayList;
public class PhoneticTranslation {
public static void main(String[] args) {
PronounciationGenarator pgObj = new PronounciationGenarator();
FileOperator foObj = new FileOperator();
ArrayList<Object> inputStrings = new ArrayList<Object>();
inputStrings = foObj.getStrings();
String pro = "";
System.out.println("in phonetic translation");
int i = 0;
String is = "";
String fileImage = "";
while(inputStrings.size()>i)
{
is = inputStrings.get(i).toString().trim();
pro = pgObj.getPronouciation(is);
pro = pro.trim();
fileImage = fileImage+is+" "+pro+"\n";
i++;
}
//System.out.println(fileImage);
foObj.createFile(fileImage);
}//main end
}
Prodb
package ptpack;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
public class Prodb {
private static final String DBURL =
"jdbc:mysql://localhost:3306/bsr?user=root&password=" +
"&useUnicode=true&characterEncoding=UTF-8";
private static final String DBDRIVER = "com.mysql.jdbc.Driver";
static {
try {
Class.forName(DBDRIVER).newInstance();
} catch (Exception e){
e.printStackTrace();
}
}
private static Connection getConnection()
{
Connection connection = null;
try {
connection = DriverManager.getConnection(DBURL);
}
catch (Exception e) {
e.printStackTrace();
}
return connection;
}
public static void showEmployee() {
Connection con = getConnection();
Statement stmt =null;
try {
stmt = con.createStatement();
ResultSet rs = stmt.executeQuery("Select * from employees "
+ "where EmployeeID=1001");
if (rs.next()) {
System.out.println("EmployeeID : " +
rs.getInt("EmployeeID"));
System.out.println("Name : " + rs.getString("Name"));
System.out.println("Office : " + rs.getString("Office"));
}
else {
System.out.println("No Specified Record.");
}
rs.close();
} catch(SQLException ex) {
System.err.println("SQLException: " + ex.getMessage());
}
finally {
if (stmt != null) {
try {
stmt.close();
} catch (SQLException e) {
System.err.println("SQLException: " + e.getMessage());
}
}
if (con != null) {
try {
con.close();
} catch (SQLException e) {
System.err.println("SQLException: " + e.getMessage());
}
}
}
}
public static boolean isBanjonBorno(char ch) {
Connection con = getConnection();
Statement stmt =null;
int id=0;
boolean isBBorno = false;
try {
stmt = con.createStatement();
ResultSet rs = stmt.executeQuery("Select id from banglatab "
+ "where letter= '"+ch+"'");
if (rs.next()) {
id = rs.getInt("id");
if(id>=13 && id<=48)
{
isBBorno = true;
}
}
else {
System.out.println("No Specified Record.");
}
rs.close();
} catch(SQLException ex) {
System.err.println("SQLException: " + ex.getMessage());
}
finally {
if (stmt != null) {
try {
stmt.close();
} catch (SQLException e) {
System.err.println("SQLException: " + e.getMessage());
}
}
if (con != null) {
try {
con.close();
} catch (SQLException e) {
System.err.println("SQLException: " + e.getMessage());
}
}
}
return isBBorno;
}
}//class end
……………………………………………………………………………………………
PRONOUNCIATION GENARATOR
package ptpack;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
public class PronounciationGenarator {
Prodb pdobj = new Prodb();
String BanglaWord = "";
int BanglaWordLength = 0;
int ConjunctsPosition[] = new int[20];
int NoConjunctsPosition[] = new int[20];
int cpi=0,ncpi=0,k=0;
char ConjuctsIdentifyCharacter = '\u09CD'; // hasanta, the sign that joins consonants into a conjunct
int calltimes = 1;
public String getPronouciation(String bw)
{
BanglaWord = bw;
BanglaWordLength = BanglaWord.length();
while(k<BanglaWordLength)
{
k++;
}
k=0;
while(BanglaWordLength>k)
{
if(BanglaWord.charAt(k)==ConjuctsIdentifyCharacter)
{
ConjunctsPosition[cpi] = k-1;
cpi++;
ConjunctsPosition[cpi] = k;
cpi++;
ConjunctsPosition[cpi] = k+1;
cpi++;
}
k++;
}
int NoOfConjuctsPosition = cpi;
k = 0;
//System.out.println("nConjunctsPosition");
while(cpi>k)
{
//System.out.print(ConjunctsPosition[k]+" ");
k++;
}
//int wc[] = {0,1,2,4,5,6};
int i=0,trace=0;
cpi = 0;
ncpi = 0;
while(BanglaWordLength>i)
{
while(NoOfConjuctsPosition>cpi)
{
if(ConjunctsPosition[cpi]==i)
{
trace = 1;
break;
}
cpi++;
}
if (trace==0)
{
NoConjunctsPosition[ncpi]=i;
ncpi++;
}
cpi=0;
i++;
trace = 0;
}
i=0;
while(ncpi>i)
{
i++;
}
// matching serially and making conjuct
i = 0;
cpi = 0;
ncpi = 0;
int ai = 0;
String SearchStrings[] = new String[100];
int ssi = 0;
int serialityTrace =0;
String tempString = "";
boolean tempc,tempc2;
while(BanglaWordLength>i)
{
while(NoOfConjuctsPosition>cpi)
{
if(ConjunctsPosition[cpi]==i)
{
trace = 1;
break;
}
cpi++;
}
if (trace==0)// if nonconjuct
{
SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
ssi++;
if(BanglaWordLength!=(i+1))
{
calltimes++;
tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i+1));
if(tempc==true && tempc2==true)
{
SearchStrings[ssi] = Character.toString('অ');
ssi++;
}
}
}
else // if conjuct
{
tempString = Character.toString(BanglaWord.charAt(i));
if(BanglaWord.charAt(i)=='র')
{
SearchStrings[ssi] =
Character.toString(BanglaWord.charAt(i));
ssi++;
i+=2;
SearchStrings[ssi] =
Character.toString(BanglaWord.charAt(i));
ssi++;
}
else
{
while(NoOfConjuctsPosition>serialityTrace)
{
if(ConjunctsPosition[serialityTrace]==i)
{
break;
}
serialityTrace++;
}
int diffbi = Math.abs(ConjunctsPosition[serialityTrace]-
ConjunctsPosition[serialityTrace+1]);
System.out.println("BanglaWord = "+BanglaWord+" "+BanglaWord.length());
while(diffbi<=1)
{
System.out.println("diffbi = "+diffbi+" "+" serialityTrace "+serialityTrace);
i++;
System.out.println("i = "+i+" BanglaWord.charAt(i) "+BanglaWord.charAt(i));
tempString += Character.toString(BanglaWord.charAt(i));
serialityTrace++;
diffbi = Math.abs(ConjunctsPosition[serialityTrace]-
ConjunctsPosition[serialityTrace+1]);
}
SearchStrings[ssi] = tempString;
ssi++;
}
}//conjuct adding end
cpi=0;
i++;
trace = 0;
}
i=0;
while(ssi>i)
{
i++;
}
String phoneticTrans = "";
Connection conn = null;
Statement stmt = null;
ResultSet rs = null;
try {
Class.forName("com.mysql.jdbc.Driver").newInstance();
String connectionUrl =
"jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
String connectionUser = "root";
String connectionPassword = "";
conn = DriverManager.getConnection(connectionUrl, connectionUser,
connectionPassword);
stmt = conn.createStatement();
i=0;
while (ssi>i) {
rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"+SearchStrings[i]+"'");
rs.next();
String pro = rs.getString("pro");
phoneticTrans = phoneticTrans+pro+" ";
i++;
}
rs.close();
} catch (Exception e) {
e.printStackTrace();
} finally {
try { if (rs != null) rs.close(); } catch (SQLException e) {
e.printStackTrace(); }
try { if (stmt != null) stmt.close(); } catch (SQLException e) {
e.printStackTrace(); }
try { if (conn != null) conn.close(); } catch (SQLException e) {
e.printStackTrace(); }
}
return phoneticTrans;
}
}
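getPronouciation first locates the conjunct marker in the word, then groups the letters on either side of it into one search string before looking each unit up in the banglatab table. The grouping step alone can be sketched as follows, assuming the marker is the hasanta sign (U+09CD). The class and method names are hypothetical, and the special handling of র in the original is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original project.
final class ConjunctSegmenter {

    /** Assumed conjunct marker: the Bengali hasanta sign. */
    static final char HASANTA = '\u09CD';

    /** Splits a word into lookup units, keeping letters joined by hasanta together. */
    static List<String> segment(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder().append(word.charAt(i++));
            // Absorb every "hasanta + following letter" pair into the current unit.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }
}
```

Each returned unit corresponds to one entry of SearchStrings in the listing, so a conjunct of any length becomes a single lookup key without the position arrays.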
……………………………………………………………………………………………
SBSASR_MAIN
package SBS.BSR.S50;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
public class SBSBSR {
public static void main(String[] args) {
ConfigurationManager cm;
if (args.length > 0) {
cm = new ConfigurationManager(args[0]);
} else {
cm = new
ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
}
// allocate the recognizer
System.out.println("Loading...");
Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
recognizer.allocate();
// start the microphone or exit the program if this is not possible
Microphone microphone = (Microphone) cm.lookup("microphone");
if (!microphone.startRecording()) {
System.out.println("Cannot start microphone.");
recognizer.deallocate();
System.exit(1);
}
printInstructions();
//giveCommand();
// loop the recognition until the programm exits.
String comString = " ";
System.out.println("comString: " + comString + " length : "+comString.length()+'\n');
while (true) {
System.out.println("Start speaking. Press Ctrl-C to quit.\n");
Result result = recognizer.recognize();
if (result != null) {
String resultText = result.getBestResultNoFiller();
System.out.println("You said: " + resultText +'\n');
} else {
System.out.println("I can't hear what you said.\n");
}
}
}
/** Prints out what to say for this demo. */
private static void printInstructions() {
System.out.println("Sample sentences:\n" +
" \n" +
" \n" +
" \n" +
" \n\n");
}
}
Sbsbsr Transcriber
package SBS.BSR.S50;
import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;
public class Transcriber {
public static void main(String[] args) throws IOException,
UnsupportedAudioFileException, AWTException {
URL audioURL;
if (args.length > 0) {
audioURL = new File(args[0]).toURI().toURL();
} else {
audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
}
URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
ConfigurationManager cm = new ConfigurationManager(configURL);
Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
/* allocate the resource necessary for the recognizer */
recognizer.allocate();
// configure the audio input for the recognizer
AudioFileDataSource dataSource = (AudioFileDataSource)
cm.lookup("audioFileDataSource");
dataSource.setAudioFile(audioURL, null);
// Loop until the last utterance in the audio file has been decoded,
// in which case the recognizer will return null.
Result result;
while ((result = recognizer.recognize())!= null) {
String resultText = result.getBestResultNoFiller();
System.out.println(resultText);
}
}
}
Desktop Command Application
package SBS.BSR.CMD.APP.S12;
import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;
public class CommandActivator {
public void leftClick() throws AWTException{
Robot robot = new Robot();
robot.mousePress(InputEvent.BUTTON1_MASK);
robot.mouseRelease(InputEvent.BUTTON1_MASK);
}
public void rightClick() throws AWTException{
Robot robot = new Robot();
robot.mousePress(InputEvent.BUTTON3_MASK);
robot.mouseRelease(InputEvent.BUTTON3_MASK);
}
public void doubleClick() throws AWTException{
Robot robot = new Robot();
robot.mousePress(InputEvent.BUTTON1_MASK);
robot.mouseRelease(InputEvent.BUTTON1_MASK);
robot.mousePress(InputEvent.BUTTON1_MASK);
robot.mouseRelease(InputEvent.BUTTON1_MASK);
}
public void copy() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_C);
robot.keyRelease(KeyEvent.VK_CONTROL);
robot.keyRelease(KeyEvent.VK_C);
}
public void paste() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_V);
robot.keyRelease(KeyEvent.VK_CONTROL);
robot.keyRelease(KeyEvent.VK_V);
}
public void delete() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_DELETE);
robot.keyRelease(KeyEvent.VK_DELETE);
}
public void selectAll() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_A);
robot.keyRelease(KeyEvent.VK_CONTROL);
robot.keyRelease(KeyEvent.VK_A);
}
public void up() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_PAGE_UP);
robot.keyRelease(KeyEvent.VK_PAGE_UP);
}
public void down() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_PAGE_DOWN);
robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
}
public void previousPage() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_PAGE_UP);
robot.keyRelease(KeyEvent.VK_CONTROL);
robot.keyRelease(KeyEvent.VK_PAGE_UP);
}
public void nextPage() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_PAGE_DOWN);
robot.keyRelease(KeyEvent.VK_CONTROL);
robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
}
public void openNewFile() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_N);
robot.keyRelease(KeyEvent.VK_CONTROL);
robot.keyRelease(KeyEvent.VK_N);
}
public void openHere() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_O);
robot.keyRelease(KeyEvent.VK_CONTROL);
robot.keyRelease(KeyEvent.VK_O);
}
public void close() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_ALT);
robot.keyPress(KeyEvent.VK_F4);
robot.keyRelease(KeyEvent.VK_ALT);
robot.keyRelease(KeyEvent.VK_F4);
}
public void startMenu() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_WINDOWS);
robot.keyRelease(KeyEvent.VK_WINDOWS);
}
public void refresh() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_F5);
robot.keyRelease(KeyEvent.VK_F5);
}
public void help() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_F1);
robot.keyRelease(KeyEvent.VK_F1);
}
public void showDesktop() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_WINDOWS);
robot.keyPress(KeyEvent.VK_D);
robot.keyRelease(KeyEvent.VK_WINDOWS);
robot.keyRelease(KeyEvent.VK_D);
}
public void openMyComputer() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_WINDOWS);
robot.keyPress(KeyEvent.VK_E);
robot.keyRelease(KeyEvent.VK_WINDOWS);
robot.keyRelease(KeyEvent.VK_E);
}
// sokrio (activate)
public void enter() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_ENTER);
robot.keyRelease(KeyEvent.VK_ENTER);
}
// porer window | ager window (next window | previous window)
public void altTab() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_ALT);
robot.keyPress(KeyEvent.VK_TAB);
robot.keyRelease(KeyEvent.VK_ALT);
robot.keyRelease(KeyEvent.VK_TAB);
}
// porer tab | ager tab (next tab | previous tab)
public void ctlTab() throws AWTException{
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_TAB);
robot.keyRelease(KeyEvent.VK_CONTROL);
robot.keyRelease(KeyEvent.VK_TAB);
}
public void openNotepad() throws AWTException, IOException{
ProcessBuilder proc=new ProcessBuilder("notepad.exe");
Process p=proc.start();
}
public void openBrowser() throws AWTException, IOException{
String theUrl = "http://www.google.com";
Runtime.getRuntime().exec
("rundll32 url.dll,FileProtocolHandler " + theUrl);
}
public void openFacebook() throws AWTException, IOException{
String theUrl = "http://www.facebook.com";
Runtime.getRuntime().exec
("rundll32 url.dll,FileProtocolHandler " + theUrl);
}
public void openYahoo() throws AWTException, IOException{
String theUrl = "http://www.yahoo.com";
Runtime.getRuntime().exec
("rundll32 url.dll,FileProtocolHandler " + theUrl);
}
public void openTechtunes() throws AWTException, IOException{
String theUrl = "http://www.techtunes.com.bd";
Runtime.getRuntime().exec
("rundll32 url.dll,FileProtocolHandler " + theUrl);
}
public void openProthomAlo() throws AWTException, IOException{
String theUrl = "http://www.prothom-alo.com";
Runtime.getRuntime().exec
("rundll32 url.dll,FileProtocolHandler " + theUrl);
}
}
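Most methods in CommandActivator repeat the same pattern: press a few keys, then release them. One way to remove the repetition, sketched here with hypothetical names, is a single helper that takes the key codes of a chord. It presses them in the given order and releases them in reverse, so the modifier is held for the whole chord. The listing itself releases the modifier first. The ordering logic is kept in a separate method that needs no display.

```java
import java.awt.Robot;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original project.
final class KeyChordSketch {

    /** Describes the events of a chord: presses in order, then releases in reverse order. */
    static List<String> chordEvents(int... keyCodes) {
        List<String> events = new ArrayList<>();
        for (int code : keyCodes) {
            events.add("press " + code);
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            events.add("release " + keyCodes[i]);
        }
        return events;
    }

    /** Sends the chord through an existing Robot, using the same ordering as chordEvents. */
    static void typeChord(Robot robot, int... keyCodes) {
        for (int code : keyCodes) {
            robot.keyPress(code);
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            robot.keyRelease(keyCodes[i]);
        }
    }
}
```

With such a helper, copy() would reduce to typeChord(robot, KeyEvent.VK_CONTROL, KeyEvent.VK_C), and one Robot instance could be shared instead of creating a new one in every method.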
SBSBSR.JAVA
package SBS.BSR.CMD.APP.S12;
import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
public class SBSBSR {
public static void main(String[] args) throws IOException, AWTException {
ConfigurationManager cm;
if (args.length > 0) {
cm = new ConfigurationManager(args[0]);
} else {
cm = new
ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
}
// allocate the recognizer
System.out.println("Loading...");
Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
recognizer.allocate();
// start the microphone or exit the program if this is not possible
Microphone microphone = (Microphone) cm.lookup("microphone");
if (!microphone.startRecording()) {
System.out.println("Cannot start microphone.");
recognizer.deallocate();
System.exit(1);
}
/*Robot robot = new Robot();
robot.delay(2000);
giveCommand("bangla command");
robot.delay(3000);*/
printInstructions();
//giveCommand();
// loop the recognition until the programm exits.
String comString = " ";
System.out.println("comString: " + comString + " length : "+comString.length()+'\n');
while (true) {
System.out.println("Start speaking. Press Ctrl-C to quit.\n");
Result result = recognizer.recognize();
if (result != null) {
String resultText = result.getBestResultNoFiller();
System.out.println("You said: " + resultText +'\n');
giveCommand(resultText);
//CommandActivator obj = new CommandActivator();
//obj.openMyComputer();
} else {
System.out.println("I can't hear what you said.\n");
}
}
}
/** Prints out what to say for this demo. */
private static void printInstructions() {
System.out.println("Sample sentences:\n" +
" \n" );
}
private static void giveCommand(String CompareText) throws AWTException,
IOException{
if(CompareText.equals(" ")){
CommandActivator obj = new CommandActivator();
obj.rightClick();
}
}
}
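giveCommand compares the recognized text against a single phrase. When more commands are added, a lookup table from phrase to action avoids a growing chain of if statements. The sketch below shows the idea with hypothetical names. The phrases in the usage note are placeholders, not the project's actual Bengali commands.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original project.
final class CommandDispatcherSketch {

    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    /** Associates a spoken phrase with the action to run when it is recognized. */
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    /** Runs the action registered for the recognized text; returns false if there is none. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

A caller would register each phrase once at start-up, for example register("placeholder phrase", someAction), and then pass every recognizer result to dispatch inside the recognition loop. Actions that throw checked exceptions, such as the AWTException declared by the CommandActivator methods, would need to be wrapped before being registered as a Runnable.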
------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------

Continuously variable transmission reportContinuously variable transmission report
Continuously variable transmission report
 
Jayashan-cb004082-Criminal Face Recognition-Final
Jayashan-cb004082-Criminal Face Recognition-FinalJayashan-cb004082-Criminal Face Recognition-Final
Jayashan-cb004082-Criminal Face Recognition-Final
 
Bachelor in Computer Engineering Minor Project " MULTI-LEARNING PLATFORM"
Bachelor in Computer Engineering Minor Project " MULTI-LEARNING PLATFORM"Bachelor in Computer Engineering Minor Project " MULTI-LEARNING PLATFORM"
Bachelor in Computer Engineering Minor Project " MULTI-LEARNING PLATFORM"
 
Test
TestTest
Test
 
IMPROVING FINANCIAL AWARENESS AMONG THE POOR IN KOOJE SLUMS OF MERU TOWN-FINA...
IMPROVING FINANCIAL AWARENESS AMONG THE POOR IN KOOJE SLUMS OF MERU TOWN-FINA...IMPROVING FINANCIAL AWARENESS AMONG THE POOR IN KOOJE SLUMS OF MERU TOWN-FINA...
IMPROVING FINANCIAL AWARENESS AMONG THE POOR IN KOOJE SLUMS OF MERU TOWN-FINA...
 
Project Report
 Project Report Project Report
Project Report
 
P[1].hd book director 2_1
P[1].hd book director 2_1P[1].hd book director 2_1
P[1].hd book director 2_1
 
Artificial Intelligence Face recognition attendance system using MATLAB
Artificial Intelligence Face recognition attendance system using MATLABArtificial Intelligence Face recognition attendance system using MATLAB
Artificial Intelligence Face recognition attendance system using MATLAB
 
Internship report-csit-isp_networking
 Internship report-csit-isp_networking Internship report-csit-isp_networking
Internship report-csit-isp_networking
 
Final year project
Final year projectFinal year project
Final year project
 

Thesis Paper of my Bachelor Degree

  • 1. BENGALI SPEECH RECOGNITION DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING LEADING UNIVERSITY, SYLHET 1st January 2013
  • 2. BENGALI SPEECH RECOGNITION, 1st January, 2013. This project report is submitted to the Department of Computer Science and Engineering, Leading University, in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering.
      Supervised by: Mrs. Arpita Chakraborty, Assistant Professor, Department of Computer Science and Engineering, Leading University, Sylhet, and Mrinal Kanti Dhar, Lecturer, Department of Electrical & Electronic Engineering, Leading University, Sylhet.
      Conducted by (Department of Computer Science and Engineering, Leading University, Sylhet, Bangladesh):
      Shimul Dey, B.Sc (Hons) Final Semester Examination-2013, ID: 0901020032, Session: 2009-2013
      Sanjoy Ranjan Das, B.Sc (Hons) Final Semester Examination-2013, ID: 0901020016, Session: 2009-2013
      Md. Badrul Alom Chowdhury, B.Sc (Hons) Final Semester Examination-2013, ID: 0901020004, Session: 2009-2013
  • 3. To: The Head, Department of Computer Science and Engineering, Leading University, Sylhet, Bangladesh. Sub: Proposal for Project. Respected Sir, We would like to inform you that we, students of your department, would like to carry out a project on “BENGALI SPEECH RECOGNITION”. We would be grateful if you would kindly allow us to proceed with the project on the above-mentioned topic in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering. Thanking you. Yours sincerely, Md. Badrul Alom Chowdhury (ID 0901020004), Sanjoy Ranjan Das (ID 0901020016), Shimul Dey (ID 0901020032)
  • 4. DECLARATION We hereby declare that the project work entitled “Bengali Speech Recognition”, submitted to Leading University, is a record of original work done by us under the guidance of Arpita Chakraborty, Assistant Professor in the Department of Computer Science and Engineering, Leading University, and that this project work is submitted in fulfillment of the requirements for the degree of Bachelor in Computer Science & Engineering. The results of this project have not been submitted to any other university or institute for the award of any degree or diploma. Work by other researchers is acknowledged by reference.
      Signature of Supervisor & Co-supervisor: Mrs. Arpita Chakraborty, Assistant Professor (Supervisor); Mrinal Kanti Dhar, Lecturer (Co-supervisor)
      Signature of Authors: Md. Badrul Alom Chowdhury, Sanjoy Ranjan Das, Shimul Dey
  • 5. ACKNOWLEDGEMENT We would like to thank our honorable supervisor Arpita Chakraborty and co-supervisor Mrinal Kanti Dhar for their guidance throughout the process. They exposed us to the real professional research world with their precious experience, and we cherish the time spent working with them on such an interesting topic. We would also like to thank the students of our university for letting us record their voices for the experiments, and our Computer Science & Engineering Department for giving us the authority and facilities to complete the project. Last but not least, thanks to the Almighty for helping us in every step of this project work.
  • 6. Table of Contents
      Declaration, p. 4
      Acknowledgments, p. 5
      List of Figures, p. 8
      List of Charts, p. 9
      List of Tables, p. 10
      List of Abbreviations & Symbols, p. 10
      Abstract, p. 11
      Literature Survey, p. 12
      Chapter 1: Introduction, pp. 13-21
      1.1 Introduction, p. 14
      1.2 History of Speech Recognition, pp. 14-15
      1.3 Types of Speech Recognition, p. 15
      1.3.1 Isolated Words, p. 15
      1.3.2 Connected Words, p. 16
      1.3.3 Continuous Words, p. 16
      1.3.4 Spontaneous Words, p. 16
      1.3.5 Speaker Dependent, p. 16
      1.3.6 Speaker Independent, p. 16
      1.3.7 Overview of Speech Recognition System, p. 17
      1.4 Terms and Concepts, p. 17
      1.4.1 Utterance, p. 17
      1.4.2 Pronunciation, pp. 17-18
      1.4.3 Grammars, p. 18
  • 7. Table of Contents (continued)
      1.4.4 Vocabularies, p. 18
      1.4.5 Training, p. 18
      1.4.6 Accuracy, p. 18
      1.4.7 Language Dictionary, p. 18
      1.4.8 Filler Dictionary, p. 19
      1.4.9 Phone, p. 19
      1.4.10 HMM, pp. 19-20
      1.4.11 Language Model, p. 20
      1.5 Overview of the Full System, p. 21
      Chapter 2: Methodology, pp. 22-32
      2.1 Data Preparation, p. 23
      2.1.1 Corpus, p. 23
      2.1.2 Audio Files, pp. 23-24
      2.1.3 Dictionary Files, pp. 24-25
      2.1.4 Phone File, pp. 25-26
      2.1.5 Language Model File .lm Format, p. 26
      2.1.6 Language Model File .DMP Format, pp. 26-27
      2.1.7 Transcription File, p. 27
      2.1.8 Fileids File, pp. 27-28
      2.1.9 Filler File, p. 28
      2.2 Setting up the System Environment, p. 28
      2.2.1 Software Requirements, p. 28
      2.2.2 Trainer Setup, p. 28
      2.2.3 Project Folder Setup, pp. 29-30
  • 8. Table of Contents (continued)
      2.2.4 Training the Acoustic Model, p. 30
      2.2.5 Testing Part, p. 30
      2.2.5.1 Testing with Pocket Sphinx, pp. 30-31
      2.2.5.2 Testing with Sphinx4, pp. 31-32
      Chapter 3: Testing and Performance Evaluation, pp. 33-38
      3.1 Testing & Performance Evaluation, p. 34
      3.2 Test Results with Pocket Sphinx, p. 35
      3.3 Test Results with Sphinx4, p. 36
      3.3.1 Input Type Microphone, p. 37
      3.3.2 Input Type Audio, p. 38
      Chapter 4: Applications & Developing, pp. 40-42
      4.1 Review of Some Developed Recognized Application, p. 41
      4.1.1 Dictation Application, p. 41
      4.1.2 Phonetic Translator, p. 41
      4.1.3 Training File Creator, pp. 41-42
      4.1.4 Training File Creator, p. 42
      Chapter 5: Limitation & Future Work, pp. 43-44
      5.1 Limitation, p. 44
      5.2 Future Work, p. 44
      Chapter 6: Conclusion & References, pp. 45-47
      6.1 Conclusion, p. 46
      6.2 References, p. 47
  • 9. List of Figures
      Fig No. | Name of figure | Page No.
      1.3.7 | Overview of Speech Recognition System | 17
      1.4.10 | Applying Hidden Markov Model on Speech Recognition | 20
      1.5 | Overview of the Full System Model | 21
      2.1.2 | Audio File Recording Format | 24
      2.2.5.1 | Testing with Pocket Sphinx | 31
      2.2.5.2 | Testing with Sphinx4 | 32
      4.1.2 | Dictionary files with phonetic translation | 41
      4.1.3.1 | Fileids files with phonetic translation | 42
      4.1.4.2 | Transcription File | 42
      List of Charts
      Fig No. | Name of chart | Page No.
      3.2.2 | Experiment Results with Pocket Sphinx | 35
      3.3.1.2 | Experimental Details with Results for Sphinx 4 Live | 37
      3.3.2.2 | Experimental Details with Results for Sphinx 4 Audio | 39
  • 10. List of Tables
      Table No. | Name of table | Page No.
      1.2 | History of Speech Recognition | 15
      2.2.3 | Configuration of Sphinx-train.cfg | 29-30
      3.2.1 | Experimental details with Results for Pocket Sphinx | 34
      3.3.1.1 | Test results with Sphinx4, Input Type: Microphone | 36
      3.3.2.1 | Test results with Sphinx4, Input Type: Audio | 38
      7.1 | Speaker Profiles | 48
      7.2 | Unicode to IPA Chart | 49-63
      7.3 | Corpus About University Admission Information | 64-70
      List of Abbreviations & Symbols
      ASR | Automatic Speech Recognition
      BSD | Berkeley Software Distribution
      CMU | Carnegie Mellon University
      HMM | Hidden Markov Model
      IPA | International Phonetic Alphabet
      PCA | Principal Component Analysis
      ASCII | American Standard Code for Information Interchange
      MERL | Mitsubishi Electric Research Labs
      CRBLP | Center for Research on Bangla Language Processing
      D2P | Dictionary to Pronunciation
      SBSBSR | Sanjoy Bappy Shimul Bengali Speech Recognition
      IDE | Integrated Development Environment
      ABI | Allied Business Intelligence
  • 11. ABSTRACT This report presents an overview of Automatic Speech Recognition (ASR) for our mother tongue, Bangla. It begins with an introduction to speech recognition technology, then explains how such systems work and the level of accuracy that can be expected. The purpose of human speech is not just to convey words from one person to another but also to make the other person understand the depth of the spoken words. Speech recognition systems have made dramatic performance leaps in the recent past. The aim of this project is to develop software that recognizes human speech with the help of the CMU Sphinx speech recognition API.
  • 12. Literature Survey Today speech technology plays an important role in many applications and has moved from research to commercial use. Many human-machine interfaces have been built and deployed, for example in telephone food ordering, airport information, ticketing and restaurant reservation systems. For this reason we selected this field for our project. In addition, most major languages have a speech recognition system, but our mother tongue Bangla has no proper one; this is the main reason for selecting this topic. In the early era most research was done using Artificial Neural Networks (ANN), but since we use an HMM-based technique, the HMM-based and related research we studied is listed below.
      Implementation of Speech Recognition System for Bangla (Shammur Absar Chowdhury, August 2010). We studied this thesis within one week and acquired a lot of knowledge about speech recognition. We are thankful to the author for writing the thesis so clearly; it is very helpful for new students who want to work in this field. [9]
      Speech Recognition by Machine: A Review (M. A. Anusuya and S. K. Katti, Department of Computer Science and Engineering, Sri Jaya Chamarajendra College of Engineering, Mysore, India). From this review we learned a lot about the types of speech recognition, the approaches to speech recognition, etc. [1]
      Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective (Md. Abul Hasnat, Jabir Mowla and Mumit Khan, BRAC). We studied the past work and, to the best of our knowledge, this is the first reported attempt to recognize Bangla speech using the HMM technique, so we took most of our guidance on the steps for building a speech recognition system from this publication. From it we learned how to improve the quality of the input audio signal through noise elimination and an end-point detection algorithm, how the features of a sound are extracted and which parameters are stored in the feature files, and the algorithm for creating HMM models. [8]
      Bengali segmented automated speech recognition (Department of Computer Science and Engineering, BRAC University). From this thesis we learned about vowel and consonant phonemes, vowel and consonant phoneme clusters, voiced and non-voiced stops, and the Hidden Markov Model. [6]
      Recognition of Spoken Letters in Bangla (Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, SUST); Extraction of Bangla Vowel and Representation in the Vowel Space (Syed Akhter Hossain, East West; M Lutfar Rahman, DU; and Farruk Ahmed, NSU); Acoustic Analysis of Bangla Consonants (Firoj Alam, S. M. Murtoza Habib and Mumit Khan). From these we learned the techniques used to recognize letters, vowels and consonants, and the common steps needed to build a fully functioning recognizer. [7]
  • 13. Chapter 1 INTRODUCTION
      - INTRODUCTION
      - HISTORY OF SPEECH RECOGNITION
      - TYPES OF SPEECH RECOGNITION
      - TERMS AND CONCEPTS
  • 14. 1.1 Introduction Automatic Speech Recognition (ASR) is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. It is also known as computer speech recognition: the computer understands a voice and performs the required task. In its broad sense the term covers systems that can recognize almost anybody's speech. Put simply, speech recognition is the process of converting spoken input to text, and it is therefore sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as “mouse up”, control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying “Music”. For this reason many scientists and researchers are working on speech recognition. Most major languages in the world have speech recognizers of their own, but our mother tongue Bengali does not. Some small research projects have been carried out on Bengali speech recognition, but without a great outcome. Implementing a continuous speech recognizer for Bengali is our main goal throughout the project work. Developing a full-blown continuous speech recognizer, however, is a huge task for a short span of time. We therefore chose a domain-based continuous speech recognizer that covers a conversation about the university admission process. Throughout the period of work we tried to learn about different tools, and we chose CMU Sphinx4 as the speech recognition API because it is open source software and has high accuracy. Many high-quality, widely used software packages are available for this work, but they are costly, whereas CMU Sphinx is distributed under a Berkeley Software Distribution (BSD) style license. 
The ultimate goal of ASR research is to allow a computer to recognize in real time, with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics or accent. [9]
      1.2 History of Speech Recognition While AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. Thus, in the 1960s, researchers turned their focus towards a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that would use discrete speech, that is, verbal stimuli punctuated by small pauses. In the 1970s, work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to speech recognition solutions for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.
  • 15. According to industry figures, speech recognition is used in a wide range of applications, satisfying the needs of consumers and businesses by simplifying customer interaction, increasing efficiency, and reducing operating costs. Furthermore, according to Allied Business Intelligence (ABI), the increased popularity of speech recognition will push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, look for more and more companies to move toward speech recognition and watch the industry take its place as a leader in the technology sector. [1]
      Year | Event
      1936 | AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
      1970 | The HMM approach to speech and voice recognition was invented by Lenny Baum of Princeton University.
      1971 | DARPA established the Speech Understanding Research (SUR) program.
      1982 | Dragon Systems was founded.
      1984 | SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
      1995 | Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech and voice recognition technology was available to consumers.
      1997 | Dragon introduced "NaturallySpeaking", the first "continuous speech" dictation software available.
      1998 | Microsoft invested $45 million to be able to use speech and voice recognition technology in its systems.
      2000 | Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
      2003 | ScanSoft shipped Dragon NaturallySpeaking 7 Medical, promoted as lowering healthcare costs through highly accurate speech recognition.
      Table 1.2: History of Speech Recognition
      1.3 Types of Speech Recognition Speech recognition systems can be separated into different classes according to the types of utterances they are able to recognize. These classes are the following: [1]
      1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (lack of an audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have "Listen/Not-Listen" states, where they require the speaker to wait between utterances (usually doing processing during the pauses).
  • 16. "Isolated utterance" might be a better name for this class. Put simply, isolated words are single words such as "me", "you", "go", etc.
      1.3.2 Connected Words: Connected word systems (or, more correctly, "connected utterance" systems) are similar to isolated word systems, but allow separate utterances to be run together with a minimal pause between them, as in "I eat rice".
      1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create because they must use special methods to determine utterance boundaries.
      1.3.4 Spontaneous Speech: At a basic level, this can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features such as words being run together, "ums" and "ahs", and even slight stutters.
      Based on the speaker, there are two types of speech recognition: 1. Speaker-dependent 2. Speaker-independent
      1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his/her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker-dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his/her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first "train" the software by speaking to it, so the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.
      1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.
  • 17. 1.3.7 Overview of Speech Recognition System
      Fig 1.3.7: Overview of Speech Recognition System
      1.4 Terms and Concepts The following are some basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of them. [9][10]
      1.4.1 Utterances An utterance is something you say. It can be one word or a series of words. For example, “Word”, “Microsoft Word” and “I’d like to run Microsoft Word” are all possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence, in speech recognition, is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine is "listening" for speech input. When the engine detects audio input (in other words, a lack of silence), the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
      1.4.2 Pronunciation The word pronunciation is familiar from language learning. In any language, pronunciation pertains to the sounds that are produced to make meaning. Some aspects of speech go beyond the individual sound and make a language unique: phrasing, stress, intonation, timing and rhythm. The voice is then projected to communicate what the speaker wants to say. Add to that cultural nuances, gestures and local expressions, and the way you speak immediately tells the people around you something about yourself. When you are just learning a new language it would be easy to avoid speaking in public, but that is not the best choice because it leads to social isolation. It does not seem fair, but people can be judged by the way they speak and can be seen as uneducated, incompetent or lacking knowledge, all because the listener is reacting to the pronunciation and not to what the speaker is trying to communicate. The speech recognition engine uses all sorts of data, statistical models and algorithms to convert spoken input into text. One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like.
  • 18. Words can have multiple pronunciations associated with them. For example, the word “the” has at least two pronunciations in U.S. English: “thee” and “thuh”.
      1.4.3 Grammars Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. For this reason speech engines usually have a very large grammar.
      1.4.4 Vocabularies Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the speech recognition (SR) system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word; entries can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. “Wake Up"), while very large vocabularies can have a hundred thousand or more.
      1.4.5 Training Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy.
Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy
The performance of a speech recognition system is measurable, and the most widely used measurement is accuracy: how well the recognizer recognizes utterances. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said "yes," the engine returned "yes," and the "YES" action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said "nope," the engine returned "no," and the "NO" action was executed? Should that be considered a successful dialog? The answer is yes, because the desired result was achieved.

1.4.7 A Language Dictionary
Accepted words in the language are mapped to sequences of sound units representing their pronunciation. The entries sometimes also include syllabification and stress.
1.4.8 A Filler Dictionary
Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone
A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM
Hidden Markov Models can be seen as finite state machines where, for each observation in the sequence, there is a state transition and, for each state, there is an output symbol emission. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer, so the states are "hidden" from the outside; hence the name Hidden Markov Model. Put another way, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable ones.

In the speech recognition process, after the voice is recorded, it is divided into many frames that have to be processed in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model, we store the reference values of states, phonemes, and words in order to compare them with the observed data (voice). By applying an HMM, we construct a statistical model for each phone in which the states are assigned probabilities by comparison with the reference values. The probability of each state depends only on that state and the previous one.

The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is quite involved, we do not go into detail here; more information can be found in Appendix A.
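The search for the most probable state sequence described above is commonly done with the Viterbi algorithm. The following is a minimal sketch for a discrete HMM, not the Sphinx implementation; the two states, the "quiet"/"loud" observation symbols, and all probabilities are made-up toy values used only for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log probability, state path) of the most probable path."""

    def log(p):
        # Log space avoids underflow on long frame sequences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log probability of the best path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Toy model (hypothetical values): a silence state and one phone state.
STATES = ["SIL", "AA"]
START_P = {"SIL": 0.8, "AA": 0.2}
TRANS_P = {"SIL": {"SIL": 0.6, "AA": 0.4}, "AA": {"SIL": 0.3, "AA": 0.7}}
EMIT_P = {"SIL": {"quiet": 0.9, "loud": 0.1}, "AA": {"quiet": 0.2, "loud": 0.8}}

log_prob, path = viterbi(["quiet", "loud", "loud"], STATES, START_P, TRANS_P, EMIT_P)
print(path, math.exp(log_prob))
```

In a real recognizer the observations are acoustic feature vectors and the emission probabilities come from trained densities, but the dynamic-programming recursion has the same shape.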
Fig 1.4.10: Applying Hidden Markov Model on Speech Recognition.

1.4.11 Language Model
The language model describes the likelihood, probability, or penalty incurred when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which words can follow previously recognized words and helps to restrict the matching process significantly by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
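To make the n-gram idea concrete, here is a small sketch of a bigram model estimated by simple counting. It is an illustration only: the three English sentences are a hypothetical stand-in corpus, and real toolkits additionally apply smoothing and back-off, which this sketch omits.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts


def bigram_prob(counts, prev, cur):
    """Maximum-likelihood estimate of P(cur | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0


def allowed_next(counts, prev):
    """Words that were observed after prev; the search can skip all others."""
    return sorted(counts[prev])


# Hypothetical toy corpus, one sentence per entry.
toy_corpus = ["open the file", "open the door", "close the door"]
counts = train_bigram(toy_corpus)
print(allowed_next(counts, "<s>"))
print(bigram_prob(counts, "the", "door"))
```

Restricting the search to `allowed_next` is the "stripping words that are not probable" step: after "the", only "door" and "file" remain candidates in this toy corpus.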
1.5 Overview of the Full System
Figure 1.5: Overview of the Full System Model
Chapter 2
METHODOLOGY
• Data Preparation
• Setting up the System Environment
2.1 Data Preparation
We have to create some important files that are required for training and also for testing. As already mentioned, our project is a domain-based recognition application. Domain-based means that it covers a particular topic with a small amount of data. We selected fifty sentences for our recognition application, and the files below were created from these data. The required files are:
• Corpus
• Audio files
• Dictionary file
• Phone file
• Language Model file .lm format
• Language Model file .DMP format
• Transcription file
• Fileids file
• Filler file

2.1.1 Corpus
The corpus is simply the list of sentences used to train the language model; in other words, it is the collection of sentences that we want the machine to recognize. For our project we collected a set of important sentences from our domain. Some sentences from our project are the following:
….

2.1.2 Audio files
After collecting the corpus, the next step is to collect the audio for this corpus in .wav or .sph format. During the recording sessions the following parameters of the wave files were maintained throughout:
• Sampling rate of the audio: 16 kHz
• Bit depth (bits per sample): 16
• Channel: mono (single channel)
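Since every recording has to keep the same three parameters, it can help to check files automatically before training. The sketch below uses Python's standard `wave` module; the function names `wav_problems` and `make_silent_wav` are hypothetical helpers written for this illustration, and the in-memory silent file only stands in for a real recording such as 01_01.wav.

```python
import io
import wave

# Recording format stated in the list above: 16 kHz, 16-bit, mono.
EXPECTED = {"sample_rate": 16000, "sample_width_bytes": 2, "channels": 1}


def wav_problems(source):
    """Return a list of mismatches between a WAV file and the expected format.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        actual = {
            "sample_rate": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
            "channels": wav.getnchannels(),
        }
    return [
        f"{name}: expected {EXPECTED[name]}, found {value}"
        for name, value in actual.items()
        if value != EXPECTED[name]
    ]


def make_silent_wav(sample_rate, channels, seconds=0.1):
    """Build a short silent 16-bit WAV in memory (used only for the demo)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds) * channels)
    buffer.seek(0)
    return buffer


print(wav_problems(make_silent_wav(16000, 1)))   # conforming file
print(wav_problems(make_silent_wav(44100, 2)))   # wrong rate and channel count
```

For real data, `wav_problems("path/to/file.wav")` can be called on each recording, and any file with a non-empty result re-recorded or converted.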
Fig 2.1.2: Audio File Recording Format

A 16 kHz sampling rate was chosen because it captures high-frequency information more accurately, and 16 bits per sample quantizes each sample into 65,536 possible values. After recording, the audio was split into one file per sentence manually with recording software; for our project we used the WavePad sound editor and saved the sound files in .wav format. Each wav file is named with the speaker id and the sentence id. For example, the audio file 01_01.wav in our project stands for Speaker Id: 01, Sentence Id: 01.

When we collected audio from a speaker, we saved the personal information of that speaker:
• Name
• Age
• Gender
• Environment in which the audio was collected

Some other information was also noted down:
• Environmental conditions of the recording (for example: classroom conditions, number of students present, sources of noise such as a fan or generator sound)
• Technical details of the devices (PC, microphone)
• Date and time of the recording

2.1.3 Dictionary file
The dictionary file is the list of words taken from our corpus file, together with the pronunciation of each word, such as AA P N AA K E. For this work we need software that turns the word list into a pronunciation file. CRBLP has developed a grapheme-to-phoneme (G2P) tool that gives pronunciations in the Unicode IPA system, but we need ASCII format. We therefore developed our own software, D2P (Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The format of the dictionary file is .dic. The name of our dictionary file is sbsbsr.dic, and some of its contents are:
AA CH E AA P N AA K E AA P N I AA M I I CCHA U K ….

Note:
• All phonemes are in capital letters, such as AA P N I
• File format is .dic
• File encoding is UTF-8 without BOM
• A word cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file
The phone file is the list of phonemes that occur within the words. For example, “AA P N I” contains four phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone on each line, and no duplicates are allowed. This file can be generated with a small program written for this project, which takes the *.dic file as input and gives the *.phone file as output. For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:
A
AA
B
BH
C
CCHA
CH
….

Note:
• All phones are in capital letters
• File format is .phone
• File encoding is UTF-8 without BOM
• A phone cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)
• The silence phoneme “SIL” is also included in the phone file
• All phonemes in the dic file are present in the phone file without repetition
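The step from the *.dic file to the *.phone file can be sketched in a few lines. This is not the program written for the project, only an illustration of the rules listed above (one phone per line, no duplicates, SIL included); the headwords W1 and W2 are hypothetical placeholders, and each dictionary line is assumed to be a word followed by its space-separated phonemes.

```python
def phones_from_dictionary(dic_lines):
    """Collect the unique phones used in dictionary entries, plus SIL."""
    phones = {"SIL"}  # the silence phone must always be present
    for line in dic_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        phones.update(fields[1:])  # fields[0] is the word itself
    return sorted(phones)


def phone_file_text(phones):
    """One phone per line, followed by the required trailing blank line."""
    return "\n".join(phones) + "\n\n"


# Placeholder headwords with pronunciations taken from the examples above.
entries = ["W1 AA P N I", "W2 AA M I", ""]
print(phones_from_dictionary(entries))
```

Writing `phone_file_text(...)` to disk with `encoding="utf-8"` would satisfy the "UTF-8 without BOM" requirement, since Python's plain `utf-8` codec does not emit a BOM.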
2.1.5 Language Model file .lm format
A language model assigns a probability to a piece of unseen text, based on some training data. The language model file is plain text in the commonly used "arpa" format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their log likelihood (the first field) and a back-off factor (the third field).

To build this file, the CMU lmtool is used. lmtool is a web-based tool that lets users quickly compile the text-based components needed for using an ASR decoder. It requires a corpus, which here means a set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus has to be a text file with one sentence per line; earlier versions required ASCII, but the newer version also supports Unicode text. After uploading this file and clicking the compile button, the tool returns a set of lexical (pronunciation dictionary) and language modeling files. Only the LM file is used here, because the pronunciation dictionary is built as described above. The tool works best for small domains. For our project the file name is sbsbsr.lm and the file format is .lm. Some contents of the language model file for our project are:
-2.0719 -0.2626 -2.0719 -0.2861 -2.0719 -0.2973 -1.7709 -0.2936 -2.0719 -0.2626 -1.7709 এ -0.2861 -2.0719 -0.2626 -2.0719 ও -0.2973 ….

2.1.6 Language Model file .DMP format
We also need the language model in .DMP format for use with Sphinx4. We used a Linux environment to produce the .DMP file from the .lm file, with the following command in the Linux terminal:
    sphinx_lm_convert -i model.lm -o model.DMP
Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in .DMP format. For our project the command is:
    sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP
2.1.7 Transcription file
A transcript is needed to represent what the speakers are saying in the audio files. The speech of each speaker is written down in a file exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file. There are two transcription files: one is used to train the system and the other to test it. The files are named after the project. Here the project name is “sbsbsr”, so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:
<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)
….

Note:
• File format is .transcription
• File encoding is UTF-8 without BOM
• A sentence cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)

2.1.8 Fileids file
The fileids files contain the names of all audio files without the .wav or .sph extension. There are two fileids files, one for training and one for testing. The training file for our project is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids. For example:
sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03
…….

2.1.9 Filler file
The filler file contains the user's definitions of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project. For example:
<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and must be present in the filler dictionary. At least one of them must be mapped to a phone called "SIL".
• <s> symbolizes “beginning of speech”
• </s> symbolizes “end of speech”
• <sil> symbolizes “silence in speech”

2.2 Setting up the System Environment

2.2.1 Software Requirements
We did the training part on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages: sphinxbase and sphinxtrain. We obtained them from the CMU Sphinx web site. Before installing these two packages we first need to install some dependencies on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14]. We installed them with the following commands in the Linux terminal:
Perl
    sudo apt-get install perl
GCC
    sudo apt-get install gcc

2.2.2 Trainer Setup
As mentioned, setting up the trainer requires sphinxbase and sphinxtrain. After downloading the software we decompressed it into a folder in Linux. Any folder can be used; we used the Linux root folder. After decompressing, the packages can be installed with the following commands in the terminal [14]:
Sphinxbase
    cd sphinxbase
    sudo ./configure
    sudo make
    sudo make install
Sphinxtrain
    cd Sphinxtrain
    sudo ./configure
    sudo make
    sudo make install

2.2.3 Project Folder Setup
Next we created the system environment for training: a project folder in which sphinxtrain creates the trained files, i.e. the acoustic model. First we go to the root directory where the installed sphinxbase and sphinxtrain folders are placed and create a folder there. We named the folder “sbsbsr”. After creating the folder we open a terminal and change into sbsbsr. Then we create a project task for sphinxtrain with the following command:
    ../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr
Executing this command creates various folders in sbsbsr, such as “etc”, “wav” and “model parameters”. Now it is time to copy the files created in the data preparation part. We copy the dic, filler, phone, transcript, fileids, lm and lm.dmp files into the “etc” folder and the collected audio into the “wav” folder. Then we need to change some training parameters in the sphinx_train.cfg file, which is created automatically in the “etc” folder when the project task is created. We changed the following parameters in sphinx_train.cfg:

Parameter | Value Before | Value After
$CFG_WAVFILE_EXTENSION | sph | wav
$CFG_WAVFILE_TYPE | nist/mswav/raw | raw
$CFG_FEATURE | 2s_c_d_dd | 1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES | 8 | 1
$CFG_STATESPERHMM | 6 | 3
$CFG_N_TIED_STATES | 100 | 100

Table 2.2.3: Configuration of sphinx_train.cfg
With these changes the project folder setup is finished.

2.2.4 Training the Acoustic Model
In the training process we first have to convert the collected raw speech audio into mfc files. To do so, we open the project directory in the Linux terminal again and run the feature extraction command [14]:
    perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids
Executing this command converts all *.wav files into *.mfc files in the “feat” directory under the project folder “sbsbsr”. Now we are ready to execute the main training command, which creates the acoustic model. Still in the project directory, we execute the following command in the terminal:
    perl scripts_pl/RunAll.pl
This command trains the acoustic model. The model files are placed in the “model_parameters/” directory under the project folder “sbsbsr”.

2.2.5 Testing part
We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx
First we download this tool from the CMU Sphinx site. Then we go to the root directory where sphinxbase and sphinxtrain are installed and extract the downloaded Pocketsphinx file there. After extracting, we install the software with the following commands in the Linux terminal:
    ./configure
    make
After installing the software we go into our project folder and execute a command from the terminal to create the folder structure for testing:
    ../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr
Then we put some test audio into the “wav” directory, because Pocketsphinx recognizes speech from audio files. We also copy the test fileids and transcript files into the “etc” folder.

To decode, or test, our model on audio with Pocketsphinx, we first need the feature files of the audio files. We create them with the following command in the Linux terminal:
    perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids
After that we execute the main command for decoding, or testing, in the Linux terminal:
    perl scripts_pl/decode/slave.pl
Executing this command decodes the speech in the input audio with the help of our trained acoustic model. The result of the test can be found in the “result” folder under the project folder. For our model trained on fifty sentences the result is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4
Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file to any location. Then we create a new Java project in Eclipse and write a Java file with the help of the demo application from CMU Sphinx. After that we add the following files from our previous project to the Eclipse project [11]:
• the “sbsbsr.cd_cont_100” folder from sbsbsr/model_parameters/
• *.dic
• *.lm.DMP
• *.filler
We create a config.xml file in the Eclipse project to configure the recognizer and to tell it where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We also need to add four Java jar files (js.jar, jsapi.jar, sphinx4.jar and tags.jar) from the sphinx4/lib directory to our project. Now our Java project is ready to build and run. After building the project we run it and can test it with live voice input from a microphone. For our model trained on ten sentences the result is shown below.

Figure 2.2.5.2: Testing with Sphinx4

Various applications can be built with Sphinx4 using the Java language. We built some applications with Sphinx4 that are discussed later.

Chapter 3
TESTING & PERFORMANCE EVALUATION
3.1 Testing and Performance Evaluation
We tried to test our model in various environments such as an open room, a closed room, a university lab room and a common room. We completed our testing using audio input from six test speakers. For the live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:
1. Pocket Sphinx
2. Sphinx4

3.2 Test Results with Pocket Sphinx:

Experiment | Data set | Speakers (Male/Female) | Total words | Correct | Errors | Percent correct | Error | Accuracy
01 | Trained | 5 (4/1) | 1025 | 975 | 98 | 95.12% | 9.56% | 90.44%
02 | Trained | 3 (3/0) | 615 | 591 | 39 | 96.10% | 6.34% | 93.66%
03 | Trained | 3 (3/0) | 615 | 601 | 22 | 97.72% | 3.58% | 96.42%
04 | Trained | 5 (2/3) | 1025 | 938 | 184 | 91.51% | 17.95% | 82.05%
05 | Trained | 3 (0/3) | 615 | 545 | 125 | 88.62% | 20.33% | 79.67%

Table 3.2.1: Experimental details with Results for Pocket Sphinx
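The three percentages in Table 3.2.1 follow from the word counts by simple arithmetic. The sketch below reproduces them; the function name `word_scores` is a hypothetical helper. Note that the error count can exceed the number of words that were not recognized correctly, presumably because inserted words are counted as errors as well, which is why accuracy is lower than percent correct.

```python
def word_scores(total_words, correct, errors):
    """Return (percent correct, error rate, accuracy) as percentages."""
    percent_correct = 100.0 * correct / total_words
    error_rate = 100.0 * errors / total_words
    accuracy = 100.0 - error_rate
    return round(percent_correct, 2), round(error_rate, 2), round(accuracy, 2)


# Counts taken from Experiment 01 and Experiment 03 of Table 3.2.1.
print(word_scores(1025, 975, 98))
print(word_scores(615, 601, 22))
```

For Experiment 01 this gives 95.12% correct, a 9.56% error rate and 90.44% accuracy, matching the table.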
Chart 3.2.2: Experiment Results with Pocket Sphinx
Average Accuracy = 88.44%
3.3 Test Results with Sphinx4:

3.3.1 Input Type: Microphone

Experiment | User type | Speakers | Speaker type | Environment | Words | Correct | Errors | Percent correct | Errors (%) | Accuracy
01 | Trained | 2 | Male | Closed Room | 156 | 154 | 2 | 98% | 2% | 98%
02 | Untrained | 3 | Male | Lab Room | 130 | 120 | 10 | 90% | 10% | 90%
03 | Untrained | 3 | Male | University Campus | 140 | 122 | 18 | 82% | 18% | 82%
04 | Untrained | 2 | Female | University Campus | 108 | 95 | 13 | 87% | 13% | 87%
05 | Trained | 3 | Female | University floor | 126 | 115 | 11 | 89% | 11% | 89%
06 | Trained | 2 | Male | Closed Room | 128 | 127 | 1 | 99% | 1% | 99%

Input device: microphone (all experiments)

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live
Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input
Average Accuracy = 90.83%
3.3.2 Input Type: Audio

Experiment | Data set | Speakers (Male/Female) | Total words | Correct | Errors | Percent correct | Error | Accuracy
01 | Trained | 3 (3/0) | 210 | 194 | 16 | 85.71% | 15.71% | 85.71%
02 | Trained | 3 (2/1) | 210 | 188 | 22 | 77.14% | 22% | 77.14%
03 | Trained | 3 (2/1) | 210 | 182 | 28 | 72.38% | 28% | 72.38%
04 | Trained | 3 (2/1) | 210 | 194 | 16 | 84.44% | 16% | 84.44%
05 | Trained | 3 (1/2) | 210 | 184 | 26 | 74.76% | 26% | 74.76%

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio
Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input
Average Accuracy = 78.88%
Chapter 4
APPLICATION & DEVELOPING
• Review of Developed Applications
4.1 Review of Some Developed Recognition Applications
We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application
We wrote this application with the help of the Sphinx4 demo application. The main objective of this application is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator
To train the acoustic model we need a file called "*.dic", which contains all training words and their pronunciations. We wrote these pronunciations by hand several times while training the acoustic model experimentally. Over time we came up with the idea of a tool that produces pronunciations, or phonetic translations, automatically, and this software is the implementation of that idea. First we built a database in which all phonemes and their corresponding letters are stored. We relied on the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define phonemes in different ways, and phonemes are not defined for every letter. For that reason we defined some phonemes for certain consonants and conjunct letters ourselves. After building this database we wrote the phonetic translator, which gives the phonetic translation of Bengali words with the help of the database.

Fig 4.1.2: Dictionary files with phonetic translation.

4.1.3 Training File Creator
To train the acoustic model we also need the fileids and transcript files. These two files contain the paths of the training audio files and their corresponding sentences. Before creating this program we had to write these two files manually. For example, if we have 8000 audio files, then we have to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcript file. With our software these two large files can be created automatically within moments, given the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1: Fileids files with phonetic translation.

Fig 4.1.3.2: Transcription File.

4.1.4 Command Application
We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click and right click.
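The idea behind the training file creator of Section 4.1.3 can be sketched as follows. This is not the project's actual program; it is an illustration under two assumptions: file names follow the speakerid_sentenceid pattern described in Section 2.1.2 (for example 01_02.wav), and the sentence id is the 1-based line number of the sentence in the corpus file. The function name and the placeholder sentences are hypothetical.

```python
from pathlib import PurePosixPath


def build_training_files(wav_paths, sentences):
    """Return (fileids lines, transcription lines) for the given audio files.

    wav_paths: audio paths relative to the project "wav" folder
    sentences: corpus sentences; sentence id N is assumed to be line N
    """
    fileids, transcripts = [], []
    for wav_path in sorted(wav_paths):
        path = PurePosixPath(wav_path)
        utterance_id = path.stem                      # e.g. "01_02"
        sentence_no = int(utterance_id.split("_")[1])
        fileids.append(str(path.with_suffix("")))     # path without extension
        transcripts.append(
            f"<s> {sentences[sentence_no - 1]} </s> ({utterance_id})"
        )
    return fileids, transcripts


# Placeholder corpus and two audio files in the folder layout of Section 2.1.8.
corpus = ["FIRST SENTENCE", "SECOND SENTENCE"]
wavs = [
    "sbsbsr_train/sanjoy/sanjoy1/01_02.wav",
    "sbsbsr_train/sanjoy/sanjoy1/01_01.wav",
]
ids, lines = build_training_files(wavs, corpus)
print("\n".join(ids))
print("\n".join(lines))
```

In practice the list of paths would come from walking the audio root folder, and both outputs would be written as UTF-8 text with a trailing blank line, as the notes in Section 2.1 require.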
Chapter 5
LIMITATION & FUTURE WORK
• LIMITATION
• FUTURE WORK
Limitation
Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small amount of data. We selected the domain of university admission information for newly arriving students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results.

We also faced some problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined: we do not know the exact number of phonemes in Bengali, as different researchers report different numbers.

The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails when the speaker speaks slowly or pronounces words incorrectly.

We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme inventory is a prerequisite for good pronunciations, we can expect good output from this software only once the phoneme set is accurately defined.

Future Works
We have implemented Bengali speech recognition for a small data size. In future we will increase our data size to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also developed software for creating the dictionary file, the fileids file and the transcription file. We want to make a user-friendly stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali.

Training and creating a model still depend on a developer, so we plan to build an automatic trainer that can be used by an ordinary user. With this automatic trainer, a user will be able to train any sentence with the corresponding audio. We also want to integrate this system into various document applications so that Bengali sentences can be written just by uttering them. We want to make a voice response application, in which a user asks the software a question and the software gives the predefined answer to it. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.
Chapter 6
CONCLUSION & REFERENCES
• CONCLUSION
• REFERENCES
Conclusion
Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have discussed our objectives, the various tools we used and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database, many speakers and audio with low noise. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be good enough to achieve our goal.
References
[1]. M. A. Anusuya and S. K. Katti, “Speech Recognition by Machine: A Review”, (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.
[2]. Morched Derbali, Mu'tasem Jarrah and Mohd Taib Wahid, “A Review of Speech Recognition with Sphinx Engine in Language Detection”, Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2012.
[3]. L. Rabiner and B. Juang, “Fundamentals of Speech Recognition”, Prentice Hall, 1993.
[4]. Daniel Jurafsky and James H. Martin, “An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition”, Prentice Hall, 2000.
[5]. L. R. Rabiner and R. W. Schafer, “Digital Processing of Speech Signals”, Prentice Hall, 1978.
[6]. A K M Mahmudul Hoque, “Bengali Segmented Automatic Speech Recognition”, BRACU, 2006.
[7]. Abul Hasanat Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, “Recognition of Spoken Letters in Bangla”, banglacomputing.net, 2002.
[8]. Md. Abul Hasnat, Jabir Mowla and Mumit Khan, “Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective”, BRACU, 2007.
[9]. Shammur Absar Chowdhury, “Implementation of Speech Recognition System for Bangla”, BRACU, August 2010.
[10]. Qqbal S/O Shahzad, “Speech Recognition System”, Iqra University, March 2009.
[11]. Tran Viet Khai, “Sphinx4 Adaptation to Vietnamese Language, Vietnamese Automatic Digit Recognition”, Bo Xuan Tu, Hochiminh city, Vietnam, 2008.
[12]. Sadaoki Furui, “Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech”, IEEE, 2004.
[13]. M. S. Islam, “Research on Bangla Language Processing in Bangladesh: Progress and Challenges”, BUET, 2009.
[14]. P. Foster and T. Schalk, “Speech Recognition: The Complete Practical Reference Guide”, 1993. ISBN: 0936648392.
[15]. H. Satori, M. Harti and N. Chenfour, “Introduction to Arabic Speech Recognition Using CMUSphinx System”, 2007.
  • 48. APPENDICES
Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  M.C. College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt. College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1: Speaker Profiles
  • 49. Unicode to IPA Chart

Bangla Phoneme (consonants)   IPA
ক    K
খ    KH
গ    G
ঘ    GH
ঙ    NG
চ    C
ছ    CH
জ    J
ঝ    JH
ঞ    NIO
ট    T
ঠ    TH
ড    D
ঢ    DH
ণ    N
ত    TA
থ    TO
দ    DA
ধ    DO
ন    N
প    P
ফ    PH
  • 50.
ব    B
ভ    BH
ম    M
য    Z
র    R
ল    L
শ    SH
ষ    SH
স    S
হ    H
ড়    RA
ঢ়    RH
য়    Y
ৎ    T``
     NG
     :
     ^

Bangla Phoneme   IPA
     AA
     I
     II
     U
     UU
     RI
  • 51.
     E
     OI
     O
     OU

Bangla Phoneme   IPA
ব-   W
য-   Y
র-   R
ম-   M
     RR

Bangla Phoneme (numbers)   IPA
শুন্য   0
এক    1
দুই    2
তিন   3
চার    4
পাঁচ   5
ছয়    6
সাত   7
আট   8
নয়    9
  • 52. Bangla Phoneme (conjuncts)   IPA
KK KT KT KTR KW KM KY KR KL KKH KKHW KKHN KKHM KKHY KS KHY KHR GN GDH NGM CC CCH CCHW CCHR CNG
  • 60. MM MY MR ML ZY RRK RRKY LK LG SHTY SHTR SHTH SHTHY SHN SHP SHPR SPHY SHW SHM SK SKR ST STR SKH ST STW
  • 61. STY STH STHY SN SP
Fig 7.2: Unicode to IPA Chart

Corpus
Our corpus contains 185 sentences in total, but because of the limited time available we recognized only 50 of them.

No   Sentence
  • 62.
1    শুভ সকাল
2    ধন্যবাদ
3    আমি আপনাকে কি সাহায্য করতে পারি
4    আমি কিছু তথ্য জানতে এসেছি
5    কি বলুন
6    ভর্তি বিষয়ে
7    এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8    কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9    এ বিভাগে ভর্তি চলছে
10   এ বিভাগে কি কি সুবিধা আছে
11   বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12   যেমন
13   দুএএ কম্পিউটার ল্যাব আছে
14   ও আচ্ছা
15   একটি শুধু বিজ্ঞান বিভাগের জন্য
16   আর আরেকটি
17   সব এএএএএএএ জন্য
18   প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19   বত্রিশটি করে
20   আর কিছু
21   কতজন শিক্ষক আছেন এ বিভাগে
22   প্রায় এএএএ জন
23   মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24   প্রায় এএএএএ জন
25   এ বিভাগেএ এএএএএএএএ এএএ এএ
26   রুমেল এম এস রাহমান পীর
27   বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28   অবশ্যই
29   দানবীর মিস্টার রাগিব আলী
30   আপনাদের কি আর কোন শাখা আছে
31   না সিলেট এই একমাত্র ক্যাম্পাস
32   এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33   এএএ এএএএএ এএ সালে
34   আর কি কি সুবিধা আছে
35   হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36   লাইব্রেরী কি আছে
37   অবশ্যই একটা বড় লাইব্রেরী আছে
38   ক্যান্টিন আছে
39   খুবই উন্নতমানের একটি ক্যান্টিনও আছে
  • 63.
40   ভর্তির শেষ তারিখ কবে
41   এ মাসের পাঁচ তারিখ
42   ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43   কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44   তিন চার এএএ পাঁচ এএএ নিয়ে
45   আপনাদের বএরে কত সেমিস্টার
46   তিন সেমিস্টার
47   তাহলে তো মোট এএএ সেমিস্টার
48   জি হ্যা
49   এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50   এএএএ এএএএএএএ ক্রেডিট
51   ইউ
52
53   কত
54
55   কত
56   উপর
57
58   এক
59
60   আর
61   কত
62   এক
63
64
65   আর
67   ও
68
69   কত
70   এক
71   কত
72
73   রকম
74
75   আর
76   ও
  • 64.
77   সব একশত
78
79   উপর
80
81   আর কত
82   এ আর এ
83   এ
84
85   এক
86
87   আর সবসময়
88
89   ও এ
90
91   এ আর এ
92   এ আর এ
93
94   আরকম
95   আর এ
96
97   আর
98
99
100
101  আর
102
103
104
105
106
107  আরও
108
109  সব
  • 65.
110
111
112
113  ও সব
114
115  হয়
116
117
118  আর
119  -ই হয়
120
121  হয়
122
123  ও হয়
124
125  -
126  হয়
127
128
129  এ হয়
130  সব
131
132
133
134
135
136  ওহ
137  হয়
138  হয়
139  বছর হয়
140
141
142
143
144  হয়
  • 66.
145  এখনও
146
147
148
149  ,
150
151
152
153  একর
154
155
156
157
158
159  ভবন
160
161
162
163
164
165  একবৎসর
166
167
168
169  সময়
170
171  এখন
172  পর
173
174  পর
175
176  পর
177  এক পর
  • 67.
178
179  ও
180
181
182
183
184
185

Fig 7.3: Corpus about University Admission Information.
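For training, each recording made from this corpus has to be paired with the sentence that was spoken. The following is a minimal sketch of that pairing, not the project's own code: the class `TranscriptSketch`, its two methods, and the utterance IDs `s1` to `s3` are hypothetical names chosen for illustration. It writes the same `<S> sentence </S> (id)` pattern that the project's `CreateTransCriptFile` method uses and, like that method, starts again from the first sentence when there are more recordings than sentences.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; names are hypothetical and not part of the project. */
final class TranscriptSketch {

    /** Formats one transcript entry: the trimmed sentence between markers, then the utterance ID. */
    static String transcriptLine(String sentence, String utteranceId) {
        return "<S> " + sentence.trim() + " </S> (" + utteranceId + ")";
    }

    /** Pairs utterance IDs with sentences in order, wrapping around when the sentences run out. */
    static List<String> buildTranscript(List<String> sentences, List<String> utteranceIds) {
        if (sentences.isEmpty()) {
            throw new IllegalArgumentException("corpus must contain at least one sentence");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < utteranceIds.size(); i++) {
            String sentence = sentences.get(i % sentences.size());
            lines.add(transcriptLine(sentence, utteranceIds.get(i)));
        }
        return lines;
    }

    public static void main(String[] args) {
        // The first two corpus sentences; the IDs are made-up recording names.
        List<String> sentences = Arrays.asList("শুভ সকাল", "ধন্যবাদ");
        List<String> ids = Arrays.asList("s1", "s2", "s3");
        for (String line : buildTranscript(sentences, ids)) {
            System.out.println(line);
        }
    }
}
```

Keeping the formatting in one small method makes it easy to check the marker and ID layout against whatever the training tool expects before a whole transcript file is generated.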
  • 68. CODE OF OUR PROJECT:
package sbs.BSR.training.files.creator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        // walk root/level1/level2 and write one fileids entry per audio file
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/" + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
  • 69.
                    // strip only a trailing ".wav"; the dot is escaped so it is matched literally
                    targetPath = targetPath.replaceAll("\\.wav$", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3, "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        int si = 0;
    }

    // Lists one directory: sub-directories go to the level 1 or level 2 list,
    // plain files go to the level 3 list.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
  • 70.
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the name of the last directory in a path that ends with '/',
    // i.e. the text between the final two slashes.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf('/');
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + '\n');
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
……………………………………………………………………………………………
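The directory walk in FileidsCreator can also be sketched with `java.nio.file`. This is a minimal illustration under stated assumptions, not the project's code: the class `FileidsSketch` and its methods are hypothetical names, the entries are relative to the directory passed in (FileidsCreator additionally prefixes the root folder's own name), and the result is sorted lexicographically rather than in the numeric folder order. The extension is removed with a plain suffix check because `String.replaceAll` takes a regular expression, in which an unescaped dot matches any character; for example `"a_wave.wav".replaceAll(".wav", "")` returns `"ae"`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative sketch only; names are hypothetical and not part of the project. */
final class FileidsSketch {

    private static final String WAV = ".wav";

    /** True when the path names a .wav file (case-sensitive suffix check). */
    static boolean isWav(String path) {
        return path.endsWith(WAV);
    }

    /** Turns a relative .wav path into a fileids entry: '/' separators, extension removed. */
    static String toFileId(String relativeWavPath) {
        String normalized = relativeWavPath.replace('\\', '/');
        return normalized.substring(0, normalized.length() - WAV.length());
    }

    /** Collects one entry per .wav file found anywhere below root. */
    static List<String> listFileIds(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(p -> root.relativize(p).toString())
                    .filter(FileidsSketch::isWav)
                    .map(FileidsSketch::toFileId)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the training directory as the first argument; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String id : listFileIds(root)) {
            System.out.println(id);
        }
    }
}
```

Because non-audio files are filtered out by extension, output files written into the same tree (such as a previously generated fileids file) are not picked up on a second run.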
  • 71. ARRAY:
……………………………………………………………………………………………
package sbs.BSR.training.files.creator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names by their numeric part and rebuilds each name from the
    // letter prefix of the first entry followed by the sorted number.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList;
        }
        int i = 0;
        // keep only the digits (and dots) of every name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d.]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
  • 72.
}
………………………………………………………………………………………………
TRANSCRIPT CREATOR
package sbs.BSR.training.files.creator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {
    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file into inputLines, one sentence per line.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
  • 73.
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                // start again from the first sentence once the corpus is used up
                if (lineStart == lineEnd)
                    lineStart = 0;
                String wavRemove = data.get(i).toString();
                // strip only a trailing ".wav"; the dot is escaped so it is matched literally
                wavRemove = wavRemove.replaceAll("\\.wav$", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + '\n');
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
…………………………………………………………………………………………….
FILE OPERATOR
package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {
    @SuppressWarnings("null")
  • 74.
    // Reads the word list, one trimmed word per line.
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished pronunciation dictionary.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}
…………………………………………………………………………………………..
PHONETIC TRANSLATION
package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {
  • 75.
    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // one dictionary line per word: the word, a space, then its phone sequence
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        // System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}
Prodb
package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {
    private static final String DBURL = "jdbc:mysql://localhost:3306/bsr?user=root&password="
            + "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
  • 76.
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees " + "where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID : " + rs.getInt("EmployeeID"));
                System.out.println("Name : " + rs.getString("Name"));
                System.out.println("Office : " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record.");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
  • 77.
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when the letter is a consonant (banjonborno), i.e. when its
    // row in the banglatab table has an id from 13 to 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab " + "where letter= '" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record.");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
  • 78.
            }
        }
        return isBBorno;
    }
} // class end
……………………………………………………………………………………………
PRONUNCIATION GENERATOR
package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {
    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // hasanta (U+09CD), the sign that joins two consonants into a conjunct
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        k = 0;
        // record the position of every conjunct marker and of its two neighbours
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
  • 79.
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        // int wc[] = {0,1,2,4,5,6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // collect the positions that are not part of any conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        // matching serially and making conjuct
  • 80.
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) // if nonconjuct
            {
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consonants in a row: insert the vowel অ between them
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else // if conjuct
            {
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
  • 81.
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace] - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    // extend the conjunct while the recorded positions stay adjacent
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " " + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace] - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        String phoneticTrans = "";
        Connection conn = null;
  • 82.
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl = "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser, connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // look up the phone label of every letter or conjunct collected above
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '" + SearchStrings[i] + "'");
                // skip symbols that have no row in banglatab
                if (rs.next()) {
                    String pro = rs.getString("pro");
                    phoneticTrans = phoneticTrans + pro + " ";
                }
                i++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (rs != null)
                    rs.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
            try {
                if (stmt != null)
                    stmt.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
            try {
                if (conn != null)
                    conn.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
        return phoneticTrans;
    }
}
……………………………………………………………………………………………
SBSASR_MAIN
package SBS.BSR.S50;

  • 83.
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {
    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits.
        String comString = " ";
        System.out.println("comString: " + comString + " length : " + comString.length() + '\n');
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + '\n');
            } else {
  • 84.
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" + " \n" + " \n" + " \n" + " \n\n");
    }
}
Sbsbsr Transcriber
package SBS.BSR.S50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {
    public static void main(String[] args) throws IOException, UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
  • 85.
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        /* allocate the resource necessary for the recognizer */
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}
Desktop Command Application
package SBS.BSR.CMD.APP.S12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {
    public void leftClick() throws AWTException {
  • 86.
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    // enabled because giveCommand in SBSBSR calls it
    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
  • 87.
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }
  • 88.
    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
  • 89.
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    // The open... methods below hand a URL to the Windows default browser.
    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
  • 90.
    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
SBSBSR.JAVA
package SBS.BSR.CMD.APP.S12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {
    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
  • 91.
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        /*Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);*/
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits.
        String comString = " ";
        System.out.println("comString: " + comString + " length : " + comString.length() + '\n');
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + '\n');
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" + " \n");
  • 92.
    }

    // Runs the desktop action that belongs to the recognized text.
    private static void giveCommand(String CompareText) throws AWTException, IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
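giveCommand compares the recognized text against one phrase at a time. One possible way to extend it to every action in CommandActivator is a lookup table from phrase to action, sketched below under stated assumptions: the class `CommandDispatchSketch` is a hypothetical name, the phrases "copy" and "paste" are English placeholders standing in for the Bengali commands, and the actions only print a message so that the example runs without a display. In the application each action would instead wrap the matching CommandActivator method and handle its AWTException.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; names and phrases are hypothetical and not part of the project. */
final class CommandDispatchSketch {

    private final Map<String, Runnable> commands = new HashMap<>();

    /** Associates a spoken phrase with the action to run when it is recognized. */
    void register(String phrase, Runnable action) {
        commands.put(normalize(phrase), action);
    }

    /** Runs the action registered for the recognized text; returns false when there is none. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = commands.get(normalize(recognizedText));
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    /** Trims the text and collapses runs of whitespace so small spacing differences still match. */
    private static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        dispatcher.register("copy", () -> System.out.println("copy requested"));
        dispatcher.register("paste", () -> System.out.println("paste requested"));

        System.out.println(dispatcher.dispatch("  copy "));
        System.out.println(dispatcher.dispatch("close everything"));
    }
}
```

With a table like this, adding a command means registering one more phrase rather than adding another if-branch, and unrecognized text is reported by the return value instead of being silently ignored.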