Thesis Paper of my Bachelor Degree

BENGALI
SPEECH
RECOGNITION
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
LEADING UNIVERSITY, SYLHET
1st January
2013

BENGALI SPEECH RECOGNITION
1st JANUARY, 2013
This project report is submitted to the Department of Computer Science
and Engineering, Leading University, in partial fulfillment of the
requirements for the degree of Bachelor of Science in Computer Science
and Engineering.
Supervised By
Mrs. Arpita Chakraborty
Assistant Professor
Department of Computer Science and Engineering
Leading University, Sylhet
&
Mrinal Kanti Dhar
Lecturer
Department of Electrical & Electronic Engineering
Leading University, Sylhet
Conducted By
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
LEADING UNIVERSITY, SYLHET, BANGLADESH
Shimul Dey
B.Sc. (Hons.) Final Semester
Examination-2013
ID: 0901020032
Session: 2009-2013
Sanjoy Ranjan Das
Examination-2013
ID: 0901020016
Session: 2009-2013
Md. Badrul Alom Chowdhury
Examination-2013
ID: 0901020004
Session: 2009-2013

To
The Head
Department of Computer Science and Engineering
Leading University, Sylhet, Bangladesh.
Sub: Proposal for Project.
Respected Sir,
We would like to inform you that we, students of your department, would like to carry out a
project on “BENGALI SPEECH RECOGNITION”.
We would be grateful if you would kindly allow us to proceed with the project on the
above-mentioned topic in partial fulfillment of the requirements for the degree of
Bachelor of Science in Computer Science and Engineering.
Thanking you.
Yours Sincerely
Name ID
Md. Badrul Alom Chowdhury 0901020004
Sanjoy Ranjan Das 0901020016
Shimul Dey 0901020032

DECLARATION
We hereby declare that the project work entitled “Bengali Speech Recognition”, submitted
to Leading University, is a record of original work done by us under the guidance of
Arpita Chakraborty, Assistant Professor in the Department of Computer Science and Engineering,
Leading University, and that this project work is submitted in fulfillment of the degree of Bachelor in Computer
Science & Engineering. The results of this project have not been submitted to any other university
or institute for the award of any degree or diploma. Material taken from the work of other researchers
is acknowledged by reference.
Signature of Supervisor & Co-supervisor
Name of Supervisor Signature
Mrs. Arpita Chakraborty
Assistant Professor
Name of Co-supervisor Signature
Mrinal Kanti Dhar
Lecturer
Signature of Authors
Name of Authors Signature
Md. Badrul Alom Chowdhury
Sanjoy Ranjan Das
Shimul Dey

ACKNOWLEDGEMENT
We would like to thank our honorable supervisor, Arpita Chakraborty, and co-supervisor,
Mrinal Kanti Dhar, for their guidance throughout the process. They exposed us to the real
professional research world with their precious experience. We truly cherish the time spent
working with them on such an interesting topic. We would also like to thank the students of our
university for letting us record their voices for the experiments, and our Computer Science & Engineering
Department for giving us the authority and facilities to complete the project. Last but not least,
thanks to the Almighty for helping us in every step of this project work.

Table of Contents
Declaration...................................................................................................................................4
Acknowledgments........................................................................................................................5
List of figures...............................................................................................................................8
List of Chart.................................................................................................................................9
List of Table...............................................................................................................................10
List of Abbreviation & Symbols................................................................................................10
Abstract......................................................................................................................................11
Literature Survey .......................................................................................................................12
Chapter 1: Introduction ............................................................................(13-21)
1.1 Introduction ..........................................................................................................14
1.2 History of Speech Recognition ..................................................................... 14-15
1.3 Types of Speech Recognition ...........................................................................15
1.3.1 Isolated Words ......................................................................................15
1.3.2 Connected Words ...............................................................................16
1.3.3 Continuous Speech ...............................................................................16
1.3.4 Spontaneous Speech ............................................................................16
1.3.5 Speaker Dependent ...............................................................................16
1.3.6 Speaker independent .............................................................................16
1.3.7 Overview of Speech Recognition System .............................................17
1.4 Terms and Concepts..........................................................................................17
1.4.1 Utterance ................................................................................................17
1.4.2 Pronunciation ................................................................................... 17-18
1.4.3 Grammars ...............................................................................................18

1.4.4 Vocabularies ..........................................................................................18
1.4.5 Training ..................................................................................................18
1.4.6 Accuracy ................................................................................................18
1.4.7 Language Dictionary ...........................................................................18
1.4.8 Filler Dictionary ..................................................................................19
1.4.9 Phone ...................................................................................................19
1.4.10 HMM ........................................................................................... 19-20
1.4.11 Language Model ...............................................................................20
1.5 Overview of the Full system......................................................................................21
Chapter 2: METHODOLOGY ................................................................(22-32)
2.1 Data Preparation................................................................................................23
2.1.1 Corpus....................................................................................................23
2.1.2 Audio Files....................................................................................... 23-24
2.1.3 Dictionary Files................................................................................ 24-25
2.1.4 Phone File ......................................................................................... 25-26
2.1.5 Language Model File .lm Format ...........................................................26
2.1.6 Language Model File .DMP Format................................................. 26-27
2.1.7 Transcription File....................................................................................27
2.1.8 Fileids File ........................................................................................ 27-28
2.1.9 Filler File.................................................................................................28
2.2 Setting up The System Environment ................................................................28
2.2.1 Software Requirements...........................................................................28
2.2.2 Trainer Setup...........................................................................................28
2.2.3 Project Folder Setup.......................................................................... 29-30

2.2.4 Training the Acoustic Model ..................................................................30
2.2.5 Testing Part .............................................................................................30
2.2.5.1 Testing with Pocket Sphinx ....................................................... 30-31
2.2.5.2 Testing with Sphinx4................................................................. 31-32
Chapter 3 TESTING AND PERFORMANCE EVALUATION ..........(33-38)
3.1 Testing & Performance Evaluation....................................................................34
3.2 Test Results with Pocket Sphinx........................................................................35
3.3 Test Results with Sphinx4 ................................................................................36
3.3.1 Input Type Microphone ..................................................................37
3.3.2 Input Type Audio............................................................................38
Chapter 4: Applications & Developing .......................................................(40-42)
4.1 Review of Some Developed Recognized Application..........................................41
4.1.1 Dictation Application...................................................................................41
4.1.2 Phonetic Translator......................................................................................41
4.1.3 Training File Creator.............................................................................. 41-42
4.1.4 Training File Creator....................................................................................42
Chapter 5: Limitation & Future Work ......................................................(43-44)
5.1 Limitation .............................................................................................................44
5.2 Future Work .........................................................................................................44
Chapter 6: CONCLUSION & REFERENCES ........................................(45-47)
6.1 Conclusion............................................................................................................46
6.2 References.............................................................................................................47

List of Figures
Fig. No.    Name of Figure                                               Page No.
1.3.7       Overview of Speech Recognition System                        17
1.4.10      Applying Hidden Markov Model on Speech Recognition           20
1.5         Overview of the Full System Model                            21
2.1.2       Audio File Recording Format                                  24
2.2.5.1     Testing with Pocket Sphinx                                   31
2.2.5.2     Testing with Sphinx4                                         32
4.1.2       Dictionary files with phonetic translation                   41
4.1.3.1     Fileids files with phonetic translation                      42
4.1.4.2     Transcription File                                           42
List of Charts
Chart No.   Name of Chart                                                Page No.
3.2.2       Experiment Results with Pocket Sphinx                        35
3.3.1.2     Experimental Details with Results for Sphinx 4 Live          37
3.3.2.2     Experimental Details with Results for Sphinx 4 Audio         39

List of Tables
Table No. Name of Table Page No.
1.2 History of Speech Recognition 15
2.2.3 Configuration of Sphinx-train.cfg 29-30
3.2.1 Experimental details with Results for Pocket Sphinx 34
3.3.1.1 Test results with Sphinx4 Input Type: Microphone 36
3.3.2.1 Test results with Sphinx4 Input Type: Audio 38
7.1 Speaker Profiles 48
7.2 Unicode to IPA Chart 49-63
7.3 Corpus About University Admission Information. 64-70
List of Abbreviations & Symbols:
ASR Automatic Speech Recognition
BSD Berkeley Software Distribution
CMU Carnegie Mellon University
HMM Hidden Markov Model
IPA International Phonetic Alphabet
PCA Principal Component Analysis
ASCII American Standard Code for Information Interchange
MERL Mitsubishi Electric Research Labs
CRBLP Center for Research on Bangla Language Processing
D2P Dictionary to pronunciation
SBSBSR Sanjoy Bappy Shimul Bengali Speech Recognition
IDE Integrated Development Environment
ABI Allied Business Intelligence

ABSTRACT
This report presents an overview of Automatic Speech Recognition (ASR) for our mother
tongue, Bangla. It begins with an introduction to speech recognition technology and then
explains how such systems work and the level of accuracy that can be expected. The purpose of
human speech is not only to convey words from one person to another but also to make the
listener understand the meaning behind the spoken words. Speech recognition systems have made
dramatic performance leaps in the recent past. The aim of this project is to develop software that
recognizes human speech with the help of the CMU Sphinx speech recognition API.

Literature Survey
Today speech technology plays an important role in many applications and has moved
from research to commercial use. Many human-machine interfaces have been developed and
are applied today in telephone food ordering systems, airport information systems, ticketing
systems, restaurant reservation systems, and so on. For this reason we selected this important
field for our project. In addition, most major languages have a speech recognition system, but
our mother tongue, Bangla, has no proper one; this is the main reason for selecting this topic.
In the early era most of the research was done using Artificial Neural Networks (ANN), but
since we use an HMM-based technique, some HMM-based and related research is mentioned below.
Implementation of Speech Recognition System for Bangla (Shammur Absar
Chowdhury, August 2010). We studied this thesis report over one week and acquired a lot of
knowledge about speech recognition. We are very thankful to Shammur Absar
Chowdhury for writing the thesis report so clearly; it is very helpful for new students
who want to work in this field. [9]
Speech Recognition by Machine: A Review (M. A. Anusuya and S. K. Katti, Department
of Computer Science and Engineering, Sri Jaya Chamarajendra College of Engineering, Mysore,
India). From this review we learned a lot about the types of speech recognition,
approaches to speech recognition, etc. [1]
Isolated and Continuous Bangla Speech Recognition: Implementation,
Performance and application perspective (Md. Abul Hasnat, Jabir Mowla and Mumit
Khan, BRAC). We studied the past work, and to the best of our knowledge this is the
first reported attempt to recognize Bangla speech using the HMM technique. From this
publication we took most of our guidance on the steps needed to build the speech
recognition system for our report. From it we learned how to improve the quality
of the input audio signal through a noise elimination process and an end detection algorithm,
how the features of a sound are extracted and which
parameters are stored in the feature files, and the algorithm for creating HMM models. [8]
Bengali segmented automated speech recognition (Department of Computer Science
and Engineering, BRAC University). From this thesis report we learned about vowel and
consonant phonemes, vowel and consonant phoneme clusters, voiced and non-voiced stops,
and the Hidden Markov Model. [6]
Recognition of Spoken Letters in Bangla (Abul Hasanat, Md. Rezaul Karim, Md.
Shahidur Rahman and Md. Zafar Iqbal, SUST), Extraction of Bangla Vowel and Representation
in the Vowel Space (Syed Akhter Hossain, East West; M Lutfar Rahman, DU; and Farruk Ahmed,
NSU), Acoustic Analysis of Bangla Consonants (Firoj Alam, S. M. Murtoza Habib and Mumit
Khan). From these we learned the techniques used to recognize letters, vowels and consonants;
in particular, we identified the basic steps towards a recognizer and the
common steps needed to build a fully functioning recognizer. [7]

Chapter 1
INTRODUCTION
• INTRODUCTION
• HISTORY OF SPEECH RECOGNITION
• TYPES OF SPEECH RECOGNITION
• TERMS AND CONCEPTS

1.1 Introduction
Automatic Speech Recognition (ASR) is the process of converting an acoustic signal,
captured by a microphone or a telephone, into a sequence of words. In its broad sense the term
implies a system that can recognize almost anybody's speech. It is also known as computer
speech recognition: the computer understands spoken input and performs the required task.
Put simply, speech recognition is the process of converting spoken input to text, and it is
therefore sometimes referred to as speech-to-text. Speech recognition, also referred to as voice
recognition, is a software technology that lets the user control computer functions and dictate
text by voice. For example, a person can move the cursor with a voice command such as
“mouse up”, control application functions such as opening a file menu, create documents such as
letters or reports, or start a media player by saying “Music”. For this reason many scientists and
researchers are working on speech recognition. Most of the major languages of the world
have speech recognizers of their own, but our mother tongue, Bengali, does not yet have a mature
speech recognizer. A small amount of research has been carried out on Bengali speech
recognition, but it has not yet produced a widely usable result. Implementing a continuous speech
recognizer for Bengali is our main goal throughout the project work. Developing a full-blown
continuous speech recognizer, however, is a huge task for a short span of time. We have therefore
selected a domain-based continuous speech recognizer that covers a conversation about the
university admission process. Throughout the project we tried to learn about different tools, and
we chose CMU Sphinx4 as the speech recognition API because it is open source software and
offers high accuracy. Many high-quality and widely used commercial software packages are
available for this kind of work, but they are costly and proprietary, whereas CMU Sphinx is
distributed under a Berkeley Software Distribution (BSD) style license.
The ultimate goal of ASR research is to allow a computer to recognize in real time, with 100%
accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size,
noise, speaker characteristics or accent. [9]
1.2 History of Speech Recognition
Although AT&T Bell Laboratories developed a primitive device that could recognize speech
in the 1940s, researchers knew that the widespread use of speech recognition would depend on
the ability to perceive subtle and complex verbal input accurately and consistently. Thus, in the
1960s, researchers turned their focus towards a series of smaller goals that would aid in
developing the larger speech recognition system. As a first step, developers created a device that
used discrete speech, that is, verbal stimuli punctuated by small pauses. In the 1970s, work began
on continuous speech recognition, which does not require the user to pause between words.
This technology became functional during the 1980s and is still being developed and refined
today. Speech recognition systems have become so advanced and mainstream that business and
health care professionals are turning to speech recognition solutions for everything from
providing telephone support to writing medical reports. Technological advances have made
speech recognition software and devices more functional and user friendly, with most
contemporary products performing tasks with over 90 percent accuracy.
According to figures provided by the industry, speech recognition is used in a wide range
of applications, satisfying the needs of consumers and businesses by simplifying customer
interaction, increasing efficiency, and reducing operating costs. Furthermore, according to Allied
Business Intelligence (ABI), the increased popularity of speech recognition was expected to push
revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech
recognition software are creating a dynamic environment, since this technology appeals to
anyone who needs or wants a hands-free approach to computing tasks. As the merger of large
vocabularies and continuous recognition continues, more and more companies can be expected to move
toward speech recognition, and the industry may take its place as a leader in the technology
sector. [1]
Year   Event
1936   AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970   The HMM approach to speech and voice recognition was invented by Lenny Baum of
       Princeton University.
1971   DARPA established the Speech Understanding Research (SUR) program.
1982   Dragon Systems was founded.
1984   SpeechWorks, the leading provider of over-the-telephone automated speech
       recognition (ASR) solutions, was founded.
1995   Dragon released discrete word dictation-level speech recognition software. It was the
       first time dictation speech and voice recognition technology was available to consumers.
1997   Dragon introduced "NaturallySpeaking", the first "continuous speech" dictation
       software available.
1998   Microsoft invested $45 million to be able to use speech and voice recognition
       technology in its systems.
2000   Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003   ScanSoft shipped Dragon NaturallySpeaking 7 Medical, aimed at lowering healthcare
       costs through highly accurate speech recognition.
Table 1.2: History of Speech Recognition
1.3 Types of Speech Recognition
Speech recognition systems can be separated into different classes according to
the types of utterances they are able to recognize. These classes are
the following: [1]
1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to
have quiet (lack of an audio signal) on both sides of the sample window. They accept a single
word or a single utterance at a time. These systems have "Listen/Not-Listen" states, where they
require the speaker to wait between utterances (usually doing processing during the pauses).
Isolated Utterance might be a better name for this class. Put simply, isolated words are single
words such as Me, You, Go, etc.
1.3.2 Connected Words: Connected word systems (or, more correctly, 'connected
utterances') are similar to isolated word systems, but allow separate utterances to be 'run together'
with a minimal pause between them, for example: I eat rice.
1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost
naturally, while the computer determines the content. (Basically, it is computer
dictation.) Recognizers with continuous speech capabilities are some of the most
difficult to create because they must use special methods to determine utterance boundaries.
1.3.4 Spontaneous Speech: At a basic level, spontaneous speech can be thought of as speech that is natural
sounding and not rehearsed. An ASR system with spontaneous speech ability should be able
to handle a variety of natural speech features such as words being run together, "ums" and
"ahs", and even slight stutters.
Based on the speaker, there are two types of speech recognition:
1. Speaker–dependent
2. Speaker–independent
1.3.5 Speaker–dependent: Speech recognition systems that require a user to train the
system to his or her voice are known as speaker-dependent systems. Most desktop dictation
systems, such as IBM ViaVoice, are speaker dependent. Because they
operate on very large vocabularies, dictation systems perform much better when the
speaker has spent the time to train the system to his or her voice. Speaker-dependent software
is commonly used for dictation. It works by learning the unique characteristics of a single
person's voice, in a way similar to voice recognition. New users must first "train" the software by
speaking to it, so that the computer can analyze how the person talks. This often means users have to
read a few pages of text to the computer before they can use the speech recognition software.
1.3.6 Speaker–independent: Speech recognition systems that do not require a user to
train the system are known as speaker-independent systems. Speech recognition in the
VoiceXML world must be speaker-independent. Speaker-independent software is more commonly
found in telephone applications. It is designed to recognize anyone's voice, so no training is
involved. This means it is the only real option for applications such as interactive voice response
systems, where businesses cannot ask callers to read pages of text before using the system. The
downside is that speaker-independent software is generally less accurate than speaker-dependent
software.

1.3.7 Overview of Speech Recognition System
Fig 1.3.7: Overview of Speech Recognition System
1.4 Terms and Concepts
The following are some basic terms and concepts that are fundamental to speech
recognition. It is important to have a good understanding of these concepts. [9][10]
1.4.1 Utterances
An utterance is something you say. It can be one word or a series of words. For
example, “Word”, “Microsoft Word”, and “I’d like to run Microsoft Word” are all
possible utterances. More precisely, an utterance is any stream of speech between two
periods of silence. Utterances are sent to the speech engine to be processed. Silence, in speech
recognition, is almost as important as what is spoken, because silence delineates the start and end
of an utterance. The speech recognition engine is "listening" for speech input. When the engine
detects audio input (in other words, a lack of silence), the beginning of an utterance is signaled.
Similarly, when the engine detects a certain amount of silence following the audio, the end of the
utterance is signaled.
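The idea that silence marks the start and end of an utterance can be illustrated with a small sketch. The code below is only an illustration under stated assumptions, not the method used by CMU Sphinx: the function name `find_utterances`, the frame length, the energy threshold, and the number of silent frames that closes an utterance are all hypothetical choices made for this example.

```python
def find_utterances(samples, frame_len=160, energy_threshold=0.01,
                    min_silence_frames=3):
    """Return (start, end) sample indices of each detected utterance.

    Assumption: a frame counts as speech when its mean squared amplitude
    reaches energy_threshold; an utterance ends once min_silence_frames
    consecutive silent frames follow it.
    """
    utterances = []
    start = None       # sample index where the current utterance began
    silent_run = 0     # consecutive silent frames seen inside an utterance
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= energy_threshold:
            if start is None:
                start = i * frame_len      # lack of silence: utterance begins
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # enough trailing silence: close at the first silent frame
                end = (i - silent_run + 1) * frame_len
                utterances.append((start, end))
                start = None
                silent_run = 0
    if start is not None:
        # audio ended before enough silence was seen
        utterances.append((start, (n_frames - silent_run) * frame_len))
    return utterances
```

In a real recognizer the threshold would normally be adapted to the background noise level rather than fixed in advance.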
1.4.2 Pronunciation
You have probably heard the word pronunciation in connection with learning a language. What is
pronunciation, and what are some of its fundamental aspects? In any language, pronunciation
pertains to the sounds that are produced to make
meaning. Some aspects of speech go beyond the individual sounds and make a
language unique: phrasing, stress, intonation, timing, and rhythm. Your voice is then projected to
communicate what you want to say. Add to that cultural nuances, gestures and local expressions,
and the way you speak immediately tells the people around you something about yourself. When
you are just learning a new language, it would be easy to avoid speaking in public, but that is not
the best choice because it leads to social isolation. It does not seem fair, but
people can be judged by the way they speak and can be seen as uneducated, incompetent or lacking
knowledge, all because the listener is reacting to the pronunciation and not to what the speaker is
trying to communicate. The speech recognition engine uses all sorts of data, statistical models, and
algorithms to convert spoken input into text. One piece of information that the speech
recognition engine uses to process a word is its pronunciation, which represents what the speech
engine thinks a word should sound like. Words can have multiple pronunciations associated with
them. For example, the word “the” has at least two pronunciations in U.S. English:
“thee” and “thuh”.
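The case of one word with several pronunciations can be sketched as a small lexicon lookup. The example below assumes a CMU-style pronouncing dictionary in which alternative pronunciations are marked with a numeric suffix such as `THE(2)` and phones are written in ARPAbet; the helper name `parse_dictionary` and the sample entries are illustrative and are not taken from this project's dictionary file.

```python
import re
from collections import defaultdict


def parse_dictionary(lines):
    """Map each word to the list of its pronunciations (lists of phones)."""
    lexicon = defaultdict(list)
    for line in lines:
        parts = line.split()
        if not parts:
            continue                      # skip blank lines
        # Strip an alternate-pronunciation marker such as "(2)" from the word
        word = re.sub(r"\(\d+\)$", "", parts[0])
        lexicon[word].append(parts[1:])
    return dict(lexicon)


# Illustrative entries: "thuh" and "thee" for the word "the"
SAMPLE = [
    "THE DH AH",
    "THE(2) DH IY",
    "WORD W ER D",
]
lexicon = parse_dictionary(SAMPLE)
```

Here `lexicon["THE"]` holds both phone sequences, so a decoder consulting it could match either spoken form to the same written word.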
1.4.3 Grammars
Grammars define the domain, or context, within which the recognition engine works. The
engine compares the current utterance against the words and phrases in the active
grammars. If the user says something that is not in the grammar, the speech engine will not be
able to understand it correctly. For this reason, general-purpose speech engines usually have very
large grammars.
1.4.4 Vocabularies
Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the speech
recognition system. Generally, smaller vocabularies are easier for a computer to recognize, while larger
vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single
word; entries can be as long as a sentence or two. Smaller vocabularies can have as few as one or two
recognized utterances (e.g. “Wake Up”), while very large vocabularies can have a hundred
thousand or more.
1.4.5 Training
Some speech recognizers have the ability to adapt to a speaker. When the system has this ability,
it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained
by having the speaker repeat standard or common phrases and adjusting its comparison
algorithms to match that particular speaker. Training a recognizer usually improves its accuracy.
Training can also be used by speakers who have difficulty speaking or pronouncing
certain words. As long as the speaker can consistently repeat an utterance, ASR systems with
training should be able to adapt.
1.4.6 Accuracy
The ability of a recognizer can be examined by measuring its accuracy − or how well it
recognizes utterances. The performance of a speech recognition system is measurable. Perhaps
the most widely used measurement is accuracy. It is typically a quantitative measurement and
can be calculated in several ways. This measurement is useful in validating application design.
For example, if the user said "yes," the engine returned "yes," and the "YES" action was
executed, it is clear that the desired result was achieved. But what happens if the engine
returns text that does not exactly match the utterance? For example, what if the user
said "nope," the engine returned "no," yet the "NO" action was executed? Should that be
considered a successful dialog? The answer to that question is yes because the desired result was
achieved.
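One common way to calculate accuracy at the word level is the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference text, divided by the number of reference words. The sketch below is an illustration of that general measure; it is an assumption that this is a suitable metric here, and it is not necessarily the measure used in the experiments of this report. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                    # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                    # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # match/substitution
    return dist[len(ref)][len(hyp)] / len(ref)
```

Word accuracy can then be reported as one minus the WER. Note that this measure would count "nope" recognized as "no" as an error, even though the dialog in the example above still succeeds, which is why application-level success is sometimes measured separately.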
1.4.7 Language Dictionary
Accepted words in the language are mapped to sequences of sound units representing their
pronunciation, sometimes including syllabification and stress.
1.4.8 Filler Dictionary
Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.
1.4.9 Phone
A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for
representing phones is the International Phonetic Alphabet (IPA). For English, a
transcription system based on ASCII letters is used, whereas for Bangla Unicode letters are used.
1.4.10 HMM
A Hidden Markov Model can be seen as a finite state machine in which there is a state transition
for each unit of the observation sequence and an output symbol
emission for each state. Transitions among the states are governed by a set of probabilities called transition
probabilities. In a particular state, an outcome or observation is generated according to the
associated probability distribution. Only the outcome, not the state, is visible to an external
observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov
Model.
Put another way, a Hidden Markov Model (HMM) is a statistical model in which the system
being modeled is assumed to be a Markov process with unknown parameters, and the challenge is
to determine the hidden parameters from the observable parameters. In the speech recognition
process, after the voice is recorded, it is divided into many frames, which are processed
in order to generate the sentence in text form. Each frame is represented as a state, a group of
states is represented as a phoneme, and a group of phonemes is represented as a word that we
need to recognize. In a database known as the linguist model, we store the reference values of states,
phonemes, and words in order to compare them with the observed data (voice). By applying an HMM, we
construct a statistical model of each phone whose states are assigned specific probabilities in
comparison with the reference values. The probability of each state depends on the state itself and on the previous
one. The goal of the speech recognition system is to find the sequence of states that has the
maximum probability. Because HMM theory is quite involved, we do not go into much detail
here; further information is given in Appendix A.

20
Fig 1.4.10: Applying Hidden Markov Model on Speech Recognition.
1.4.11 Language Model
The language model describes the likelihood, probability, or penalty taken when a sequence or
collection of words is seen. A language model is used to restrict word search. It defines which
word could follow previously recognized words and helps to significantly restrict the matching
process by stripping words that are not probable. Most common language models used are n-
gram language models-these contain statistics of word sequences-and finite state language
models-these define speech sequences by finite state automation, sometimes with weights.

21
1.5 Overview of the Full System
Figure 1.5 Overview of the Full System Model

22
Chapter 2
METHODOLOGY
 Data Preparation
 Setting up the System Environment

23
2.1 Data Preparation
We have to make some important files that are required for training and also for testing.
We have already mentioned that our project is about domain based recognition application.
Domain based means a particular topic containing small amount of data. We have selected fifty
sentences for our recognition application and files in below are created based on these data. The
required files are
 Corpus
 Audio files
 Dictionary file
 Phone file
 Language Model file .lm format
 Language Model file .DMP format
 Transcription file
 Fileids file
 Filler file
2.1.1 Corpus
The Corpus is just a list of sentences that use to train the language model and simply we can tell
Corpus is the collection of sentences those we are want to recognize in our machine. For our
project we have also collected some important sentences according to our domain. Some
sentences of our project are following…
….
2.1.2 Audio files
After collecting the corpus next step is to collect the audio file of this corpus with the (.wav) or
(.sph) format. During recording session the following parameters of the wave file has been
maintained throughout:
• Sampling rate of the audio: 16 kHz
• Bit rate (bits per sample): 16
• Channel: mono (single channel)

24
Fig 2.1.2: Audio File Recording Format
For this work 16 kHz sample rate has been chosen because it provides more accurate high
frequency information and 16 bit per sample will divides the element position in to 65536
possible values. After the recording, the splitting of the audio files per sentence has been done
manually using recording software for our project we are using WavePad sound editor and sound
file in a .wav format, where each wav file has been named by using speaker id and sentence id.
For example: An audio file of our project is
01_01.wav stands as
Speaker Id: 01 Sentence Id: 01
When we have collected the audio from a speaker, we have saved the personal information of
this speaker like:
 Name
 Age
 Gender
 Audio collected environment
Some other information like
 Environmental condition of recording (for example: class room condition, number of
students present, sources of noise like fan, generator’s sound etc.)
 Technical details of device (pc, microphone)
 Date and time of recording has also been noted down.
2.1.3 Dictionary file
Simply dictionary file is the list of words which we get from our corpus file and then we need to
find the pronunciation of those words such as -AA P NA KE. For this work we need
software which gives us dictionary file to pronunciation file. Also a software grapheme to
phoneme (G2P) is developed by CRBLP which gives us pronunciation with the Unicode IPA
system but we need ASCII format. As a result for our project we have developed a software D2P
(Dictionary to pronunciation) which gives us the pronunciation file from the dictionary file. Our

25
dictionary file contains 128 words. The format of dictionary file will be (.dic). The name of our
dictionary file is sbsbsr.dic and some contents of dictionary file for our project is:
AA CH E
AA P N AA K E
AA P N I
AA M I
I CCHA U K
….
Note:
 All phonemes are in capital letter such as = AA P N I
 File format is (.dic)
 File encoding is utf_8 without BOM
 Word can not be repeated
 A blank line is required in the end of file (i.e. an extra line)
2.1.4 Phone file
Phone file is the list of phoneme within words such as “AA P N I”. Here is 4 phonemes and it is
a simple text file that tells a trainer what phonemes are part of the training set. The file has one
phone in each line, no duplicity is allowed. This file can be generated using a small program
written for this project which takes the *.dic file as input and gives *.phone file as output.
For our project the file name is sbsbsr.phone. Some contents of phone file for our project is
A
AA
B
BH
C
CCHA
CH
….
Note:
 All phones are in capital letter.
 File format is .phone
 Word can not be repeated
 Silence phoneme “SIL” also included in phone file
 All phoneme in dic file are present in phone file without repetition

26
2.1.5 Language Model file .lm format
A language model assigns a probability to a piece of unseen text, based on some training data.
The language model file is plain text. The format is the commonly used "arpa" format which is
standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood
(the first field) and a back-off factor (the third field). To build this file, CMU Imtoolkit is used.
Imtool is a web based tool that allows users to quickly compile text-based components needed
for using an ASR decoder. To do this, a corpus is needed, which in this case means a set
of sentences (or more precisely, utterances) that is expected for recognition system to be able
to handle.The corpus needs to be in the form of an ASCII text file but with new advanced
version Unicode text file is also supported, with one sentence to a line. Upload this file, click the
compile button. This will give a set of lexical (pronunciation dictionary) and language
modeling files. Here the only file used is LM file as Pronunciation dictionary should be built as
stated above. The tool is best for small domains. For our project the file name is sbsbsr.lm and
file format is .lm. Some contents of language model file for our project is
-2.0719 -0.2626
-2.0719 -0.2861
-2.0719 -0.2973
-1.7709 -0.2936
-2.0719 -0.2626
-1.7709 এ -0.2861
-2.0719 -0.2626
-2.0719 ও -0.2973
….
2.1.6 Language Model file .DMP format
We also need Language Model file .DMP format for training in sphinx4. We are using Linux
environment for getting the Language Model file .DMP format from the Language Model file
.lm format. We have used following commands in Linux terminal for getting the Language
Model file with .DMP format:
sphinx_lm_convert -i model.lm -o model.DMP
Here model.lm is the name of language model file with .lm format and model.dmp is the name of
language model file with .DMP format. For our project it is
sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

27
2.1.7 Transcription file
A transcript is needed to represent what the speakers are saying in the audio file. So in a file the
dialogue of the speaker noted exactly the same precise way it has been recorded, with
silence tag (starting tag <s> ending tag </s>) followed by the file ID which represent the
utterance. This file is known as transcription file and basically there are two types of
.transcription file. One of them is used to train the system and another is to testing. Named the
files using the project name, here the name of the project is “sbsbsr”, so the train file name is
sbsbsr_train.transcription and test file name is sbsbsr_test.transcription. Some contents of
transcription file for our project is:
<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)
….
Note:
 File format is .transcription
 Sentence can not be repeated
2.1.8 Fileids file
The Fileids files contain the name of all audio file without .wav or .sph extension. Two types of
Fileids file, one for training and other for testing. The name of training file for our project is
sbsbsr_train.fileids and the name of testing file for our project is sbsbsr_test.fileids.
For Example:
sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_ train /sanjoy/sanjoy1/01_02
sbsbsr_ train /sanjoy/sanjoy1/01_03
…….
2.1.9 Filler file
Filler file contains user’s definition of any background noise emerging in recording
database and it is dictionary where a non-speech sounds are mapped to corresponding non
speech sound units. This file is named as sbsbsr.filler for our project.
For Example:

28
<s> SIL
<sil> SIL
</s> SIL
Note that the words <s>, </s> and <sil> are treated as special words and are required to be
present in the filler dictionary. At least one of these must be mapped on to a phone called "SIL".
 <s> symbolizes “beginning of speech”
 </s> symbolizes “end of speech”
 <sil> symbolizes “silence in speech”
2.2 Setting up the System Environment
2.2.1 Software Requirements
We did the training part in Linux operating system. For training the recognition engine from
CMU sphinx we need two software − sphinx base and sphinx train. We have collected it from
CMU sphinx web site. For installing these twos oftware first we need to install some dependence
software in Ubuntu distribution of Linux such as Perl and C compiler (gcc). [14]
We installed these two softwares by the following commands in Linux terminal:
Perl
sudo apt-get install perl
GCC
sudo apt-get install gcc
2.2.2 Trainer Setup
We already know that for setting up the trainer we need two software sphinx base and
sphinx train. After downloading the software we have been decompressed it in a folder in Linux,
it can be any folder. We did it in Linux root folder. After decompressing these software’s we can
install them by the following commands in terminal [14]:
Sphinxbase
cd sphinxbase
sudo ./configure
sudo make
sudo make install
Sphinxtrain
cd Sphinxtrain
sudo ./configure

29
sudo make
sudo make install
2.2.3 Project Folder Setup
We have created the system environment for training. We have created a project folder
where sphinx train will create the trained files or acoustic model. First we need to enter to the
root directory where our installed sphinx base and sphinx train folder are placed. Here we have
created a folder. We gave the folder name is “sbsbsr”. After creating the folder we need to open
terminal and go to created folder sbsbsr from terminal. Then we have created a project task for
sphinx train by the following command in terminal:
../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr
Executing this command from terminal will create various folder in sbsbsr such as “etc”,
“wav”, “model parameters” etc.
Now the time to copy files those we have created in data preparation part. We have to
copy dic, filler, phone, transcript, fileids, lm, lm.dmp in “etc” folder and our collected audio into
“wav” folder. Now we need to change some parameters for training in sphinx_train.cfg file
created automatically in “etc” folder when creating the project task. We have changed some
parameters written below in sphinx_train.cfg file:
Parameter Value Before Value After
$CFG_WAVFILE_EXTENSION sph wav
$CFG_WAVFILE_TYPE nist/mswav/raw raw
$CFG_FEATURE 2s_c_d_dd 1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES 8 1
$CFG_STATESPERHMM 6 3
$CFG_N_TIED_STATES 100 100
Table 2.2.3: Configuration of Sphinx-train.cfg

30
By these changes we have finished the project folder setup.
2.2.4 Training the Acoustic Model
In training process first we have to convert our collected raw speech audio data into mfc files.
That’s why again we opened project directory in Linux terminal and ran the feature extraction
command in terminal. Command we executed for this task is [14]:
perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids
Executing this command made all *.wav files into *.mfc files in “feat” directory under project
folder “sbsbsr”.
Now we are ready to execute the main training command in Linux terminal that will create the
acoustic model. For this task still we have to stay in project directory in terminal and execute the
following command in terminal:
perl scripts_pl/RunAll.pl
By executing this command will train the acoustic model and we will find the trained acoustic
model. The model files are placed in “model_parameters/” directory under project folder
“sbsbsr”.
2.2.5 Testing part
We tested our model by two recognizers from CMU sphinx. They are Pocketsphinx and Sphinx4.
Pocketsphinx is a lightweight recognizer and Sphinx4 adjustable, modifiable recognizer.
2.2.5.1Testing with Pocketsphinx
First we have to download this tool from CMU Sphinx site. After downloading this tool we will
go to the root directory where we have installed sphinxbase and sphinxtrain. Here we extract the
downloaded pocket sphinx file. After extracting, we install this software by these following
commands in Linux terminal:
./configure
make
After installing the software we have to go into our project folder and execute a command from
terminal to make folder structure for training. For this task the command is
../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr
Then we put some testing audio in “wav” directory, because pocketsphinx recognize with input
as audio file. Also we have to copy test fileids and transcript file in “etc” folder. For decoding or
testing our model from audio with pocket sphinx, first we need feature files of audio files. We
can make this by the following command in Linux terminal:

31
perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids
After that we execute the main command for decoding or testing in Linux terminal
perl scripts_pl/decode/slave.pl
Executing this command will decode the corresponding speech of input audio with the help of
our trained acoustic model. We can find the result of testing in “result” folder under project
folder. For our fifty sentences trained model the result is below
Figure 2.2.5.1: Testing with Pocketsphinx
2.2.5.2 Testing with Sphinx4
Sphinx4 is an adjustable, modifiable recognizer written in Java. We use this sphinx4 java library
to test our trained model in windows 7 operating system. We need two softwares to test our
model with sphinx4 decoder. They are sphinx4 and eclipse IDE. After installing the eclipse IDE
in windows we have to download sphinx4 from CMU sphinx site. After downloading sphinx4
we extract the zip file in any place in windows. Then we have to create a new java project in
eclipse and make a java file with the help of demo application from CMU sphinx. After that we
need to add the files shown in below from our previous project in eclipse project [11]:
“sbsbsr.cd_cont_100” folder from sbsbsr/model_parmeters/
*.dic
*.lm.DMP
*.filler

32
We create a cofig.xml file in eclipse project to tell the configuration to recognizer and say where
the required model files are placed. We can create this cofig.xml with the help of config file in
sphinx4 demo application. We need to add four java jar files js.jar, jsapi.jar, sphinx4.jar, tags.jar
from sphinx4/lib directory to our project. Now our java project is ready to build and run. After
building our project we run the project and can test with live voice input from microphone. For
our ten sentences trained model result is
Figure 2.2.5.2: Testing with Sphinx4
We can build various applications with the help of sphinx4 by using java language. We build
some application using sphinx4 that will be discussed later.
Chapter 3
TESTING &
PERFORMANCE
EVALUATION

33
3.1 Testing and Performance Evaluation
We tried to test our model in various environments such as open room, closed room,
university lab room, common room etc. We have completed our testing using audio inputs of six
test speaker .For the live testing we are using microphone in different environments. [9]We are
completed our test using two different kinds of decoder those are:
1. Pocket Sphinx
2. Sphinx4
3.2 Test Results with Pocket Sphinx:
Experiment No Details Results

34
Experiment 01 Using Trained Data Set
Number of Speaker: 5
Male: 4
Female: 1
Total Words: 1025
Correct: 975
Errors: 98
Total Percent correct = 95.12%
Error = 9.56%
Accuracy = 90.44%
Male: 3
Female: 0
Total Words: 615
Correct: 591
Errors: 39
Error = 6.34%
Accuracy = 93.66%
Male: 3
Female: 0
Total Words: 615
Correct: 601
Errors: 22
Error = 3.58%
Accuracy = 96.42%
Male: 2
Female: 3
Total Words: 1025
Correct: 938
Errors: 184
Error = 17.95%
Accuracy = 82.05%
Male: 0
Female: 3
Total Words: 615
Correct: 545
Errors: 125
Error = 20.33%
Accuracy = 79.67%
Table 3.2.1 Experimental details with Results for Pocket Sphinx

35
Chart 3.2.2 Experiment Results with Pocket Sphinx
Average Accuracy = 88.44%

36
3.3 Test Results with Sphinx4:
3.3.1 Input Type: Microphone
Experiment 01 User Type: Trained
Environment: Closed Room
Speaker Type: Male
Input Device: Microphone
Number of Words: 156
Correct Words:154
Errors: 2
Percent of Correct: 98%
Errors: 2%
Accuracy: 98%
Experiment 02 User Type: Untrained
Environment: Lab Room
Speaker Type: Male
Correct Words: 120
Errors:10
Errors: 10%
Accuracy: 90%
Environment: University Campus
Speaker Type: Male
Correct Words: 122
Errors:18
Errors: 18%
Accuracy: 82%
Environment: University Campus
Speaker Type: Female
Correct Words: 95
Errors:13
Errors: 13%
Accuracy: 87%
Environment: University floor
Speaker Type: Female
Correct Words: 115
Errors:11
Errors: 11%
Accuracy: 89%
Environment: Closed Room
Speaker Type: Male
Correct Words: 127
Errors:1
Errors: 1%
Accuracy: 99%
Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

37
Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

38
3.3.2 Input Type: Audio
Male: 3
Female:0
Total Words: 210
Correct: 194
Errors: 16
Error = 15.71%
Accuracy = 85.71%
Male: 2
Female:1
Total Words: 210
Correct: 188
Errors: 22
Error = 22%
Accuracy = 77.14%
Male: 2
Female:1
Total Words: 210
Correct: 182
Errors: 28
Error = 28%
Accuracy = 72.38%
Male: 2
Female:1
Total Words: 210
Correct: 194
Errors: 16
Error = 16%
Accuracy = 84.44%
Male: 1
Female:2
Total Words: 210
Correct: 184
Errors: 26
Error = 26%
Accuracy = 74.76%
Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

39
Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

40
Chapter 4
APPLICATION &
DEVELOPING
 Reveiw of Developed Application

41
4.1 Review of Some Developed Recognition Application
We developed four applications.They are dictation applications, phonetic translator,
training file creator and desktop command type application.
4.1.1 Dictation Application
We write this application with the help sphinx4 demo application. The main objective of this
application is to recognize sentences. Actually this is our main objective in this project.
4.1.2 Phonetic Translator
For training the acoustic model we need a file called "*.dic". In this file all training words and
their pronunciation are placed. We have made these pronunciations several times when training
acoustic model experimentally. As time goes on we think about an automatic pronunciation or
phonetic translation maker and this software is the implementation of that thinking. First we
made a database where all phonemes and their corresponding letters are stored. We took help
from IPA chart and various thesis papers [1] [9] [10] to make this database. However, various
sources define phonemes in different ways. Even all letters’ phoneme is not defined. That’s why
we personally define some phonemes for some consonant and conjunct letters. After making this
database we made phonetic translator to give phonetic translation of Bengali words with the help
of our created database.
Fig 4.1.2: Dictionary files with phonetic translation.
4.1.3 Training File Creator
For training the acoustic model we also need fileids and transcript file. These two files contain
information about training audio file paths and their corresponding sentences. Before creating

42
this program we have to make these two files manually. But after creating our software we can
make these two big files automatically within moments. For example, if we have 8000
Fig 4.1.3.1: Fileids files with phonetic translation.
audio files then we have to write 8000 audio file paths in fileids and 8000 audio file names and
their’s corresponding sentences in transcript file. But now we can create these two files
automatically if we provide root folder name of audio file and sentence corpus file to this
software as input.
Fig 4.1.3.2: Transcription File.
4.1.4 Command Application
We make a simple voice command application. By using Bengali word as voice command this
application do some common task such as opening my computer, left click, right click etc.

43
Chapter 5
LIMITATION &
FUTURE WORK
 LINITATION
 FUTUR WORK

44
Limitation
In our project we have some limitation in some specific tasks. The system of our Project
has been built on small data for time consistency. We have selected a domain about University
admission information for new comer students with 185 sentences. But it was difficult to collect
lots of audio from 16 speakers with a short span of time. As a result, we have selected 50
sentences from 185 sentences for training. But more speakers are needed for getting more
accurate results. For creating dictionary file we have also faced some problems.Because our
Bengali phoneme list is not declared accurately and we don’t know exact number of phonemes in
our Bengali language as different researchers said about different number of phonemes. The
performance of system depends on speaker pronunciation, environment and microphone. It
recognizes the sentences accurately when speaker speak the sentences loudly and clearly and
sometimes it cannot recognize the sentences accurately because of slowly speaking and
pronunciation problem. We created a program for automatically generated pronunciation of a
Bengali word. But this software is not working properly because of encoding problem. As
accurate phoneme is a prerequisite for good pronunciation, that’s why if we have accurate
number of phonemes then we can hope a good output from this software.
Future Works
We have done implementation of Bengali Speech Recognition for small data size. In
future we will increase our data size for creating a complete model and we have a plan to
increase its capability to recognize speech more accurately and enhance its vocabulary. We also
have developed software for making dictionary file, fileids file and transcription file. We want to
make a user friendly stand-alone GUI application for writing Bengali language. We also have an
intention to develop a complete desktop command type application for Bengali. Still training and
creating a model depend on developer that’s why we have a plan to make an automatic trainer
that can be used by a normal user. By using this automatic trainer, user will be able to train any
sentence with the corresponding audio. We also want to integrate this system to various
document type applications for writing Bengali sentence by just uttering the sentence. We want
to make voice respond type application. It will work like a user asking the software to give
answer of his question and software will give the predefined answer of this question. For making
a good recognition application we need a lot of audio. So we want to develop a website for
collecting audio from people. From this website we will be able to collect audio and using this
audio we will enrich our recognition application.

45
Chapter 6
C ONCLUSION &
REFERENCES
 CONCLUSION
 REFERENCES

46
Conclusion
Speech is the primary and the most convenient means of communication between people.
Lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic,
Japanese languages and so on. But in our mother tongue Bengali is still in beginner level in this
field. So we tried to learn about this field and to develop some tools to recognize Bengali
language. We tried to discuss about our objectives, various tools we used and process of speech
recognition through this whole report. But our developed tools are in preliminary level. For
making good and complete recognition application, lots of improvement required such as we
need a big training database, lots of speakers, audio with low noise etc. Still no speech
recognizer is 100% accurate. But if we can improve the requirements of a good recognizer and
can train our system more accurately then the result of the system will be enough to achieve our
goal.

47
References
[1]. M.A.Anusuya & S.K.Katti, “Speech Recognition by Machine: A Review” (IJCSIS)
International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.
[2]. Morched Derbali, MU Tasem Jarrah, Mohd Taib Wahid “A Review of Speech Recognition
with Sphinx Engine in Language Detection” Journal of Theoretical and Applied Information
Technology, Vol. 40 No.2, 2005 - 2012.
[3]. L. Rabiner & B. Juang, “Fundamentals of Speech Recognition”, Prentice Hall, 1993.
[4]. Daniel Jurafsky and James H.Martin, “An Introduction to Natural Language Processing,
Computational Linguistics and Speech Recognition”, Prentice Hall, 2000.
[5]. L.R. Rabiner and R.W. Schafer, “Digital Processing of Speech Signal”, Prentice Hall, 1978.
[6]. A K M Mahmudul Hoque, “Bengali Segmented Automatic Speech Recognition”,
BRACU, 2006.
[7]. Abul Hasanat Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, “Recognition
of Spoken Letters in Bangla”, banglacomputing.net, 2002.
[8]. Md. AbulHasnat, Jabir Mowla, Mumit Khan, “Isolated and Continuous Bangla Speech
Recognition: Implementation, Performance and application perspective”, BRACU, 2007.
[9]. Shammur Absar Chowdhury, “Implementation of Speech Recognition System for Bangla”,
BRACU, August 2010.
[10]. Qqbal S/O Shahzad, “Speech Recognition System”, Iqra University, March 2009.
[11]. Tran Viet Khai,”Sphinx4 Adaptation to Vietnamese Language, Vietnamese Automatic
Digit Recognition”, Bo Xuan Tu, Hochiminh city,Vietnam, 2008.
[12]. Sadaoki Furui, “Speech-to-Text and Speech-to-Speech Summarization of Spontaneous
Speech”, IEEE, 2004.
[13]. M. S. Islam, “Research on Bangla Language Processing in Bangladesh: Progress and
Challenges”, BUET, 2009.
[14]. P. Foster, T. Schalk, “Speech Recognition: The Complete Practical Reference Guide”,
1993. ISBN: 0936648392.
[15]. H. Satori, M. Harti and N. Chenfour, “Introduction to Arabic Speech Recognition Using
CMUS Sphinx System”, 2007.

48
Fig 7.1: Speaker Profiles
Speaker
ID
Name Age Gender District Environment Institution/Other
02 Bappy 23 Male Sylhet Closed Room Leading University
03 Bijoy 23 Male Moulovibazar Closed Room M.C. College
04 Dola 24 Female Chittagong Department Leading University
05 Falguny 24 Female Sylhet Open Space Leading University
06 Lovely 20 Female Sylhet Class Room Lotifa Shofi
Chowdhury Mohila
College
07 Mazed 23 Male Moulovibazar Closed Room Leading University
08 Moni 23 Female Sylhet Lab Leading University
09 Pinku 23 Male Sylhet Closed Room Sylhet Govt.
College
10 Polash 24 Male Feni Cafeteria Leading University
11 Pritom 20 Male Sylhet Closed Room Leading University
12 Razib 23 Male Sylhet Cafeteria Leading University
13 Rumi 22 Female Sylhet Lab Leading University
14 Sanjoy 23 Male Sylhet Closed Room Leading University
15 Shimul 23 Male Sylhet Closed Room Leading Univerity
16 Sumit 22 Male Shunamgonj Closed Room Madan Muhon
College
APENDICES
Speaker Profile

49
Unicode to IPA Chart
Bangla Pnoneme (ব্যজ্ঞনবর্ণ) IPA
ক K
খ KH
গ G
ঘ GH
ঙ NG
চ C
ছ CH
জ J
ঝ JH
ঞ NIO
ট T
ঠ TH
ড D
ঢ DH
ণ N
ত TA
থ TO
দ DA
ধ DO
ন N
প P
ফ PH

50
ব B
ভ BH
ম M
য Z
র R
ল L
শ SH
ষ SH
স S
হ H
ড় RA
ঢ় RH
য় Y
ৎ T``
NG
:
^
Bangla Pnoneme IPA
AA
I
II
U
UU
RI

51
E
OI
O
OU
Bangla Pnoneme ( ) IPA
ব- W
য- Y
র- R
ম- M
RR
Bangla Pnoneme (নাম্বার) IPA
শুন্য 0
এক 1
দুই 2
তিন 3
চার 4
পাঁচ 5
ছয় 6
সাত 7
আট 8
নয় 9

52
Bangla Pnoneme (যুক্তবর্ণ) IPA
KK
KT
KT
KTR
KW
KM
KY
KR
KL
KKH
KKHW
KKHN
KKHM
KKHY
KS
KHY
KHR
GN
GDH
NGM
CC
CCH
CCHW
CCHR
CNG

53
CY
GN
GNY
GW
GM
GY
GR
GL
KR
KL
KKH
KKHW
KKHN
KKHM
KKHY
KS
KHY
KHR
GN
GDH
NGM
CC
CCH
CCHW
CCHR

54
CNG
CY
JJ
JJW
JJH
GG
JW
JY
JR
NC
NCH
NJ
NJH
TT
TT
TTW
TTH
TN
TW
TM
TMY
TY
TR
THW
THY
THR

55
DG
DGH
DD
DDW
DDH
DW
DV
DM
DY
DR
NM
NY
NS
PT
PT
PN
PP
PY
PR
PL
PS
FR
FL
BJ
BD
BDH

56
BB
BY
BR
BL
LT
LD
LDH
LP
LB
LV
LM
LY
LL
SHC
SHCH
SHT
SHN
SHW
SHM
SHY
SHR
SHL
SHK
SHKR
SHT
SF

57
SW
SM
SY
SR
SL
SKL
HN
HN
HW
HM
HY
HR
HL
HRRI
GHN
GHY
GHR
NK
NKY
NGKKH
NGKH
NGG
NGGY
NGGH
NGGHY
NGGHR

58
TW
TM
TY
TR
DD
DY
DR
DHY
DHR
NT
NTH
ND
NDY
NDR
NDH
NN
NW
NM
NY
DHN
DHW
DHM
DHY
DHR
NT
NTH

59
ND
NT
NTW
NTY
NTR
NTH
ND
NDY
NDW
NDR
NDH
NDHY
NDHR
NN
NW
VY
VR
VL
MTH
MN
MP
MPR
MF
MB
MV
MVR

60
MM
MY
MR
ML
ZY
, , RRK
, RRKY
LK
LG
SHTY
SHTR
SHTH
SHTHY
SHN
SHP
SHPR
SPHY
SHW
SHM
SK
SKR
ST
STR
SKH
ST
STW

61
STY
STH
STHY
SN
SP
Corpus
Our total Sentences is 185 but we have recognized 50 sentences for short time duration.
No Sentence
Fig 7.2: Unicode to IPA Chart

62
1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দুএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রুমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে

63
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যা
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট
51 ইউ
52
53 কত
54
55 কত
56 উপর
57
58 এক
59
60 আর
61 কত
62 এক
63
64
65 আর
67 ও
68
69 কত
70 এক
71 কত
72
73 রকম
74
75 আর
76 ও

64
77 সব একশত
78
79 উপর
80
81 আর কত
82 এ আর এ
83 এ
84
85 এক
86
87 আর সবসময়
88
89 ও এ
90
91 এ আর এ
92 এ আর এ
93
94 আরকম
95 আর এ
96
97 আর
98
99
100
101 আর
102
103
104
105
106
107 আরও
108
109 সব

65
110
111
112
113 ও সব
114
115 হয়
116
117
118 আর
119 -ই
হয়
120
121 হয়
122
123 ও হয়
124
125 -
126 হয়
127
128
129 এ
হয়
130 সব
131
132
133
134
135
136 ওহ
137
হয়
138 হয়
139 বছর হয়
140
141
142
143
144 হয়

66
145 এখনও
146
147
148
149 ,
150
151
152
153 একর
154
155
156
157
158
159 ভবন
160
161
162
163
164
165 একবৎসর
166
167
168
169 সময়
170
171 এখন
172 পর
173
174 পর
175
176 পর
177 এক পর

67
178
179 ও
180
181
182
183
184
185
Fig 7.3: Corpus about University Admission Information.

68
CODE OF OUR PROJECT:
package sbs.BSR.training.files.creator;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
public class FileidsCreator {
static List<String> dirTreeLevel1= new ArrayList<String>();
static List<String> tempList = new ArrayList<String>();
public static void main(String[] args) {
SortArrayList sortObj = new SortArrayList();
int lineCounts = 0;
String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
String root = getRootFromPath(dirTreeRootName);
listdir(dirTreeRootName,1);
Collections.sort(dirTreeLevel1);
int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
int i = 0,j=0,k=0;
while(sizeOfdirTreeLevel1>i)
{
String path2 = dirTreeRootName+dirTreeLevel1.get(i);
dirTreeLevel2.clear();
listdir(path2,2);
dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
String path3="";
while(sizeOfdirTreeLevel2>j)
{
path3 = path2+"/"+dirTreeLevel2.get(j)+"/";
listdir(path3,3);
while(dirTreeLevel3.size()>k)
{
String targetPath =
root+"/"+dirTreeLevel1.get(i)+"/"+dirTreeLevel2.get(j)+"/"+dirTreeLevel3.get(k);

// replace() treats ".wav" literally; replaceAll() would read "." as a regex wildcard
targetPath = targetPath.replace(".wav", "");
writIntoFile(targetPath,"C:/trainnign_file/sbs_asr_train/"+"sbs_asr_train.fileids");
lineCounts++;
k++;
}
j++;
}
//file listing end
j=0;
i++;
}
transcriptCreator tcObj = new transcriptCreator();
try {
tcObj.readCorpus("C:/trainnign_file/corpus.txt");
tcObj.CreateTransCriptFile(dirTreeLevel3,
"C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
} catch (FileNotFoundException e) {
e.printStackTrace();
}
int si=0;
}
public static void listdir(String path,int Level)
{
File folder = new File(path);
File[] listOfFiles = folder.listFiles();
int numofL_I = 0;
int numOfL = listOfFiles.length;
while(numOfL>numofL_I)
{
if (listOfFiles[numofL_I].isDirectory())
{
if(Level == 1)
{
dirTreeLevel1.add(listOfFiles[numofL_I].getName());
}
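else if(Level == 2)
{
dirTreeLevel2.add(listOfFiles[numofL_I].getName());
}
}
else
{
if (listOfFiles[numofL_I].isFile() && Level == 3)
{
// level 3 holds the .wav recordings themselves
dirTreeLevel3.add(listOfFiles[numofL_I].getName());
}
}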
numofL_I++;
}
}
public static String getRootFromPath(String UserDir)
{
String root = null;
int count = 0;
int[] indexes = new int[2];
int i = 0;
i = indexes[1] = UserDir.lastIndexOf('/');
i=i-1;
while(i>0)
{
if(UserDir.charAt(i)=='/')
{
indexes[0] = i;
break;
}
i--;
}
root = UserDir.substring(indexes[0]+1, indexes[1]);
return root;
}
public static void writIntoFile(String data,String path)
{
try{
FileWriter fstream = new FileWriter(path,true);
BufferedWriter out = new BufferedWriter(fstream);
out.write(data+'\n');
out.close();
}catch(IOException e){
e.printStackTrace();
}
}
}
……………………………………………………………………………………………
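The listing above writes one entry per recording to sbs_asr_train.fileids: the path relative to the training root, with the .wav extension removed. The following is a minimal sketch of that derivation only. The class name FileIdSketch, the method toFileId and the folder names used in main are hypothetical and are not part of our project.

```java
public class FileIdSketch {
    // Builds one fileids entry: root/level1/level2/name, without the ".wav" suffix.
    public static String toFileId(String root, String level1, String level2, String wavName) {
        String name = wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - ".wav".length())
                : wavName;
        return root + "/" + level1 + "/" + level2 + "/" + name;
    }

    public static void main(String[] args) {
        // Illustrative folder names only.
        System.out.println(toFileId("sbs_asr_train", "speaker1", "s2", "3.wav"));
    }
}
```

Stripping the suffix with endsWith and substring avoids the regular-expression wildcard that a pattern such as ".wav" would introduce.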

ARRAY:
……………………………………………………………………………………………
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class SortArrayList{
public List<String> sortList(List<String> unsortList){
List<String> mysortList = new ArrayList<String>();
int i = 0;
while(unsortList.size()>i)
{
String str = unsortList.get(i);
str = str.replaceAll("[^\\d.]", "");
mysortList.add(str);
i++;
}
int[] sortint = new int[mysortList.size()];
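i = 0;
// convert the extracted digits to integers so they sort numerically
while(mysortList.size()>i)
{
sortint[i] = Integer.valueOf(mysortList.get(i));
i++;
}
Arrays.sort(sortint);
String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]","");
mysortList.clear();
i = 0;
// rebuild the folder names as prefix + number, now in numeric order
while(sortint.length>i)
{
String requiredString =
folNameWONum+String.valueOf(sortint[i]);
mysortList.add(requiredString);
i++;
}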
return mysortList;
}

}
………………………………………………………………………………………………
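SortArrayList orders folder names by the number they contain, so that a name ending in 2 comes before one ending in 10. The same ordering can be sketched with a Comparator. This is an alternative illustration, not the code we ran. The class NumericSuffixSort and the sample names s1, s2 and s10 are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NumericSuffixSort {
    // Returns the digits contained in a name such as "s12" as an int (0 if there are none).
    public static int numberOf(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by numeric value instead of character by character.
    public static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberOf));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByNumber(Arrays.asList("s10", "s2", "s1")));
    }
}
```

Unlike the listing above, this variant keeps each original name intact instead of rebuilding it from a shared prefix, so it also works when the folders do not all start with the same letters.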
TRANSCRIPT CREATOR
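import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;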
public class transcriptCreator {
static ArrayList<Object> inputLines=new ArrayList<Object>();
public void readCorpus(String path) throws FileNotFoundException {
FileInputStream fstream = new FileInputStream(path);
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
try{
while ((strLine = br.readLine()) != null) {
inputLines.add(strLine);
}
in.close();
}catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
int inputLineSize = inputLines.size();
int i = 0;
while(inputLineSize>i)
{
i++;
}
}
public static void CreateTransCriptFile(List<String> data,String path)
{
try{

FileWriter fstream = new FileWriter(path,true);
int numOfwriting = data.size();
int i = 0;
int lineStart = 0;
int lineEnd = inputLines.size();
BufferedWriter out = new BufferedWriter(fstream);
while(numOfwriting>i)
{
if(lineStart == lineEnd) lineStart = 0;
String wavRemove = data.get(i).toString();
wavRemove = wavRemove.replace(".wav", "");
String leadTrailspaceRemoved =
inputLines.get(lineStart).toString();
leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
String pattern = "<S> "+leadTrailspaceRemoved+" </S> ("+wavRemove+")";
out.write(pattern+'\n');
lineStart++;
i++;
}
//Close the output stream
out.close();
}catch(IOException e){
e.printStackTrace();
}
}
}
…………………………………………………………………………………………….
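CreateTransCriptFile pairs every recording with a corpus sentence and starts again from the first sentence when the recordings outnumber the sentences. The sketch below shows that pairing in isolation, assuming a non-empty corpus. The class TranscriptLineSketch, its method names and the sample sentences are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TranscriptLineSketch {
    // Formats one transcript entry: "<S> sentence </S> (fileId)".
    public static String toLine(String sentence, String fileId) {
        return "<S> " + sentence.trim() + " </S> (" + fileId + ")";
    }

    // Pairs each recording with a corpus sentence, wrapping around to the
    // first sentence when there are more recordings than sentences.
    // Assumes the corpus list is not empty.
    public static List<String> toLines(List<String> corpus, List<String> fileIds) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < fileIds.size(); i++) {
            lines.add(toLine(corpus.get(i % corpus.size()), fileIds.get(i)));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(" first sentence ", "second sentence");
        List<String> ids = Arrays.asList("1", "2", "3");
        for (String line : toLines(corpus, ids)) {
            System.out.println(line);
        }
    }
}
```

The modulo index plays the role of the lineStart reset in the listing above.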
FILE OPERATOR
package ptpack;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;
public class FileOperator {
@SuppressWarnings("null")

public ArrayList<Object> getStrings()
{
ArrayList<Object> allInputStrings=new ArrayList<Object>();
int aisI = 0;
try {
FileInputStream fstream = new
FileInputStream("C:/trainnign_file/testInput.dic");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String str;
while ((str = br.readLine()) != null) {
str = str.trim();
allInputStrings.add(str);
System.out.println(str.trim()+" "+str.length());
}
in.close();
} catch (Exception e) {
System.err.println(e);
}
return allInputStrings;
}
public void createFile(String finalData)
{
try {
BufferedWriter out = new BufferedWriter(new
FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
out.write(finalData);
out.close();
} catch (Exception e) {
System.err.println(e);
}
}
}
…………………………………………………………………………………………..
PHONETIC TRANSLATION
package ptpack;
import java.util.ArrayList;
public class PhoneticTranslation {
public static void main(String[] args) {
PronounciationGenarator pgObj = new PronounciationGenarator();
FileOperator foObj = new FileOperator();
ArrayList<Object> inputStrings = new ArrayList<Object>();
inputStrings = foObj.getStrings();
String pro = "";
System.out.println("in phonetic translation");
int i = 0;
String is = "";
String fileImage = "";
while(inputStrings.size()>i)
{
is = inputStrings.get(i).toString().trim();
pro = pgObj.getPronouciation(is);
pro = pro.trim();
fileImage = fileImage+is+" "+pro+"\n";
i++;
}
//System.out.println(fileImage);
foObj.createFile(fileImage);
}//main end
}
Prodb
package ptpack;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
public class Prodb {
private static final String DBURL =
"jdbc:mysql://localhost:3306/bsr?user=root&password=" +
"&useUnicode=true&characterEncoding=UTF-8";
private static final String DBDRIVER = "com.mysql.jdbc.Driver";
static {
try {
Class.forName(DBDRIVER).newInstance();
} catch (Exception e){

}
}
private static Connection getConnection()
{
Connection connection = null;
try {
connection = DriverManager.getConnection(DBURL);
}
catch (Exception e) {
}
return connection;
}
public static void showEmployee() {
Connection con = getConnection();
Statement stmt =null;
try {
stmt = con.createStatement();
ResultSet rs = stmt.executeQuery("Select * from employees "
+ "where EmployeeID=1001");
if (rs.next()) {
System.out.println("EmployeeID : " +
rs.getInt("EmployeeID"));
System.out.println("Name : " + rs.getString("Name"));
System.out.println("Office : " + rs.getString("Office"));
}
else {
System.out.println("No Specified Record.");
}
rs.close();
} catch(SQLException ex) {
System.err.println("SQLException: " + ex.getMessage());
}
finally {
if (stmt != null) {
try {
stmt.close();
} catch (SQLException e) {
System.err.println("SQLException: " + e.getMessage());
}
}
if (con != null) {
try {
con.close();
} catch (SQLException e) {
System.err.println("SQLException: " + e.getMessage());
}
}
}
}
public static boolean isBanjonBorno(char ch) {
Connection con = getConnection();
Statement stmt =null;
int id=0;
boolean isBBorno = false;
try {
stmt = con.createStatement();
ResultSet rs = stmt.executeQuery("Select id from banglatab "
+ "where letter= '"+ch+"'");
if (rs.next()) {
id = rs.getInt("id");
if(id>=13 && id<=48)
{
isBBorno = true;
}
}
else {
System.out.println("No Specified Record.");
}
rs.close();
} catch(SQLException ex) {
System.err.println("SQLException: " + ex.getMessage());
}
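finally {
if (stmt != null) {
try {
stmt.close();
} catch (SQLException e) {
System.err.println("SQLException: " + e.getMessage());
}
}
if (con != null) {
try {
con.close();
} catch (SQLException e) {
System.err.println("SQLException: " + e.getMessage());
}
}
}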
return isBBorno;
}
}//class end
……………………………………………………………………………………………
PRONOUNCIATION GENARATOR
package ptpack;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
public class PronounciationGenarator {
Prodb pdobj = new Prodb();
String BanglaWord = "";
int BanglaWordLength = 0;
int ConjunctsPosition[] = new int[20];
int NoConjunctsPosition[] = new int[20];
int cpi=0,ncpi=0,k=0;
char ConjuctsIdentifyCharacter = '\u09CD'; // assumed: the hasanta sign, which joins consonants into a conjunct
int calltimes = 1;
public String getPronouciation(String bw)
{
BanglaWord = bw;
BanglaWordLength = BanglaWord.length();
while(k<BanglaWordLength)
{
k++;
}
k=0;
while(BanglaWordLength>k)
{
if(BanglaWord.charAt(k)==ConjuctsIdentifyCharacter)
{
ConjunctsPosition[cpi] = k-1;
cpi++;

ConjunctsPosition[cpi] = k;
cpi++;
ConjunctsPosition[cpi] = k+1;
cpi++;
}
k++;
}
int NoOfConjuctsPosition = cpi;
k = 0;
//System.out.println("nConjunctsPosition");
while(cpi>k)
{
//System.out.print(ConjunctsPosition[k]+" ");
k++;
}
//int wc[] = {0,1,2,4,5,6};
int i=0,trace=0;
cpi = 0;
ncpi = 0;
while(BanglaWordLength>i)
{
while(NoOfConjuctsPosition>cpi)
{
if(ConjunctsPosition[cpi]==i)
{
trace = 1;
break;
}
cpi++;
}
if (trace==0)
{
NoConjunctsPosition[ncpi]=i;
ncpi++;
}
cpi=0;
i++;
trace = 0;
}
i=0;
while(ncpi>i)
{
i++;
}
// matching serially and making conjuct

i = 0;
cpi = 0;
ncpi = 0;
int ai = 0;
String SearchStrings[] = new String[100];
int ssi = 0;
int serialityTrace =0;
String tempString = "";
boolean tempc,tempc2;
while(BanglaWordLength>i)
{
while(NoOfConjuctsPosition>cpi)
{
if(ConjunctsPosition[cpi]==i)
{
trace = 1;
break;
}
cpi++;
}
if (trace==0)// if nonconjuct
{
SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
ssi++;
if(BanglaWordLength!=(i+1))
{
calltimes++;
tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i+1));
if(tempc==true && tempc2==true)
{
SearchStrings[ssi] = Character.toString('অ');
ssi++;
}
}
}
else // if conjuct
{
tempString = Character.toString(BanglaWord.charAt(i));
if(BanglaWord.charAt(i)=='র')
{
SearchStrings[ssi] =
Character.toString(BanglaWord.charAt(i));
ssi++;
i+=2;

SearchStrings[ssi] =
Character.toString(BanglaWord.charAt(i));
ssi++;
}
else
{
while(NoOfConjuctsPosition>serialityTrace)
{
if(ConjunctsPosition[serialityTrace]==i)
{
break;
}
serialityTrace++;
}
int diffbi = Math.abs(ConjunctsPosition[serialityTrace]-
ConjunctsPosition[serialityTrace+1]);
System.out.println("BanglaWord = "+BanglaWord+" "+BanglaWord.length());
while(diffbi<=1)
{
System.out.println("diffbi = "+diffbi+" "+" serialityTrace "+serialityTrace);
i++;
System.out.println("i = "+i+" BanglaWord.charAt(i) "+BanglaWord.charAt(i));
tempString += Character.toString(BanglaWord.charAt(i));
serialityTrace++;
diffbi = Math.abs(ConjunctsPosition[serialityTrace]-
ConjunctsPosition[serialityTrace+1]);
}
SearchStrings[ssi] = tempString;
ssi++;
}
}//conjuct adding end
cpi=0;
i++;
trace = 0;
}
i=0;
while(ssi>i)
{
i++;
}
String phoneticTrans = "";
Connection conn = null;

Statement stmt = null;
ResultSet rs = null;
try {
Class.forName("com.mysql.jdbc.Driver").newInstance();
String connectionUrl =
"jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
String connectionUser = "root";
String connectionPassword = "";
conn = DriverManager.getConnection(connectionUrl, connectionUser,
connectionPassword);
stmt = conn.createStatement();
i=0;
// look up the pronunciation of every letter or conjunct in order
while (ssi>i) {
rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"+SearchStrings[i]+"'");
// skip entries that have no row in banglatab instead of failing
if (rs.next()) {
String pro = rs.getString("pro");
phoneticTrans = phoneticTrans+pro+" ";
}
i++;
}
} catch (Exception e) {
e.printStackTrace();
} finally {
try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
}
return phoneticTrans;
}
}
……………………………………………………………………………………………
SBSASR_MAIN
package SBS.BSR.S50;
import java.io.IOException;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
public class SBSBSR {
public static void main(String[] args) throws IOException {
ConfigurationManager cm;
if (args.length > 0) {
cm = new ConfigurationManager(args[0]);
} else {
cm = new
ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
}
// allocate the recognizer
System.out.println("Loading...");
Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
recognizer.allocate();
// start the microphone or exit the program if this is not possible
Microphone microphone = (Microphone) cm.lookup("microphone");
if (!microphone.startRecording()) {
System.out.println("Cannot start microphone.");
recognizer.deallocate();
System.exit(1);
}
printInstructions();
//giveCommand();
// loop the recognition until the programm exits.
String comString = " ";
System.out.println("comString: " + comString + " length : "+comString.length()+'\n');
while (true) {
System.out.println("Start speaking. Press Ctrl-C to quit.\n");
Result result = recognizer.recognize();
if (result != null) {
String resultText = result.getBestResultNoFiller();
System.out.println("You said: " + resultText +'\n');
} else {
System.out.println("I can't hear what you said.\n");
}
}
}
/** Prints out what to say for this demo. */
private static void printInstructions() {
System.out.println("Sample sentences:\n" +
" \n" +
" \n" +
" \n" +
" \n\n");
}
}
Sbsbsr Transcriber
package SBS.BSR.S50;
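import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;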
public class Transcriber {
public static void main(String[] args) throws IOException,
UnsupportedAudioFileException, AWTException {
URL audioURL;
if (args.length > 0) {
audioURL = new File(args[0]).toURI().toURL();
} else {
audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
}
URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
ConfigurationManager cm = new ConfigurationManager(configURL);
Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
/* allocate the resource necessary for the recognizer */
recognizer.allocate();
// configure the audio input for the recognizer
AudioFileDataSource dataSource = (AudioFileDataSource)
cm.lookup("audioFileDataSource");
dataSource.setAudioFile(audioURL, null);
// Loop until last utterance in the audio file has been decoded, in which case the
// recognizer will return null.
Result result;
while ((result = recognizer.recognize())!= null) {
String resultText = result.getBestResultNoFiller();
System.out.println(resultText);
}
}
}
Desktop Command Application
package SBS.BSR.CMD.APP.S12;
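import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;
public class CommandActivator {
// one Robot instance shared by every command method below
private final Robot robot;
public CommandActivator() throws AWTException {
robot = new Robot();
}
public void leftClick() throws AWTException{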
robot.mousePress(InputEvent.BUTTON1_MASK);
robot.mouseRelease(InputEvent.BUTTON1_MASK);
}
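public void rightClick() throws AWTException{
robot.mousePress(InputEvent.BUTTON3_MASK);
robot.mouseRelease(InputEvent.BUTTON3_MASK);
}
public void doubleClick() throws AWTException{
// assumed behaviour: two left clicks in quick succession
leftClick();
leftClick();
}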
public void copy() throws AWTException{
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_C);
robot.keyRelease(KeyEvent.VK_CONTROL);
robot.keyRelease(KeyEvent.VK_C);
}
public void paste() throws AWTException{
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_V);
robot.keyRelease(KeyEvent.VK_V);
robot.keyRelease(KeyEvent.VK_CONTROL);
}
public void delete() throws AWTException{
robot.keyPress(KeyEvent.VK_DELETE);
robot.keyRelease(KeyEvent.VK_DELETE);
}
public void selectAll() throws AWTException{
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_A);
robot.keyRelease(KeyEvent.VK_A);
robot.keyRelease(KeyEvent.VK_CONTROL);
}
public void up() throws AWTException{
robot.keyPress(KeyEvent.VK_PAGE_UP);
robot.keyRelease(KeyEvent.VK_PAGE_UP);
}
public void down() throws AWTException{
robot.keyPress(KeyEvent.VK_PAGE_DOWN);
robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
}
public void previousPage() throws AWTException{
robot.keyPress(KeyEvent.VK_PAGE_UP);
robot.keyRelease(KeyEvent.VK_PAGE_UP);
}
public void nextPage() throws AWTException{
robot.keyPress(KeyEvent.VK_PAGE_DOWN);
robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
}
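public void openNewFile() throws AWTException{
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_N);
robot.keyRelease(KeyEvent.VK_N);
robot.keyRelease(KeyEvent.VK_CONTROL);
}
public void openHere() throws AWTException{
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_O);
robot.keyRelease(KeyEvent.VK_O);
robot.keyRelease(KeyEvent.VK_CONTROL);
}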

public void close() throws AWTException{
robot.keyPress(KeyEvent.VK_ALT);
robot.keyPress(KeyEvent.VK_F4);
robot.keyRelease(KeyEvent.VK_ALT);
robot.keyRelease(KeyEvent.VK_F4);
}
public void startMenu() throws AWTException{
robot.keyPress(KeyEvent.VK_WINDOWS);
robot.keyRelease(KeyEvent.VK_WINDOWS);
}
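public void refresh() throws AWTException{
robot.keyPress(KeyEvent.VK_F5); // assumed: F5 is the usual refresh key
robot.keyRelease(KeyEvent.VK_F5);
}
public void help() throws AWTException{
robot.keyPress(KeyEvent.VK_F1); // assumed: F1 is the usual help key
robot.keyRelease(KeyEvent.VK_F1);
}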
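public void showDesktop() throws AWTException{
robot.keyPress(KeyEvent.VK_WINDOWS);
robot.keyPress(KeyEvent.VK_D);
robot.keyRelease(KeyEvent.VK_D);
robot.keyRelease(KeyEvent.VK_WINDOWS);
}
public void openMyComputer() throws AWTException{
robot.keyPress(KeyEvent.VK_WINDOWS);
robot.keyPress(KeyEvent.VK_E);
robot.keyRelease(KeyEvent.VK_E);
robot.keyRelease(KeyEvent.VK_WINDOWS);
}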
// sokrio
public void enter() throws AWTException{

robot.keyPress(KeyEvent.VK_ENTER);
robot.keyRelease(KeyEvent.VK_ENTER);
}
// porer window | ager window
public void altTab() throws AWTException{
robot.keyPress(KeyEvent.VK_ALT);
robot.keyPress(KeyEvent.VK_TAB);
robot.keyRelease(KeyEvent.VK_ALT);
robot.keyRelease(KeyEvent.VK_TAB);
}
// porer tab | ager tab
public void ctlTab() throws AWTException{
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_TAB);
robot.keyRelease(KeyEvent.VK_TAB);
robot.keyRelease(KeyEvent.VK_CONTROL);
}
public void openNotepad() throws AWTException, IOException{
ProcessBuilder proc=new ProcessBuilder("notepad.exe");
Process p=proc.start();
}
public void openBrowser() throws AWTException, IOException{
String theUrl = "http://www.google.com";
Runtime.getRuntime().exec
("rundll32 url.dll,FileProtocolHandler " + theUrl);
}
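public void openFacebook() throws AWTException, IOException{
String theUrl = "http://www.facebook.com";
Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}
public void openYahoo() throws AWTException, IOException{
String theUrl = "http://www.yahoo.com";
Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}
public void openTechtunes() throws AWTException, IOException{
String theUrl = "http://www.techtunes.com.bd";
Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}
public void openProthomAlo() throws AWTException, IOException{
String theUrl = "http://www.prothom-alo.com";
Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}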
}
SBSBSR.JAVA
package SBS.BSR.CMD.APP.S12;
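import java.awt.AWTException;
import java.awt.Robot;
import java.io.IOException;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;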
public class SBSBSR {
public static void main(String[] args) throws IOException, AWTException {
ConfigurationManager cm;
if (args.length > 0) {
cm = new ConfigurationManager(args[0]);
} else {
cm = new
ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
}
// allocate the recognizer
System.out.println("Loading...");
Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
recognizer.allocate();
// start the microphone or exit the program if this is not possible
Microphone microphone = (Microphone) cm.lookup("microphone");
if (!microphone.startRecording()) {
System.out.println("Cannot start microphone.");
recognizer.deallocate();
System.exit(1);
}
/*Robot robot = new Robot();
robot.delay(2000);
giveCommand("bangla command");
robot.delay(3000);*/
printInstructions();
//giveCommand();
// loop the recognition until the programm exits.
String comString = " ";
System.out.println("comString: " + comString + " length : "+comString.length()+'\n');
while (true) {
System.out.println("Start speaking. Press Ctrl-C to quit.\n");
Result result = recognizer.recognize();
if (result != null) {
String resultText = result.getBestResultNoFiller();
System.out.println("You said: " + resultText +'\n');
giveCommand(resultText);
//CommandActivator obj = new CommandActivator();
//obj.openMyComputer();
} else {
System.out.println("I can't hear what you said.\n");
}
}
}
/** Prints out what to say for this demo. */
private static void printInstructions() {
System.out.println("Sample sentences:\n" +
" \n" );
}
private static void giveCommand(String CompareText) throws AWTException,
IOException{
if(CompareText.equals(" ")){
CommandActivator obj = new CommandActivator();
obj.rightClick();
}
}
}
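giveCommand compares the recognized text against a command phrase and then calls the matching CommandActivator method. When the number of commands grows, one possible arrangement is a lookup table from phrase to action. The sketch below is only an illustration of that idea. The class CommandDispatchSketch, its methods and the phrase "notepad" are hypothetical, and the action merely prints a message instead of driving java.awt.Robot.

```java
import java.util.HashMap;
import java.util.Map;

public class CommandDispatchSketch {
    // Maps a recognized phrase to the action it should trigger.
    private final Map<String, Runnable> commands = new HashMap<>();

    public void register(String phrase, Runnable action) {
        commands.put(phrase, action);
    }

    // Runs the action registered for the recognized text.
    // Returns false when the text is null or no command matches.
    public boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = commands.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        // Placeholder action; a real one would call a CommandActivator method.
        dispatcher.register("notepad", () -> System.out.println("would open Notepad"));
        System.out.println(dispatcher.dispatch("notepad"));
        System.out.println(dispatcher.dispatch("unknown"));
    }
}
```

Because the CommandActivator methods declare checked exceptions, an action registered this way would have to catch AWTException and IOException inside the lambda.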
------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------
