SlideShare uma empresa Scribd logo
1 de 6
Baixar para ler offline
B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and
                 Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                   Vol. 3, Issue 1, January -February 2013, pp.253-258
            Speech to Text Conversion using Android Platform
                        B. Raghavendhar Reddy#1, E. Mahender *2
                         #1
                              Department of Electronics Communication and Engineering
                                   Aurora’s Technological and Research Institute
                                      Parvathapur, Uppal, Hyderabad, India.
                                                  2Asst. Professor
                                   Aurora’s Technological and Research Institute
                                      Parvathapur, Uppal, Hyderabad, India.


Abstract
For the past several decades, designers have               remains speech. Market for smart mobile phones
processed speech for a wide variety of                     provides a number of applications with speech
applications       ranging       from       mobile         recognition implementation. Google's Voice Actions
communications to automatic reading machines.              and recently iphone's Siri are applications that
Speech recognition reduces the overhead caused             enable control of a mobile phone using voice, such
by alternate communication methods. Speech has             as calling businesses and contacts, sending texts and
not been used much in the field of electronics and         email, listening to music, browsing the web, and
computers due to the complexity and variety of             completing common tasks. Both Siri and Voice
speech signals and sounds. However, with                   Actions require an active connection to a network in
modern processes, algorithms, and methods we               order to process requests and most of Android
can process speech signals easily and recognize            phones can run on a 4G network which is faster than
the text. In this project, we are going to develop         the 3G network that the iPhone runs on. There is
an on-line speech-to-text engine. The system               also an issue of availability, Voice Actions are
acquires speech at run time through a                      available on all Android devices above Android 2.2,
microphone and processes the sampled speech to             but Siri is available only for owners of the iPhone
recognize the uttered text. The recognized text            4S. The Siri's advantage is that it can act on a wide
can be stored in a file. We are developing this on         variety of phrases and requests and can understand
android platform using eclipse workbench. Our              and learn from natural language, whereas Google's
speech-to-text system directly acquires and                Voice Actions can be operated only by using very
converts speech to text. It can supplement other           specific voice commands. In this work we have
larger systems, giving users a different choice for        developed an application for sending SMS messages
data entry. A speech-to-text system can also               which uses Google's speech recognition engine. The
improve system accessibility by providing data             main goal of application Voice SMS is to allow user
entry options for blind, deaf, or physically               to input spoken information and send voice message
handicapped users. Voice SMS is an application             as desired text message. The user is able to
developed in this work that allows a user to               manipulate text message fast and easy without using
record and convert spoken messages into SMS                keyboard, reducing spent time and effort. In this
text message. User can send messages to the                case speech recognition provides alternative to
entered phone number. Speech recognition is                standard use of key board for text input, creating
done via the Internet, connecting to Google's              another dimension in modern communications.
server. The application is adapted to input
messages in English. Speech recognition for                II. ANDROID
Voice uses a technique based on hidden Markov                       Android is a software environment for
models (HMM - Hidden Markov Model). It is                  mobile devices that includes an operating system,
currently the most successful and most flexible            middleware and key applications [1]. In 2005
approach to speech recognition.                            Google took over company Android Inc., and two
                                                           years later, in collaboration with the group the Open
Keywords: HMM, Android Os, DVM, Speech                     Handset Alliance, presented Android operating
Recognition, Intents                                       system (OS).
                                                           Main features of Android operating system are:
I. INTRODUCTION                                                  Enables free download of development
         Mobile phones have become an integral                        environment for application development.
part of our Everyday life, causing higher demands                Free use and adaptation of operating
for content that can be used on them. Smart phones                   system to manufacturers of mobile
offer customer enhanced methods to interact with                     devices.
their phones but the most natural way of interaction



                                                                                                 253 | P a g e
B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and
                 Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                   Vol. 3, Issue 1, January -February 2013, pp.253-258
         Equality of basic core applications and        construction of applications: activity, intent, service
          additional applications in access to           and the content provider. An activity is the main
          resources.                                     element of every application and simplified
         Optimized use of memory and automatic          description defines it as a window that users see on
         control of applications which are being         their mobile device. The application can have one or
         executed.                                       more activities. Main activity is the one that is used
         Quick and easy development of                  as startup. The transition between the activities is
         applications using development tools and        carried out in a way that launched activity calls a
         rich database of software libraries.            new activity. Each activity as a separate component
      High quality of audiovisual content, it is        is implemented with inheritance of Activity class.
         possible to use vector graphics, and most       During the execution of applications, activities are
         audio and video formats.                        added to the stack, currently running activity is on
      Ability to test applications on most              the top of the stack.
         computing platforms, including Windows,                   An intent is a message used to run the
         Linux…                                          activities, services, or recipient’s multicast. An
         The Android operating system (OS)               intent can contain the name of the components you
architecture is divided into 5 layers (fig. 1.). The     need to run, the action which is necessary to
application layer of Android OS is visible to end        execute, the address of stored data needed to run the
user, and consists of user applications. The             component, and component type. A service is a
application layer includes basic applications which      component that runs in the background to perform
come with the operating system and applications          long running operations or to perform work for
which user subsequently takes. All applications are      remote processes. One service can link multiple
written in the Java programming language.                applications and service is executed until a
Framework is extensible set of software components       connection with all applications is done. A content
used by all applications in the operating system. The    provider manages a shared set of application data.
next layer represents the libraries, written in the C    Data can be stored in the file system, a SQLite
and C + + programming languages, and OS accesses         database, on the web, or any other persistent storage
them via framework.                                      location which application can access [1].
Dalvik Virtual Machine (DVM), forms the main             Through the content provider, other applications can
part of the executive system environment. Virtual        query or even modify the data (if the content
machine is used to start the core libraries written in   provider allows it).
the Java
                                                         III. SPEECH RECOGNITION
                                                                  Speech recognition for application Voice
                                                         SMS is done on Google server, using the HMM
                                                         algorithm. HMM algorithm is briefly described in
                                                         this part. Process involves the conversion of
                                                         acoustic speech into a set of words and is performed
                                                         by software component. Accuracy of speech
                                                         recognition systems differ in vocabulary size and
                                                         confusability, speaker dependence vs. independence,
                                                         modality of speech (isolated, discontinuous, or
                                                         continuous speech, read or spontaneous speech),
                                                         task and language constraints [2].
                                                                  Speech recognition system can be divided
                                                         into several blocks: feature extraction, acoustic
                                                         models database which is built based on the training
                                                         data, dictionary, language model and the speech
                                                         recognition algorithm. Analog speech signal must
Fig1: Android Architecture                               first be sampled on time and amplitude axes, or
         programming language. Unlike Java’s             digitized. Samples of speech signal are analyzed in
virtual machine, which is based on the stack, DVM        even intervals. This period is usually 20 ms because
bases on registry structure and it is intended for       signal in this interval is considered stationary.
mobile devices. The last architecture layer of           Speech feature extraction involves the formation of
Android operating system is kernel based on Linux        equally spaced discrete vectors of speech
OS, which serves as a hardware abstraction layer.        characteristics. Feature vectors from training
The main reasons for its use are memory                  database are used to estimate the parameters of
management and processes, security model, network        acoustic models. Acoustic model describes
system and the constant development of systems.          properties of the basic elements that can be
There are four basic components used in                  recognized. The basic element can be a phoneme for



                                                                                                254 | P a g e
B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and
                 Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                   Vol. 3, Issue 1, January -February 2013, pp.253-258
continuous speech or word for isolated words            output sequence. It is defined by recursive relation.
recognition.                                            During the search, n-best word sequences are
          Dictionary is used to connect acoustic        generated using acoustic models and a language
models with vocabulary words. Language model            model.
reduces the number of acceptable word
combinations based on the rules of language and         IV. MAIN PARTS OF THE PROJECT
statistical information from different texts. Speech    A. Voice Recognition Activity class
recognition systems, based on hidden Markov                        Voice Recognition Activity is startup
models are today most widely applied in modern          activity defined as launcher in AndroidManifest.xml
technologies. They use the word or phoneme as a         file. REQUEST_CODE is static integer variable,
unit for modeling. The model output is hidden           declared on the beginning of activity and used to
probabilistic functions of state and can’t be           confirm response when engine for speech
deterministically specified. State sequence through     recognition is started. REQUEST_CODE has
model is not exactly known. Speech recognition          positive value. Results of recognition are saved in
systems generally assume that the speech signal is a    variable declared as ListView type.
realization of some message encoded as a sequence                  Method onCreate is called when activity is
of one or more symbols [3]. To effect the reverse       initiated.
operation of recognizing the underlying symbol                     This is where the most initialization goes:
sequence given a spoken utterance, the continuous       setContentView (R.layout.voice_recognition) is
speech waveform is first converted to a sequence of     used to inflate the user interface defined in res >
equally spaced discrete parameter vectors. Vectors      layout        >       voice_recognition.xml,      and
of speech characteristics consist mostly of MFC         findViewById(int) to programmatically interact
(Mel Frequency Cepstral) coefficients [4],              with widgets in the user interface. In this method
standardized by the European Telecommunications         there is also a check whether mobile phone, on
Standards Institute for speech recognition. The         which application is installed, has speech
European Telecommunications Standards Institute         recognition possibility. Package Manager is class for
in the early 2000s defined a standardized MFCC          retrieving various kinds of information related to the
algorithm to be used in mobile phones [5]. Standard     application packages that are currently installed on
MFC coefficients are constructed in a few simple        the device. FunctiongetPackageManager() returns
steps. A short-time Fourier analysis of the speech      PackageManager instance to find global package
signal using a finite-duration window (typically        information. Using this class, we can detect if the
20ms) is performed and the power spectrum is            phone has a possibility for speech recognition. If a
computed. Then, variable bandwidth triangular           mobile device doesn’t have one of many Google’s
filters are placed along the perceptually motivated     applications which integrate speech recognition,
mel frequency scale and filter bank energies are        further work of this application Voice SMS will be
calculated from the power spectrum.                     disabled and message on the screen will be
          Magnitude compression is employed using       “Recognizer not present”. Recognition process is
the logarithmic function. Finally, auditory spectrum    done trough one of Google’s speech recognition
thus obtained is decorrelated using the DCT and         applications. If recognition activity is present user
first (typically 13) coefficients represent the         can start the speech recognition by pressing on the
MFCCs.                                                  button and thus launching startActivityForResult
          These vectors of speech characteristics are   (Intent intent, int requestCode). The application
called observations and used in further calculations.   uses startActivityForResult() to broadcast an intent
To develop an acoustic model, it is necessary to        that requests voice recognition, including an extra
define states. Continuous speech recognition, each      parameter that specifies one of two language
state represents one phoneme. Under the concept of      models. Intent is defined with intent.putExtra
training we mean the determination of probabilities     (RecognizerIntent.
of transition from one state to another and
probabilities of observations. Iterative Baum-Welch     EXTRA_LANGUAGE_MODEL,
procedure is used for training [6]. The process is      RecognizerIntent.LANGUAGE_MODEL_FREE_F
repeated until a certain convergence criterion is       ORM).
reached, for example good accuracy in terms of                    The voice recognition application that
small changes of estimated parameters, in two           handles the intent processes the voice input, then
successive iterations. In continuous speech the         passes the recognized string back to Voice SMS
procedure is performed for each word in complex         application by calling the onActivityResult()
HM model. Once states, observations and transition      callback. In this method we manage result of
matrix for HMM are defined, the decoding (or            activity startActivityForResult. Received result
recognition) can be performed. Decoding represents      Data is stored in ListView variable and shown on
finding of most likely sequence of hidden states        the screen (Fig. 3.).
using Viterbi algorithm, according to the observed



                                                                                              255 | P a g e
B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and
                 Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                   Vol. 3, Issue 1, January -February 2013, pp.253-258
                                                      the data associated with the string "key" which in
                                                      this case is selected text from recognition Fig3.
                                                      Testing Figure 4. Interface for sending SMS
                                                      activity. The text is entered in the space for writing
                                                      messages and displayed on the screen. By clicking
                                                      the Send SMS button application checks whether the
                                                      message and the number of recipient are entered to
                                                      perform sending of message. When cursor is
                                                      positioned in the space for recipient number from
                                                      contacts, button attribute visibility is changed from
                                                      default gone to visible. Pressing the button the
                                                      command that allows you to enter the contact
                                                      numbers. label Intent forContacts new Intent =
                                                      (Intent.ACTION_PICK,
Fig2: Enables search after clicking image button      ContactsContract.Contacts.CONTENT_URI)
                                                      startActivityForResult                    (forContacts,
                                                      REQUEST_CODE)
                                                      is executed. After selecting desired contact, message
                                                      can be sent.

                                                      C. XML files
                                                                 Application consists of two different
                                                      interfaces. When the user runs application screen is
                                                      defined in voice_recognition.xml. The linear
                                                      arrangement of elements allows adding widget one
                                                      below another. Width and height are defined with
                                                      fill_parent attribute, which means to be equal as
                                                      parent (in this case the screen). The second
                                                      interface, defined within sms.xml file, is displayed
                                                      when the user chooses one of offered messages.
Fig3: Processes and gives text output                 AndroidManifest.xml realizes installing and
                                                      launching applications on the mobile device. Every
                                                      application must have an AndroidManifest.xml file
  Main Activity                                       (with precisely that name) in its root directory. The
                                                      manifest presents essential information about the
                                                      application to the Android system, information the
                                                      system must have before it can run any of the
                                                      application's code. It defines the activities and
                                                      permissions required for the application.
                                                      Permissions are used to remove restrictions that
                                                      prevent access to certain pieces of code or data on
                                                      mobile device. Each permit is defined by a special
                                                      label. If application needs data with distrait if must
                                                      seek the necessary authorization. The syntax is:
                                                      <uses-permission android:name= "Unique name of
                                                      permission"> </ uses-permission>.
                                                                 Every activity must be named in manifest.
                                                      Name is used as a parameter that is passed to the
                                                      constructor of intent which is used to start desired
                                                      activity. Activities can have different <category>
                                                      attribute.
                                                                 VoiceRecognitionActivity.java is the main
                                                      class and its onCreate method is executed at
Fig4: Interface for sending SMS                       application startup. The category tag is specially
                                                      defined String that describes this feature of activity
B. SMS class                                          with keyword Launcher while other classes are
In onCreate method label setContentView               marked           with        keyword          Default.
(R.layout.sms) opens new user interface. Command      AndroidManifest.xml has to include permission to
final          String           message1          =   send and receive messages, to access Internet and
getIntent().getExtras().GetString ("Key") retrieves



                                                                                             256 | P a g e
B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and
                  Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 3, Issue 1, January -February 2013, pp.253-258
permission to access the list of contacts from the         and become more accessible to everyone. We have
phonebook.                                                 been given a possibility to manage mobile devices
Permissions are:                                           without installing complex software for speech
_ android.permission.INTERNET.                             processing, resulting in memory savings. A lack of
_ android.permission.SEND_SMS.                             application Voice SMS is its adaption only for
_ android.permission.RECEIVE_SMS.                          English language, and the need for permanent
_ android.permission.READ_CONTACTS.                        Internet connection.
                                                           The objective for the future is development of
                                                           models and databases for multiple languages which
V. APPLICATION                 FUNCTIONALITY               could create a foundation for everyday use of this
PRINCIPLE                                                  technology worldwide. The main goal of application
           Application Voice SMS integrates direct         is to allow sending of text messages based on
speech input enabling user to record spoken                uttered voice messages. Testing has shown
information as text message, and send it as SMS            simplicity of use and high accuracy of data
message. After application has been started display        processing. Results of recognition give user option
on mobile phone shows button which initiate voice          to select the most accurate result. Further work is
recognition process. When speech has been detected         planned to implement the model of speech
application opens connection with Google’s server          recognition for different language.
and starts to communicate with it by sending blocks
of speech signal. Simultaneously the figure of             LITERATURE
waveform is generated on the screen. Speech                  [1]  “Android                       developers”,
recognition of the received signal is preformed on               http://developer.android.com
server. Google has accumulated a very large                  [2]  J. Tebelskis, Speech Recognition using
database of words derived from the daily entries in              Neural      Networks, Pittsburgh: School of
the Google search engines well as the digitalization             Computer        Science,Carnegie      Mellon
of more than 10 million books in Google Book                     University, 1995.
Search project. The database contains more than 230          [3]  S. J. Young et al., "The HTK Book",
billion words. If we use this kind of speech                     Cambridge         University     Engineering
recognizer it is very likely that our voice is stored on         Department, 2006.
Google’s servers. This fact provides continuous              [4] K.     Davis,      and    Mermelstein,     P.,
increase of data used for training, thus improving               "Comparison of parametric representation
accuracy of the system.                                          for monosyllabic word recognition in
           When process of recognition is over, user             continuously spoken sentences", IEEE
can see the list of possible statements. Process can             Trans. Acoust., Speech, Signal Process.
be repeated clicking on the button Image Button.                 vol. 28, no. 4, pp. 357–366,1980.
Pressing the most accurate option, selected result is        [5]  ETSI, "Speech Processing, Transmission
entered into interface for writing SMS messages.                 and Quality Aspects (STQ); Distributed
           Interface for writing SMS messages has all            speech recognition; Front-end feature
standard features. User can correct text and input               extraction       algorithm;     Compression
recipient number in the empty textbox. Button                    algorithms.", Technical standard ES 201
Contacts opens interface with contact numbers from               108, v1.1.3, 2003.
mobile phone and enables user to chose phone                 [6]  L. E. Baum, T. Petrie, G. Soules, and N.
number on which message will be sent after                       Weiss, "A maximization technique
pressing button Send. Customer receives response in              occurring in the statistical analysis of
a little cloud (toast) if message has been sent.                 probabilistic functions of Markov chains",
                                                                 Ann. Math. Statist., vol. 41, no. 1, pp. 164–
VI. CONCLUSION                                                   171, 1970.
          With the development of software and
hardware capabilities of mobile devices, there is an       REFERENCES
increased need for device-specific content, what             [1]    C. Allauzen, M. Riley, J. Schalkwyk, W.
resulted in market changes. Speech recognition                     Skut,          and M. Mohri. OpenFst: A
technology is of particular interest due to the direct             general and efficient weighted _nite-state
support of communications between human and                        transducer library. Lecture Notes in
computers.                                                         Computer Science, 4783:11, 2007.
          Using the speech recognizer, which works           [2]   M. Bacchiani, F. Beaufays, J. Schalkwyk,
over the Internet, allows much faster data                         M. Schuster, and B. Strope. Deploying
processing. Another advantage is the much larger                   GOOG-411: Early lessons in data,
databases that are used. Many believe that this mode               measurement, and testing. In Proceedings
is a big step for speech recognition technology. The               of ICASSP, pages 5260{5263, 2008.
accuracy of the system has significantly increased


                                                                                                257 | P a g e
B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and
               Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                 Vol. 3, Issue 1, January -February 2013, pp.253-258
[3]    MJF Gales. Semi-tied full-covariance
       matrices for hidden Markov models. 1997.
[4]    B. Harb, C. Chelba, J. Dean, and G.
       Ghemawhat. Back-o_ Language Model
       Compression. 2009.
[5]    H. Hermansky. Perceptual linear predictive
       (PLP) analysis of speech. Journal of the
       Acoustical      Society     of    America,
       87(4):1738{1752, 1990.
[6]    Maryam Kamvar and Shumeet Baluja. A
       large scale study of wireless search
       behavior: Google mobile search. In CHI,
       pages 701{709, 2006.
[7]    S. Katz. Estimation of probabilities from
       sparse data for the language model
       component of a speech recognizer. In IEEE
       Transactions on Acoustics, Speech and
       Signal Processing, volume 35, pages
       400{01, March 1987.
[8]    D. Povey, D. Kanevsky, B. Kingsbury, B.
       Ramabhadran,       G.    Saon,   and    K.
       Visweswariah. Boosted MMI for model
       and feature-space discriminative training.
       In Proc. of the Int. Conf. on Acoustics,
       Speech, and Signal Processing (ICASSP),
       2008.
[9]    R. Sproat, C. Shih, W. Gale, and N. Chang.
       A stochastic _nite-state word-segmentation
       algorithm for Chinese. Computational
       linguistics,22(3):377{404, 1996.
[10]   C. Van Heerden, J. Schalkwyk, and B.
       Strope. Language modeling for what-with-
       where on GOOG-411. 2009.

  WEB SITES
       Minimalist       GNU      for     Windows,
       06/01/2010,
       http://www.mingw.org/
       Code Blocks, The open source, cross
       platform, IDE, 06/01/2010,
       http://www.codeblocks.org/
       Java JDK SE 1.6 update 18, 02/04/2010,
       http://java.sun.com/javase/
       ECLIPSE GALILEO Release 3.5.2,
       02/04/2010,
       http://www.eclipse.org/galileo/
       Java      sound     API      documentation,
       11/04/2010,
       http://java.sun.com/j2se/1.5.0/docs/guide/s
       ound/programmer_guide/contents.html
       Android 2.2 SDK release 6,
       http://developer.android.com/sdk/index.ht
       ml
       Android 2.2 SDK reference page,
       http://developer.android.com/reference/pac
       kages.html




                                                                           258 | P a g e

Mais conteúdo relacionado

Mais procurados

Srand011 personal addressbook
Srand011 personal addressbookSrand011 personal addressbook
Srand011 personal addressbookAndroidproject
 
Isha_chaoji_3_plus_iOS
Isha_chaoji_3_plus_iOSIsha_chaoji_3_plus_iOS
Isha_chaoji_3_plus_iOSIsha Chaoji
 
11.universal mobile application development (umad) on home automation
11.universal mobile application development (umad) on home automation11.universal mobile application development (umad) on home automation
11.universal mobile application development (umad) on home automationAlexander Decker
 
Android Application Development
Android Application DevelopmentAndroid Application Development
Android Application DevelopmentGokhan Arik
 
android app development training report
android app development training reportandroid app development training report
android app development training reportRishita Jaggi
 
Android File Manager Report PDF
Android File Manager Report PDFAndroid File Manager Report PDF
Android File Manager Report PDFPrajjwal Kumar
 
Html5 overview
Html5 overviewHtml5 overview
Html5 overviewappbackr
 
IRJET - Android based Mobile Forensic and Comparison using Various Tools
IRJET -  	  Android based Mobile Forensic and Comparison using Various ToolsIRJET -  	  Android based Mobile Forensic and Comparison using Various Tools
IRJET - Android based Mobile Forensic and Comparison using Various ToolsIRJET Journal
 
Android Application on Location sharing and message sender
Android Application on Location sharing and message senderAndroid Application on Location sharing and message sender
Android Application on Location sharing and message senderKavita Sharma
 
Richness + Simplicity: The Holy Grail Of Mobile UI - 1.25.2010
Richness + Simplicity: The Holy Grail Of Mobile UI - 1.25.2010Richness + Simplicity: The Holy Grail Of Mobile UI - 1.25.2010
Richness + Simplicity: The Holy Grail Of Mobile UI - 1.25.2010www.webhub.mobi by Yuvee, Inc.
 
Android summer training report
Android summer training reportAndroid summer training report
Android summer training reportShashendra Singh
 
Enhancing The Capability of Chatbots
Enhancing The Capability of ChatbotsEnhancing The Capability of Chatbots
Enhancing The Capability of Chatbotsvivatechijri
 

Mais procurados (19)

Android Apps
Android AppsAndroid Apps
Android Apps
 
Srand011 personal addressbook
Srand011 personal addressbookSrand011 personal addressbook
Srand011 personal addressbook
 
Isha_chaoji_3_plus_iOS
Isha_chaoji_3_plus_iOSIsha_chaoji_3_plus_iOS
Isha_chaoji_3_plus_iOS
 
11.universal mobile application development (umad) on home automation
11.universal mobile application development (umad) on home automation11.universal mobile application development (umad) on home automation
11.universal mobile application development (umad) on home automation
 
Android Application Development
Android Application DevelopmentAndroid Application Development
Android Application Development
 
Windows phone
Windows phoneWindows phone
Windows phone
 
Android
AndroidAndroid
Android
 
android app development training report
android app development training reportandroid app development training report
android app development training report
 
ResumeDinakaran
ResumeDinakaranResumeDinakaran
ResumeDinakaran
 
Android File Manager Report PDF
Android File Manager Report PDFAndroid File Manager Report PDF
Android File Manager Report PDF
 
Html5 overview
Html5 overviewHtml5 overview
Html5 overview
 
Sagar Aggarwal_1
Sagar Aggarwal_1Sagar Aggarwal_1
Sagar Aggarwal_1
 
IRJET - Android based Mobile Forensic and Comparison using Various Tools
IRJET -  	  Android based Mobile Forensic and Comparison using Various ToolsIRJET -  	  Android based Mobile Forensic and Comparison using Various Tools
IRJET - Android based Mobile Forensic and Comparison using Various Tools
 
Android Application on Location sharing and message sender
Android Application on Location sharing and message senderAndroid Application on Location sharing and message sender
Android Application on Location sharing and message sender
 
Android OS
Android OSAndroid OS
Android OS
 
Richness + Simplicity: The Holy Grail Of Mobile UI - 1.25.2010
Richness + Simplicity: The Holy Grail Of Mobile UI - 1.25.2010Richness + Simplicity: The Holy Grail Of Mobile UI - 1.25.2010
Richness + Simplicity: The Holy Grail Of Mobile UI - 1.25.2010
 
Android summer training report
Android summer training reportAndroid summer training report
Android summer training report
 
Enhancing The Capability of Chatbots
Enhancing The Capability of ChatbotsEnhancing The Capability of Chatbots
Enhancing The Capability of Chatbots
 
Supratik_CV_Photo
Supratik_CV_PhotoSupratik_CV_Photo
Supratik_CV_Photo
 

Destaque

Menúmediterràneo
MenúmediterràneoMenúmediterràneo
Menúmediterràneoadrianavas
 
A temporada de criação começou
A temporada de criação começouA temporada de criação começou
A temporada de criação começouAlcon Pet
 
Plano de patrocínio BusTV 2012 Dia da Mulher
Plano de patrocínio BusTV 2012   Dia da MulherPlano de patrocínio BusTV 2012   Dia da Mulher
Plano de patrocínio BusTV 2012 Dia da MulherCanal BUSTV
 
Curriculum largo 2008
Curriculum largo 2008Curriculum largo 2008
Curriculum largo 2008denegocio
 
Análise Espacial dos Acidentes Rodoviários no Concelho de Rio Maior
Análise Espacial dos Acidentes Rodoviários no Concelho de Rio MaiorAnálise Espacial dos Acidentes Rodoviários no Concelho de Rio Maior
Análise Espacial dos Acidentes Rodoviários no Concelho de Rio MaiorTiago Santos
 
Introducción a la CMNUCC
Introducción a la CMNUCCIntroducción a la CMNUCC
Introducción a la CMNUCCCO2.cr
 
Generación net
Generación netGeneración net
Generación net210404
 
Presentación1
Presentación1Presentación1
Presentación1Kelly Soto
 
Daily forex-report by epic research 21 feb 2013
Daily forex-report by epic research 21 feb 2013Daily forex-report by epic research 21 feb 2013
Daily forex-report by epic research 21 feb 2013Epic Daily Report
 
Propuesta de planificación
Propuesta de  planificaciónPropuesta de  planificación
Propuesta de planificacióngeesvava
 
Glosario
GlosarioGlosario
GlosarioShainne
 
Plano de patrocínio BusTV 2012 - Dia dos Namorados
Plano de patrocínio BusTV 2012 - Dia dos NamoradosPlano de patrocínio BusTV 2012 - Dia dos Namorados
Plano de patrocínio BusTV 2012 - Dia dos NamoradosCanal BUSTV
 
Brico reparar un radiocd
Brico reparar un radiocdBrico reparar un radiocd
Brico reparar un radiocdlalovillarejo
 

Destaque (20)

Am31268273
Am31268273Am31268273
Am31268273
 
V25112115
V25112115V25112115
V25112115
 
Q25085089
Q25085089Q25085089
Q25085089
 
Nb2522322236
Nb2522322236Nb2522322236
Nb2522322236
 
Menúmediterràneo
MenúmediterràneoMenúmediterràneo
Menúmediterràneo
 
A temporada de criação começou
A temporada de criação começouA temporada de criação começou
A temporada de criação começou
 
Plano de patrocínio BusTV 2012 Dia da Mulher
Plano de patrocínio BusTV 2012   Dia da MulherPlano de patrocínio BusTV 2012   Dia da Mulher
Plano de patrocínio BusTV 2012 Dia da Mulher
 
Curriculum largo 2008
Curriculum largo 2008Curriculum largo 2008
Curriculum largo 2008
 
Análise Espacial dos Acidentes Rodoviários no Concelho de Rio Maior
Análise Espacial dos Acidentes Rodoviários no Concelho de Rio MaiorAnálise Espacial dos Acidentes Rodoviários no Concelho de Rio Maior
Análise Espacial dos Acidentes Rodoviários no Concelho de Rio Maior
 
Introducción a la CMNUCC
Introducción a la CMNUCCIntroducción a la CMNUCC
Introducción a la CMNUCC
 
Generación net
Generación netGeneración net
Generación net
 
Presentación1
Presentación1Presentación1
Presentación1
 
Daily forex-report by epic research 21 feb 2013
Daily forex-report by epic research 21 feb 2013Daily forex-report by epic research 21 feb 2013
Daily forex-report by epic research 21 feb 2013
 
Propuesta de planificación
Propuesta de  planificaciónPropuesta de  planificación
Propuesta de planificación
 
Hardware
HardwareHardware
Hardware
 
Estrategiaempresarial bl
Estrategiaempresarial blEstrategiaempresarial bl
Estrategiaempresarial bl
 
Glosario
GlosarioGlosario
Glosario
 
Portafolio 1
Portafolio 1Portafolio 1
Portafolio 1
 
Plano de patrocínio BusTV 2012 - Dia dos Namorados
Plano de patrocínio BusTV 2012 - Dia dos NamoradosPlano de patrocínio BusTV 2012 - Dia dos Namorados
Plano de patrocínio BusTV 2012 - Dia dos Namorados
 
Brico reparar un radiocd
Brico reparar un radiocdBrico reparar un radiocd
Brico reparar un radiocd
 

Semelhante a Aj31253258

Know all about android development
Know all about android developmentKnow all about android development
Know all about android developmentDeepika Chaudhary
 
Android by Ravindra J.Mandale
Android by Ravindra J.MandaleAndroid by Ravindra J.Mandale
Android by Ravindra J.MandaleRavindra Mandale
 
Garbage Management using Android Smartphone
Garbage Management using Android SmartphoneGarbage Management using Android Smartphone
Garbage Management using Android Smartphoneijsrd.com
 
Voice Command Mobile Phone Dialer
Voice Command Mobile Phone DialerVoice Command Mobile Phone Dialer
Voice Command Mobile Phone Dialerijtsrd
 
2011 Artezio Mobile
2011 Artezio Mobile2011 Artezio Mobile
2011 Artezio Mobilepolatsidis
 
Mobile camera based text detection and translation
Mobile camera based text detection and translationMobile camera based text detection and translation
Mobile camera based text detection and translationVivek Bharadwaj
 
LANGUAGE TRANSLATOR APP
LANGUAGE TRANSLATOR APPLANGUAGE TRANSLATOR APP
LANGUAGE TRANSLATOR APPIRJET Journal
 
Mobile Application Development with Android
Mobile Application Development with AndroidMobile Application Development with Android
Mobile Application Development with AndroidIJAAS Team
 
Hindi speech enabled windows application using microsoft
Hindi speech enabled windows application using microsoftHindi speech enabled windows application using microsoft
Hindi speech enabled windows application using microsoftIAEME Publication
 
Ch1 hello, android
Ch1 hello, androidCh1 hello, android
Ch1 hello, androidJehad2012
 
Accident detection
Accident detection Accident detection
Accident detection Samana Rao
 
Thorsignia - Custom software development services in india
Thorsignia - Custom software development services in indiaThorsignia - Custom software development services in india
Thorsignia - Custom software development services in indiacharan Teja
 
Alumni-Student Interactive Messaging
Alumni-Student Interactive MessagingAlumni-Student Interactive Messaging
Alumni-Student Interactive MessagingIRJET Journal
 
Voice Controlled News Web Based Application With Speech Recognition Using Ala...
Voice Controlled News Web Based Application With Speech Recognition Using Ala...Voice Controlled News Web Based Application With Speech Recognition Using Ala...
Voice Controlled News Web Based Application With Speech Recognition Using Ala...IRJET Journal
 

Semelhante a Aj31253258 (20)

Know all about android development
Know all about android developmentKnow all about android development
Know all about android development
 
Android by Ravindra J.Mandale
Android by Ravindra J.MandaleAndroid by Ravindra J.Mandale
Android by Ravindra J.Mandale
 
Garbage Management using Android Smartphone
Garbage Management using Android SmartphoneGarbage Management using Android Smartphone
Garbage Management using Android Smartphone
 
Android Introduction by Kajal
Android Introduction by KajalAndroid Introduction by Kajal
Android Introduction by Kajal
 
Voice Command Mobile Phone Dialer
Voice Command Mobile Phone DialerVoice Command Mobile Phone Dialer
Voice Command Mobile Phone Dialer
 
2011 Artezio Mobile
2011 Artezio Mobile2011 Artezio Mobile
2011 Artezio Mobile
 
Mobile camera based text detection and translation
Mobile camera based text detection and translationMobile camera based text detection and translation
Mobile camera based text detection and translation
 
LANGUAGE TRANSLATOR APP
LANGUAGE TRANSLATOR APPLANGUAGE TRANSLATOR APP
LANGUAGE TRANSLATOR APP
 
D1803041822
D1803041822D1803041822
D1803041822
 
Android architecture
Android architectureAndroid architecture
Android architecture
 
Hemanth_CV
Hemanth_CVHemanth_CV
Hemanth_CV
 
CV_Prathap (1)
CV_Prathap (1)CV_Prathap (1)
CV_Prathap (1)
 
Mobile Application Development with Android
Mobile Application Development with AndroidMobile Application Development with Android
Mobile Application Development with Android
 
Android
AndroidAndroid
Android
 
Hindi speech enabled windows application using microsoft
Hindi speech enabled windows application using microsoftHindi speech enabled windows application using microsoft
Hindi speech enabled windows application using microsoft
 
Ch1 hello, android
Ch1 hello, androidCh1 hello, android
Ch1 hello, android
 
Accident detection
Accident detection Accident detection
Accident detection
 
Thorsignia - Custom software development services in india
Thorsignia - Custom software development services in indiaThorsignia - Custom software development services in india
Thorsignia - Custom software development services in india
 
Alumni-Student Interactive Messaging
Alumni-Student Interactive MessagingAlumni-Student Interactive Messaging
Alumni-Student Interactive Messaging
 
Voice Controlled News Web Based Application With Speech Recognition Using Ala...
Voice Controlled News Web Based Application With Speech Recognition Using Ala...Voice Controlled News Web Based Application With Speech Recognition Using Ala...
Voice Controlled News Web Based Application With Speech Recognition Using Ala...
 

Aj31253258

  • 1. B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.253-258 Speech to Text Conversion using Android Platform B. Raghavendhar Reddy#1, E. Mahender *2 #1 Department of Electronics Communication and Engineering Aurora’s Technological and Research Institute Parvathapur, Uppal, Hyderabad, India. 2Asst. Professor Aurora’s Technological and Research Institute Parvathapur, Uppal, Hyderabad, India. Abstract For the past several decades, designers have remains speech. Market for smart mobile phones processed speech for a wide variety of provides a number of applications with speech applications ranging from mobile recognition implementation. Google's Voice Actions communications to automatic reading machines. and recently iphone's Siri are applications that Speech recognition reduces the overhead caused enable control of a mobile phone using voice, such by alternate communication methods. Speech has as calling businesses and contacts, sending texts and not been used much in the field of electronics and email, listening to music, browsing the web, and computers due to the complexity and variety of completing common tasks. Both Siri and Voice speech signals and sounds. However, with Actions require an active connection to a network in modern processes, algorithms, and methods we order to process requests and most of Android can process speech signals easily and recognize phones can run on a 4G network which is faster than the text. In this project, we are going to develop the 3G network that the iPhone runs on. There is an on-line speech-to-text engine. The system also an issue of availability, Voice Actions are acquires speech at run time through a available on all Android devices above Android 2.2, microphone and processes the sampled speech to but Siri is available only for owners of the iPhone recognize the uttered text. The recognized text 4S. The Siri's advantage is that it can act on a wide can be stored in a file. We are developing this on variety of phrases and requests and can understand android platform using eclipse workbench. Our and learn from natural language, whereas Google's speech-to-text system directly acquires and Voice Actions can be operated only by using very converts speech to text. It can supplement other specific voice commands. In this work we have larger systems, giving users a different choice for developed an application for sending SMS messages data entry. A speech-to-text system can also which uses Google's speech recognition engine. The improve system accessibility by providing data main goal of application Voice SMS is to allow user entry options for blind, deaf, or physically to input spoken information and send voice message handicapped users. Voice SMS is an application as desired text message. The user is able to developed in this work that allows a user to manipulate text message fast and easy without using record and convert spoken messages into SMS keyboard, reducing spent time and effort. In this text message. User can send messages to the case speech recognition provides alternative to entered phone number. Speech recognition is standard use of key board for text input, creating done via the Internet, connecting to Google's another dimension in modern communications. server. The application is adapted to input messages in English. Speech recognition for II. ANDROID Voice uses a technique based on hidden Markov Android is a software environment for models (HMM - Hidden Markov Model). It is mobile devices that includes an operating system, currently the most successful and most flexible middleware and key applications [1]. In 2005 approach to speech recognition. Google took over company Android Inc., and two years later, in collaboration with the group the Open Keywords: HMM, Android Os, DVM, Speech Handset Alliance, presented Android operating Recognition, Intents system (OS). Main features of Android operating system are: I. INTRODUCTION  Enables free download of development Mobile phones have become an integral environment for application development. part of our Everyday life, causing higher demands  Free use and adaptation of operating for content that can be used on them. Smart phones system to manufacturers of mobile offer customer enhanced methods to interact with devices. their phones but the most natural way of interaction 253 | P a g e
  • 2. B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.253-258  Equality of basic core applications and construction of applications: activity, intent, service additional applications in access to and the content provider. An activity is the main resources. element of every application and simplified  Optimized use of memory and automatic description defines it as a window that users see on control of applications which are being their mobile device. The application can have one or executed. more activities. Main activity is the one that is used  Quick and easy development of as startup. The transition between the activities is applications using development tools and carried out in a way that launched activity calls a rich database of software libraries. new activity. Each activity as a separate component  High quality of audiovisual content, it is is implemented with inheritance of Activity class. possible to use vector graphics, and most During the execution of applications, activities are audio and video formats. added to the stack, currently running activity is on  Ability to test applications on most the top of the stack. computing platforms, including Windows, An intent is a message used to run the Linux… activities, services, or recipient’s multicast. An The Android operating system (OS) intent can contain the name of the components you architecture is divided into 5 layers (fig. 1.). The need to run, the action which is necessary to application layer of Android OS is visible to end execute, the address of stored data needed to run the user, and consists of user applications. The component, and component type. A service is a application layer includes basic applications which component that runs in the background to perform come with the operating system and applications long running operations or to perform work for which user subsequently takes. All applications are remote processes. One service can link multiple written in the Java programming language. applications and service is executed until a Framework is extensible set of software components connection with all applications is done. A content used by all applications in the operating system. The provider manages a shared set of application data. next layer represents the libraries, written in the C Data can be stored in the file system, a SQLite and C + + programming languages, and OS accesses database, on the web, or any other persistent storage them via framework. location which application can access [1]. Dalvik Virtual Machine (DVM), forms the main Through the content provider, other applications can part of the executive system environment. Virtual query or even modify the data (if the content machine is used to start the core libraries written in provider allows it). the Java III. SPEECH RECOGNITION Speech recognition for application Voice SMS is done on Google server, using the HMM algorithm. HMM algorithm is briefly described in this part. Process involves the conversion of acoustic speech into a set of words and is performed by software component. Accuracy of speech recognition systems differ in vocabulary size and confusability, speaker dependence vs. independence, modality of speech (isolated, discontinuous, or continuous speech, read or spontaneous speech), task and language constraints [2]. Speech recognition system can be divided into several blocks: feature extraction, acoustic models database which is built based on the training data, dictionary, language model and the speech recognition algorithm. Analog speech signal must Fig1: Android Architecture first be sampled on time and amplitude axes, or programming language. Unlike Java’s digitized. Samples of speech signal are analyzed in virtual machine, which is based on the stack, DVM even intervals. This period is usually 20 ms because bases on registry structure and it is intended for signal in this interval is considered stationary. mobile devices. The last architecture layer of Speech feature extraction involves the formation of Android operating system is kernel based on Linux equally spaced discrete vectors of speech OS, which serves as a hardware abstraction layer. characteristics. Feature vectors from training The main reasons for its use are memory database are used to estimate the parameters of management and processes, security model, network acoustic models. Acoustic model describes system and the constant development of systems. properties of the basic elements that can be There are four basic components used in recognized. The basic element can be a phoneme for 254 | P a g e
  • 3. B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.253-258 continuous speech or word for isolated words output sequence. It is defined by recursive relation. recognition. During the search, n-best word sequences are Dictionary is used to connect acoustic generated using acoustic models and a language models with vocabulary words. Language model model. reduces the number of acceptable word combinations based on the rules of language and IV. MAIN PARTS OF THE PROJECT statistical information from different texts. Speech A. Voice Recognition Activity class recognition systems, based on hidden Markov Voice Recognition Activity is startup models are today most widely applied in modern activity defined as launcher in AndroidManifest.xml technologies. They use the word or phoneme as a file. REQUEST_CODE is static integer variable, unit for modeling. The model output is hidden declared on the beginning of activity and used to probabilistic functions of state and can’t be confirm response when engine for speech deterministically specified. State sequence through recognition is started. REQUEST_CODE has model is not exactly known. Speech recognition positive value. Results of recognition are saved in systems generally assume that the speech signal is a variable declared as ListView type. realization of some message encoded as a sequence Method onCreate is called when activity is of one or more symbols [3]. To effect the reverse initiated. operation of recognizing the underlying symbol This is where the most initialization goes: sequence given a spoken utterance, the continuous setContentView (R.layout.voice_recognition) is speech waveform is first converted to a sequence of used to inflate the user interface defined in res > equally spaced discrete parameter vectors. Vectors layout > voice_recognition.xml, and of speech characteristics consist mostly of MFC findViewById(int) to programmatically interact (Mel Frequency Cepstral) coefficients [4], with widgets in the user interface. In this method standardized by the European Telecommunications there is also a check whether mobile phone, on Standards Institute for speech recognition. The which application is installed, has speech European Telecommunications Standards Institute recognition possibility. Package Manager is class for in the early 2000s defined a standardized MFCC retrieving various kinds of information related to the algorithm to be used in mobile phones [5]. Standard application packages that are currently installed on MFC coefficients are constructed in a few simple the device. FunctiongetPackageManager() returns steps. A short-time Fourier analysis of the speech PackageManager instance to find global package signal using a finite-duration window (typically information. Using this class, we can detect if the 20ms) is performed and the power spectrum is phone has a possibility for speech recognition. If a computed. Then, variable bandwidth triangular mobile device doesn’t have one of many Google’s filters are placed along the perceptually motivated applications which integrate speech recognition, mel frequency scale and filter bank energies are further work of this application Voice SMS will be calculated from the power spectrum. disabled and message on the screen will be Magnitude compression is employed using “Recognizer not present”. Recognition process is the logarithmic function. Finally, auditory spectrum done trough one of Google’s speech recognition thus obtained is decorrelated using the DCT and applications. If recognition activity is present user first (typically 13) coefficients represent the can start the speech recognition by pressing on the MFCCs. button and thus launching startActivityForResult These vectors of speech characteristics are (Intent intent, int requestCode). The application called observations and used in further calculations. uses startActivityForResult() to broadcast an intent To develop an acoustic model, it is necessary to that requests voice recognition, including an extra define states. Continuous speech recognition, each parameter that specifies one of two language state represents one phoneme. Under the concept of models. Intent is defined with intent.putExtra training we mean the determination of probabilities (RecognizerIntent. of transition from one state to another and probabilities of observations. Iterative Baum-Welch EXTRA_LANGUAGE_MODEL, procedure is used for training [6]. The process is RecognizerIntent.LANGUAGE_MODEL_FREE_F repeated until a certain convergence criterion is ORM). reached, for example good accuracy in terms of The voice recognition application that small changes of estimated parameters, in two handles the intent processes the voice input, then successive iterations. In continuous speech the passes the recognized string back to Voice SMS procedure is performed for each word in complex application by calling the onActivityResult() HM model. Once states, observations and transition callback. In this method we manage result of matrix for HMM are defined, the decoding (or activity startActivityForResult. Received result recognition) can be performed. Decoding represents Data is stored in ListView variable and shown on finding of most likely sequence of hidden states the screen (Fig. 3.). using Viterbi algorithm, according to the observed 255 | P a g e
  • 4. B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.253-258 the data associated with the string "key" which in this case is selected text from recognition Fig3. Testing Figure 4. Interface for sending SMS activity. The text is entered in the space for writing messages and displayed on the screen. By clicking the Send SMS button application checks whether the message and the number of recipient are entered to perform sending of message. When cursor is positioned in the space for recipient number from contacts, button attribute visibility is changed from default gone to visible. Pressing the button the command that allows you to enter the contact numbers. label Intent forContacts new Intent = (Intent.ACTION_PICK, Fig2: Enables search after clicking image button ContactsContract.Contacts.CONTENT_URI) startActivityForResult (forContacts, REQUEST_CODE) is executed. After selecting desired contact, message can be sent. C. XML files Application consists of two different interfaces. When the user runs application screen is defined in voice_recognition.xml. The linear arrangement of elements allows adding widget one below another. Width and height are defined with fill_parent attribute, which means to be equal as parent (in this case the screen). The second interface, defined within sms.xml file, is displayed when the user chooses one of offered messages. Fig3: Processes and gives text output AndroidManifest.xml realizes installing and launching applications on the mobile device. Every application must have an AndroidManifest.xml file Main Activity (with precisely that name) in its root directory. The manifest presents essential information about the application to the Android system, information the system must have before it can run any of the application's code. It defines the activities and permissions required for the application. Permissions are used to remove restrictions that prevent access to certain pieces of code or data on mobile device. Each permit is defined by a special label. If application needs data with distrait if must seek the necessary authorization. The syntax is: <uses-permission android:name= "Unique name of permission"> </ uses-permission>. Every activity must be named in manifest. Name is used as a parameter that is passed to the constructor of intent which is used to start desired activity. Activities can have different <category> attribute. VoiceRecognitionActivity.java is the main class and its onCreate method is executed at Fig4: Interface for sending SMS application startup. The category tag is specially defined String that describes this feature of activity B. SMS class with keyword Launcher while other classes are In onCreate method label setContentView marked with keyword Default. (R.layout.sms) opens new user interface. Command AndroidManifest.xml has to include permission to final String message1 = send and receive messages, to access Internet and getIntent().getExtras().GetString ("Key") retrieves 256 | P a g e
  • 5. B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.253-258 permission to access the list of contacts from the and become more accessible to everyone. We have phonebook. been given a possibility to manage mobile devices Permissions are: without installing complex software for speech _ android.permission.INTERNET. processing, resulting in memory savings. A lack of _ android.permission.SEND_SMS. application Voice SMS is its adaption only for _ android.permission.RECEIVE_SMS. English language, and the need for permanent _ android.permission.READ_CONTACTS. Internet connection. The objective for the future is development of models and databases for multiple languages which V. APPLICATION FUNCTIONALITY could create a foundation for everyday use of this PRINCIPLE technology worldwide. The main goal of application Application Voice SMS integrates direct is to allow sending of text messages based on speech input enabling user to record spoken uttered voice messages. Testing has shown information as text message, and send it as SMS simplicity of use and high accuracy of data message. After application has been started display processing. Results of recognition give user option on mobile phone shows button which initiate voice to select the most accurate result. Further work is recognition process. When speech has been detected planned to implement the model of speech application opens connection with Google’s server recognition for different language. and starts to communicate with it by sending blocks of speech signal. Simultaneously the figure of LITERATURE waveform is generated on the screen. Speech [1] “Android developers”, recognition of the received signal is preformed on http://developer.android.com server. Google has accumulated a very large [2] J. Tebelskis, Speech Recognition using database of words derived from the daily entries in Neural Networks, Pittsburgh: School of the Google search engines well as the digitalization Computer Science,Carnegie Mellon of more than 10 million books in Google Book University, 1995. Search project. The database contains more than 230 [3] S. J. Young et al., "The HTK Book", billion words. If we use this kind of speech Cambridge University Engineering recognizer it is very likely that our voice is stored on Department, 2006. Google’s servers. This fact provides continuous [4] K. Davis, and Mermelstein, P., increase of data used for training, thus improving "Comparison of parametric representation accuracy of the system. for monosyllabic word recognition in When process of recognition is over, user continuously spoken sentences", IEEE can see the list of possible statements. Process can Trans. Acoust., Speech, Signal Process. be repeated clicking on the button Image Button. vol. 28, no. 4, pp. 357–366,1980. Pressing the most accurate option, selected result is [5] ETSI, "Speech Processing, Transmission entered into interface for writing SMS messages. and Quality Aspects (STQ); Distributed Interface for writing SMS messages has all speech recognition; Front-end feature standard features. User can correct text and input extraction algorithm; Compression recipient number in the empty textbox. Button algorithms.", Technical standard ES 201 Contacts opens interface with contact numbers from 108, v1.1.3, 2003. mobile phone and enables user to chose phone [6] L. E. Baum, T. Petrie, G. Soules, and N. number on which message will be sent after Weiss, "A maximization technique pressing button Send. Customer receives response in occurring in the statistical analysis of a little cloud (toast) if message has been sent. probabilistic functions of Markov chains", Ann. Math. Statist., vol. 41, no. 1, pp. 164– VI. CONCLUSION 171, 1970. With the development of software and hardware capabilities of mobile devices, there is an REFERENCES increased need for device-specific content, what [1] C. Allauzen, M. Riley, J. Schalkwyk, W. resulted in market changes. Speech recognition Skut, and M. Mohri. OpenFst: A technology is of particular interest due to the direct general and efficient weighted _nite-state support of communications between human and transducer library. Lecture Notes in computers. Computer Science, 4783:11, 2007. Using the speech recognizer, which works [2] M. Bacchiani, F. Beaufays, J. Schalkwyk, over the Internet, allows much faster data M. Schuster, and B. Strope. Deploying processing. Another advantage is the much larger GOOG-411: Early lessons in data, databases that are used. Many believe that this mode measurement, and testing. In Proceedings is a big step for speech recognition technology. The of ICASSP, pages 5260{5263, 2008. accuracy of the system has significantly increased 257 | P a g e
  • 6. B. Raghavendhar Reddy, E. Mahender / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.253-258 [3] MJF Gales. Semi-tied full-covariance matrices for hidden Markov models. 1997. [4] B. Harb, C. Chelba, J. Dean, and G. Ghemawhat. Back-o_ Language Model Compression. 2009. [5] H. Hermansky. Perceptual linear predictive (PLP) analysis of speech. Journal of the Acoustical Society of America, 87(4):1738{1752, 1990. [6] Maryam Kamvar and Shumeet Baluja. A large scale study of wireless search behavior: Google mobile search. In CHI, pages 701{709, 2006. [7] S. Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. In IEEE Transactions on Acoustics, Speech and Signal Processing, volume 35, pages 400{01, March 1987. [8] D. Povey, D. Kanevsky, B. Kingsbury, B. Ramabhadran, G. Saon, and K. Visweswariah. Boosted MMI for model and feature-space discriminative training. In Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2008. [9] R. Sproat, C. Shih, W. Gale, and N. Chang. A stochastic _nite-state word-segmentation algorithm for Chinese. Computational linguistics,22(3):377{404, 1996. [10] C. Van Heerden, J. Schalkwyk, and B. Strope. Language modeling for what-with- where on GOOG-411. 2009. WEB SITES Minimalist GNU for Windows, 06/01/2010, http://www.mingw.org/ Code Blocks, The open source, cross platform, IDE, 06/01/2010, http://www.codeblocks.org/ Java JDK SE 1.6 update 18, 02/04/2010, http://java.sun.com/javase/ ECLIPSE GALILEO Release 3.5.2, 02/04/2010, http://www.eclipse.org/galileo/ Java sound API documentation, 11/04/2010, http://java.sun.com/j2se/1.5.0/docs/guide/s ound/programmer_guide/contents.html Android 2.2 SDK release 6, http://developer.android.com/sdk/index.ht ml Android 2.2 SDK reference page, http://developer.android.com/reference/pac kages.html 258 | P a g e