Speereo Software, 2009
                                www.speereo.com

   Speereo Speech Recognition Technologies



Konstantin Lamin    Oleg Maleev          Daniel Ischenko
CEO                 CTO, VP of R&D       VP of Business Development
lamin@speereo.com   maleev@speereo.com   d_ischenko@speereo.com
What are speech technologies needed for?
  User friendliness
Speech is the most natural way for humans to communicate.
Therefore a speech interface is the most natural way to interact
with a mobile device.

  Mobility
While using a speech interface, the User's hands and eyes are free
for any other activity.

  Device novelty
A speech interface gives the User an easy-to-use device not
burdened by numerous keys or large screens.
Automatic Speech Recognition System (ASR)

ASR is the conversion of a speech signal to text or control
commands. ASR makes it possible to manufacture devices with
speech control abilities, i.e. a speech interface.

  Voice -> ASR -> Command ID
Speech Synthesizer (TTS)

Text to speech (TTS) is the conversion of text into a speech
signal, taking the pronunciation norms of the language into
account. It makes it possible to create 'speaking' devices.

  Text -> TTS -> Speech
Speech signal compression

Allows a speech signal to be recorded using a small amount of memory.

  Speech -> Packing -> Packed data
  Packed data -> Unpacking -> Speech
Speech technology on desktop PC
 ASR
    Pentium 4 2.0 GHz, 64 MB
    Memory bandwidth 1.2 GB/s

 TTS
    Pentium 4 2.0 GHz, 100-500 MB

Standard solutions are not acceptable for embedded and mobile
devices. Therefore special approaches for reducing CPU and
memory usage must be applied.
Requirements for embedded devices
             (low footprint)

 Compactness: memory use under 1-2 MB.

 Ability to run on a CPU under 100 MIPS. A 300 MHz XScale is
outperformed 12x or more by a 2.0 GHz Pentium 4.

 Low memory bandwidth (XScale delivers only 64 MB/s).
Embedded Speech SDK

  Simple, intuitive API that can be used by non-specialists
in speech technology.

  Scalable and portable software design.

  Can be used with various operating systems, or on devices
with no OS.

  Software only! No additional hardware is required.
Speech Recognition Technology Characteristics
   Speaker-Dependent or Speaker-Independent?

  Is training necessary? The need for training annoys
 Users.

  Recognition on the phonetic level (dictionary of any size) or
 on the whole-word level (small dictionaries only)?

   Large (>10000 words) or small vocabulary?

   Can the set of recognizable commands be changed
 dynamically?
Optimal System of Speech Control
  Speaker-Independent.

  Flexible large vocabulary that allows the set of
recognizable words and phrases to be changed 'on the fly'.

  Noise robustness. Ability to use the device in different
conditions (in a car, outdoors, in a crowd).

  Robustness to pronunciation variations, including non-native
speakers.
Speereo Speech Recognition Engine

[Block diagram: Speech -> Acoustic Front-End -> Decoder -> Recognition result.
Other blocks: Acoustic environment, Phone Models, Real-Time Transcriber,
Very Large Vocabulary.]
Acoustic Front-End

  Feature system with 41 coefficients.

  Adaptation to the acoustic environment.

  Special algorithm for automatic adaptation to the
microphone type (far-field or close-talk), recording
conditions, and channel distortion.

  Special algorithm for operating the system in a car.
Acoustic Model Decoder

  Continuous Density Hidden Markov Models (more
precise).
  Discrete Hidden Markov Models (faster).
  For English: 63 HMMs comprising 2446 Gaussian
mixture components.
  HMM parameters have been determined statistically,
using a priori phonetic constraints.
  Enhanced decoder algorithm to speed up operation.
Real-Time Transcriber


  Converts written English words and phrases to a form
suitable for recognition.

  Unlimited dictionary. Out-of-Vocabulary problem solved.

  Recognition of first and last names.

  Recognition of geographic names.
Accuracy
  Test 1: Long phrases recognition

Test conditions: statistical sampling – 1680 utterances, 626
unique phrases. Language – English.
Recognition accuracy – 99.9%.

  Test 2: Short words recognition

Test conditions: numerical vocabulary database (including
inarticulately pronounced words), 11 unique words.
Language – English: recognition accuracy – 99.2%.
Language – Russian: recognition accuracy – 98.5%.
Noise Robustness
 Test 3: accuracy dependence on noise level

SNR (dB)         0      5     10     15     20   Clear
Accuracy (%)   98.2   98.4   98.3   98.6   98.7   99.2

Speereo Speech Engine demonstrates high noise robustness.
Speereo Speech Engine in a Car

 Test 4: long phrase recognition in noisy surroundings

Test conditions: statistical sampling – 1632 utterances, 626
unique phrases. Noise sample – moving vehicle with windows
rolled down.
Language – English.
Recognition accuracy – 97.6%.

Due to special algorithms, Speereo Recognition Engine
demonstrates good robustness in a car.
Comparison of Recognition Systems
 Number of mistakes in tests 1 and 2 (lower is better)

[Bar chart: number of mistakes (axis 0 to 80) on Phrases and Digits
for Philips, Microsoft, IBM, and Speereo.]

The following products were used in testing:
Philips FreeSpeech 2000, Microsoft Speech Recognition
Engine 4.0, IBM ViaVoice 7.0, Speereo Speech Engine 2.0
Speereo Speech Recognition Technology
              Features

High accuracy speech recognition
Speaker-Independence
Large vocabulary (>100000 words)
Short latency
Noise robustness
Excellent compatibility
Ease of use
CPU and Memory requirements

  Speereo Speech Engine currently supports a wide
variety of processors, such as SHx, TMPR39XX, NEC
VR4122, MIPS, ARM, XScale, etc.

  Speereo Speech Engine runs on CPUs with performance
from 40 MIPS (80 MIPS recommended) and needs
memory from 700 KB.
Speereo Speech Recognition SDK
  Simple API that requires no skills in speech technology
development.

  Supports Windows Mobile, Symbian, Java, other platforms
and embedded devices with no OS.

  For Windows Mobile and Symbian, ready-made Audio
Input-Output support is provided. There is no need to
program Audio Input-Output.

 Support for smartphones based on Series 60, UIQ, Windows
Mobile, and mobile devices with J2ME.
Speereo Speech Engine Windows CE Version

[Diagram blocks: Audio Input-Output; Speereo Speech Engine; List of speech
commands; Speech commands pronounced by user; Application 1, Application 2,
... Application N.]
Use of Speereo Speech Engine (SE)
Operation of SE can be divided into 2 major stages:
1. The application defines the operating mode of SE and, if
necessary, sends the list of speech commands to SE.

2. When the User pronounces a phrase (command), SE determines
the most probable phrase from the list of received speech
commands and sends its ID to the application.

The developer does not need to track the moment a phrase is
pronounced. All that is needed is to process the Speereo Speech
Engine message that contains the ID of the command pronounced
by the User.
Recognition modes
SE currently implements 3 recognition modes:

1. Recognition of phrases made up of words known to SE and
included in the vocabulary.

2. Recognition of phrases containing words unknown to SE (mostly
personal names, etc.). In this case the unknown words are
transcribed automatically.

3. Recognition of numbers from the 1st to the 31st: a special
mode that improves recognition accuracy for ordinal numbers.
Speereo Speech Engine Initialization

To use the Speech Interface in an application, the developer
must register that application with Speereo Speech Engine by
calling the AddRegisterApplication function.

The function prototype is as follows:
UINT AddRegisterApplication (HWND hWnd), where hWnd is
the handle of the developer's application window that
receives the messages from SE.
Speech Commands List Creation

The Speech Commands List is created by calling the AddPhrase
function once for each speech command.

void AddPhrase (LPCTSTR pszText, DWORD dwId)
where pszText is the speech command in orthographic form and
dwId is the identifier of the speech command that SE returns
when the speech command is pronounced.
Receiving the Response from SE
The WM_SRT_ACCEPTHYPO message passes the identifier of the
recognized speech command as its wParam parameter.
SE sends the message to the application window whose handle
hWnd was passed to the AddRegisterApplication function.

Example:
    case WM_SRT_ACCEPTHYPO:
        MakeHypo (wParam);   /* wParam holds the recognized command ID */
        return TRUE;
MakeHypo is the developer's own function that implements the
behavior of the speech commands.
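
The developer-side reaction can be sketched in portable C. This is a minimal sketch rather than SDK code: the numeric ID values, the return type, and the action strings are assumptions made for illustration, since the slides only name MakeHypo and the identifiers.

```c
#include <stddef.h>

/* Hypothetical values: the slides name these identifiers but not their values. */
enum {
    ID_OPEN_WINDOW  = 1,
    ID_CLOSE_WINDOW = 2
};

/*
 * Developer-side reaction to a recognized command.
 * Assumption: returns a description of the action (NULL for an unknown ID)
 * so that the sketch can be checked without a GUI.
 */
const char *MakeHypo(unsigned long commandId)
{
    switch (commandId) {
    case ID_OPEN_WINDOW:
        return "open window";
    case ID_CLOSE_WINDOW:
        return "close window";
    default:
        return NULL; /* ID is not in the command list: ignore it */
    }
}
```

In a window procedure, the value passed in would be the wParam of the WM_SRT_ACCEPTHYPO message.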
Defining Speech Commands Example

AddPhrase (_T("Open Window"), ID_OPEN_WINDOW);
AddPhrase (_T("Close Window"), ID_CLOSE_WINDOW);

This passes two speech commands ("Open Window" and
"Close Window") to SE with the identifiers
ID_OPEN_WINDOW and ID_CLOSE_WINDOW respectively.
It’s That Simple!

To build a speech interface into an application using
Speereo Speech Engine, take the following three
simple steps:

1. Initialize Speereo Speech Engine.
2. Define the list of speech commands.
3. Define the application's reaction to speech commands.
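
The three steps can be walked through with a small stand-in for the engine. Everything named mock_* or MOCK_* below is hypothetical and is not part of the Speereo SDK. In particular, exact string matching stands in for acoustic recognition, which the real engine performs on audio.

```c
#include <stddef.h>
#include <string.h>

#define MOCK_MAX_PHRASES 16
#define MOCK_NO_MATCH    0xFFFFFFFFul

/* Hypothetical command identifiers for the demo. */
enum { MOCK_ID_OPEN_WINDOW = 1, MOCK_ID_CLOSE_WINDOW = 2 };

typedef struct {
    const char   *text[MOCK_MAX_PHRASES]; /* command phrases */
    unsigned long id[MOCK_MAX_PHRASES];   /* matching command IDs */
    size_t        count;
    int           registered;
} MockEngine;

/* Step 1: initialize the (mock) engine for one application. */
void mock_register_application(MockEngine *engine)
{
    engine->count = 0;
    engine->registered = 1;
}

/* Step 2: add one speech command. Returns 1 on success, 0 otherwise. */
int mock_add_phrase(MockEngine *engine, const char *text, unsigned long id)
{
    if (!engine->registered || engine->count >= MOCK_MAX_PHRASES)
        return 0;
    engine->text[engine->count] = text;
    engine->id[engine->count] = id;
    engine->count++;
    return 1;
}

/* Stand-in for recognition: returns the ID of the exactly matching phrase. */
unsigned long mock_recognize(const MockEngine *engine, const char *heard)
{
    size_t i;

    for (i = 0; i < engine->count; i++) {
        if (strcmp(engine->text[i], heard) == 0)
            return engine->id[i];
    }
    return MOCK_NO_MATCH;
}

/* Step 3 would react to the returned ID; the demo simply returns it. */
unsigned long mock_demo(const char *heard)
{
    MockEngine engine;

    mock_register_application(&engine);
    mock_add_phrase(&engine, "Open Window", MOCK_ID_OPEN_WINDOW);
    mock_add_phrase(&engine, "Close Window", MOCK_ID_CLOSE_WINDOW);
    return mock_recognize(&engine, heard);
}
```

Calling mock_demo("Close Window") yields MOCK_ID_CLOSE_WINDOW, while a phrase outside the list yields MOCK_NO_MATCH.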
Speereo Speech Engine Additional Features

1. Microphone and speaker controls.
2. Ability to interact with several applications simultaneously.
3. Ability to record sound and voice signals via microphone, with
real-time compression.
4. Ability to play sound and voice signals for User/speaker.
5. Speech signal detector selection (continuous monitoring of the
speech signal, or recognition launched on a key press).
Speereo Speech Engine
           Implementation Possibilities
   Home appliances
   Consumer electronics (audio/video systems)
   Computer hardware and software (all operations)
   Portable devices (mobile phones, smartphones)
   Voice mail system
   Other embedded devices
Using Speereo Speech Interface can greatly contribute to
functionality, accessibility, and innovative appeal of any
product by making it fully interactive, easy to control, and
therefore more productive and enjoyable.
Example 1: operating a phonebook

Instead of selecting from the menu (Menu -> Names -> Search
Samantha -> Call), the feature can be accessed with one short
phrase: "Call Samantha".

Send a voice message via E-mail/MMS:
Say "Send E-mail" or "Send MMS". You will be prompted to give the
name of the recipient. Since the name is spoken aloud, the system
finds it in the database and offers to send a voice message.
Example 2: Mobile Voice Interface
A voice interface for mobile services is in high demand in the
mobile community.

     Weather                                  Maps

     Dictionaries                             Ticket booking

     Exchange rates                           Information

     E-Commerce                               Humor
Example 3: GPS Voice Control

     Speech menu                              Search P.O.I.

     Map navigation                           Route indication
Speereo Voice Translator

Speaking in a foreign language? Nothing could be simpler!

Speereo Voice Translator is an innovative mobile phrase book that
understands a spoken phrase in English (even when pronounced with
a strong accent) and immediately reads back the same phrase in
Arabic, Chinese (Traditional or Simplified), Danish, English,
Finnish, French, German, Italian, Korean, Polish, Russian,
Spanish or Turkish.
Speereo Voice Organizer

Manage your personal information, send e-mails and set your
schedule using only voice commands with our Stylus Free
Concept: Speereo Voice Organizer!

Free your hands and keep using your mobile device without
stopping: the application will find and dial numbers, write
e-mails and remind you of your appointments, following your
voice commands!
Use our unique skills!

A speech interface brings a new level of convenience to the user. We
have the knowledge needed to implement speech technology
successfully.

"A voice-operated scheduler is a very good idea and Speereo has made it
an impressive and enjoyable reality. Perhaps the best thing about Voice
Organizer is that you can access all of its features with one hand. One
touch of your Pocket PC's record button and your voice does all the work:
switching between days, week and month views of your events; adding new
events; or adding vocal notes to your phone contacts."

Voice Recognition Programs for the Pocket PC
By John Mierau, Pocket PC Magazine, November 2002, Vol. 5 No. 5
Speech Synthesizer Types (TTS)
Whole-word TTS
  Text -> Phrase compiler (with Words DB) -> Speech

Phonemic TTS
  Text -> Transcriber -> Phones and Prosody -> Phrase compiler (with Phones DB) -> Speech
TTS Requirements


Whole-word TTS
Predefined vocabulary (up to 2-3 thousand words), fixed at the
system development stage.
CPU from 40 MIPS, RAM from 0.5 MB. Requires a narrator to
pronounce every word in the vocabulary.
Phonemic TTS
Large dictionaries possible (over 100 thousand words).
CPU from 80 MIPS, RAM from 2 MB. Does not require tuning for a
particular dictionary.
TTS Language Support


Whole-word TTS
Any language may be used. A narrator is needed to create the
word database. Development time is 1-2 weeks, depending on the
dictionary.
Phonemic TTS
English, Spanish, German and Italian are currently supported.
Development period for a new language: 3 months.
Speech Compression Algorithms


Uncompressed speech signal
16-bit/8 kHz
1 minute takes 960 KB in memory.
ADPCM (Adaptive Differential Pulse Code Modulation) records
only the difference between samples and adjusts the coding
scale dynamically.
1 minute takes 240 KB in memory.
Compression of any sound signal is possible.
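
The idea of coding only the difference between samples on an adaptive scale can be illustrated with a toy coder. This is a simplified sketch, not the IMA or ITU ADPCM algorithm and not Speereo's implementation: the 4-bit code layout, the starting step, and the doubling/halving rule are all assumptions chosen for readability. Four bits per 16-bit sample does match the 4:1 ratio between the 960 KB and 240 KB figures.

```c
#include <stdint.h>

typedef struct {
    int predicted; /* last reconstructed sample */
    int step;      /* current quantizer step (the "coding scale") */
} ToyAdpcm;

static int toy_clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

void toy_init(ToyAdpcm *s)
{
    s->predicted = 0;
    s->step = 16;
}

/* Shared by encoder and decoder so that both track identical state. */
static void toy_apply(ToyAdpcm *s, uint8_t code)
{
    int magnitude = code & 7;        /* low 3 bits: size of the change */
    int delta = magnitude * s->step;

    if (code & 8)                    /* bit 3: sign of the change */
        delta = -delta;
    s->predicted = toy_clamp(s->predicted + delta, -32768, 32767);

    /* Adapt the scale: grow after large codes, shrink after zero codes. */
    if (magnitude >= 4)
        s->step *= 2;
    else if (magnitude == 0)
        s->step /= 2;
    s->step = toy_clamp(s->step, 1, 16384);
}

/* Encode one 16-bit sample as a 4-bit code. */
uint8_t toy_encode(ToyAdpcm *s, int16_t sample)
{
    int diff = (int)sample - s->predicted;
    int negative = diff < 0;
    int magnitude = (negative ? -diff : diff) / s->step;
    uint8_t code;

    if (magnitude > 7)
        magnitude = 7;
    code = (uint8_t)((negative << 3) | magnitude);
    toy_apply(s, code);
    return code;
}

/* Decode one 4-bit code back to a 16-bit sample. */
int16_t toy_decode(ToyAdpcm *s, uint8_t code)
{
    toy_apply(s, code);
    return (int16_t)s->predicted;
}

/* Demo: feed a constant level n times and return the last decoded sample. */
int16_t toy_settle(int16_t level, int n)
{
    ToyAdpcm enc, dec;
    int16_t out = 0;
    int i;

    toy_init(&enc);
    toy_init(&dec);
    for (i = 0; i < n; i++)
        out = toy_decode(&dec, toy_encode(&enc, level));
    return out;
}
```

The step grows while the signal is far from the prediction and shrinks once it is close, so the decoder catches up with a sudden level change within a few samples and then tracks it exactly.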
Speech Compression Special Algorithms


Exploiting the features of the speech signal makes a higher
compression ratio possible:
GSM compression
1 minute takes about 100 KB in memory. Suited to speech
signals only.
Speereo advanced compression
1 minute takes about 10.25 KB in memory.
More than 1.5 hours of speech can be recorded in 1 MB of space.
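
The storage figures on the compression slides can be checked with a little arithmetic. The sketch below is illustrative only: the function names are made up here, and 1 KB is taken as 1000 bytes, which is the reading under which 16-bit/8 kHz audio comes to exactly 960 KB per minute.

```c
#include <stdint.h>

/* Bytes needed for `seconds` of audio at the given rate and sample width. */
uint32_t audio_bytes(uint32_t sample_rate_hz, uint32_t bits_per_sample,
                     uint32_t seconds)
{
    return sample_rate_hz * bits_per_sample * seconds / 8u;
}

/* Minutes of speech that fit in `capacity_kb` at `kb_per_minute`. */
double minutes_in(double capacity_kb, double kb_per_minute)
{
    return capacity_kb / kb_per_minute;
}
```

Under these assumptions, audio_bytes(8000, 16, 60) gives 960000 bytes and audio_bytes(8000, 4, 60) gives 240000 bytes. At 10.25 KB per minute, 1 MB holds more than 90 minutes whether it is counted as 1000 KB or 1024 KB.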
Speereo Advanced Compression



In real-time mode, the Speereo compression/decompression
algorithm requires a processor with performance of 60 MIPS and
200 KB of memory.
In real-time mode, the Speereo decompression algorithm alone
requires a processor with performance of 40 MIPS and
200 KB of memory.
Speereo Compression Algorithms Usage

Playback of preinstalled voice commands on mobile and embedded
devices (decompression only).
Creation of User voice commands on a PC, followed by their
transfer to mobile and embedded devices (compression on the
desktop PC, decompression on the embedded device).
Recording and playback of Users' commands on mobile and
embedded devices (compression and decompression on the
embedded device).
Conclusion


Speereo Speech Technology for embedded devices:
Automatic Speech Recognition (ASR): from 40 MIPS (80 MIPS
recommended) and from 700 KB of memory.
Speech synthesizer (TTS): from 40 MIPS and 500 KB of memory for
whole-word TTS (80 MIPS and 2 MB for phonemic TTS).
Speech signal compression: from 40 MIPS and from 200 KB of
memory.
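
The minimum requirements above can be turned into a simple feasibility check for a target device. This is a sketch under stated assumptions: the type and function names are hypothetical, 2 MB is taken as 2048 KB, the compression entry uses the 40 MIPS minimum quoted above, and each component is checked on its own without modelling several of them running together.

```c
typedef enum {
    COMPONENT_ASR,
    COMPONENT_TTS_WHOLE_WORD,
    COMPONENT_TTS_PHONEMIC,
    COMPONENT_COMPRESSION
} Component;

typedef struct {
    unsigned min_mips; /* minimum CPU performance */
    unsigned min_kb;   /* minimum memory in KB */
} Requirement;

/* Indexed by Component; figures taken from the slides. */
static const Requirement REQUIREMENTS[] = {
    { 40, 700 },  /* ASR (80 MIPS recommended) */
    { 40, 500 },  /* whole-word TTS */
    { 80, 2048 }, /* phonemic TTS: 2 MB taken as 2048 KB */
    { 40, 200 }   /* speech signal compression */
};

/* Returns 1 if the device meets the minimum for the component, else 0. */
int component_fits(Component component, unsigned mips, unsigned kb)
{
    const Requirement *r = &REQUIREMENTS[component];

    return mips >= r->min_mips && kb >= r->min_kb;
}
```

For example, a hypothetical 40 MIPS device with 700 KB of free memory passes the ASR check but fails the phonemic TTS check.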
Speereo Speech Technology
           Technology that understands your language




     QUESTIONS? COMMENTS?
                                          Speereo Software UK
                                            www.speereo.com


Konstantin Lamin     Oleg Maleev            Daniel Ischenko
CEO                  CTO, VP of R&D         VP of Business Development
lamin@speereo.com    maleev@speereo.com     d_ischenko@speereo.com

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 

General Speereo Technology

  • 1.
  • 2. Speereo Software, 2009. www.speereo.com. Speereo Speech Recognition Technologies. Konstantin Lamin, CEO (lamin@speereo.com); Oleg Maleev, CTO, VP of R&D (maleev@speereo.com); Daniel Ischenko, VP of Business Development (d_ischenko@speereo.com).
  • 3. What are speech technologies needed for? User friendliness: Speech is the most natural way for humans to communicate, so a speech interface is the most natural way to interact with a mobile device. Mobility: While using a speech interface, the user's hands and eyes are free for other activities. Device novelty: A speech interface gives the user an easy-to-use device that is not burdened by numerous keys or a large screen.
  • 4. Automatic Speech Recognition System (ASR). ASR converts a speech signal into text or control commands. It makes it possible to build devices with speech control, i.e. a speech interface. [Diagram: Voice → ASR → Command ID]
  • 5. Speech Synthesizer (TTS). Text-to-speech (TTS) converts text into a speech signal, taking the pronunciation norms of the language into account. It makes it possible to create 'speaking' devices. [Diagram: Text → TTS → Speech]
  • 6. Speech signal compression. Allows a speech signal to be recorded in a small amount of memory. [Diagram: Speech → Packing → Packed data; Packed data → Unpacking → Speech]
  • 7. Speech technology on desktop PC. ASR: Pentium 4 2.0 GHz, 64 MB; memory bandwidth 1.2 GB/s. TTS: Pentium 4 2.0 GHz, 100-500 MB. Standard solutions are not acceptable for embedded and mobile devices. Therefore special approaches that reduce CPU and memory usage must be applied.
  • 8. Requirements for embedded devices (low footprint). Compactness: memory use below 1-2 MB. Ability to run on a CPU under 100 MIPS (a 300 MHz XScale is outperformed 12x or more by a 2.0 GHz Pentium 4). Low memory bandwidth (XScale delivers only 64 MB/s).
  • 9. Embedded Speech SDK. An intuitive, simple API that can be used by non-specialists in speech technology. Scalable and portable software design. Can be used with various operating systems or on devices with no OS. Software only: no additional hardware is required.
  • 10. Speech Recognition Technology Characteristics. Speaker-dependent or speaker-independent? Is training necessary? (Mandatory training annoys users.) Recognition at the phonetic level (any dictionary size) or at the whole-word level (small dictionaries only)? Large (>10,000 words) or small vocabulary? Can the set of recognizable commands be changed dynamically?
  • 11. Optimal System of Speech Control. Speaker-independent. Flexible large vocabulary that allows the set of recognizable words and phrases to be changed 'on the fly'. Noise robustness: the device can be used in different conditions (in a car, outdoors, in a crowd). Robustness to pronunciation variations, including non-native speakers.
  • 12. Speereo Speech Recognition Engine. [Block diagram; labels: Speech, Acoustic environment, Acoustic Front-End, Phone Models, Real-Time Transcriber, Very Large Vocabulary, Decoder, Recognition result]
  • 13. Acoustic Front-End. Feature system with 41 coefficients. Adaptation to the acoustic environment. Special algorithm that automatically adapts to the microphone type (far-field or close-talk), the recording conditions and channel distortion. Special algorithm for operation in a car.
  • 14. Acoustic Model and Decoder. Continuous-density hidden Markov models (more precise). Discrete hidden Markov models (faster). For English, 63 HMMs comprising 2446 Gaussian mixture components. HMM parameters were estimated statistically, using a priori phonetic constraints. Enhanced decoder algorithm for faster operation.
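The deck does not say which search the Speereo decoder runs. As a generic illustration of how an HMM decoder picks the best state sequence, the C sketch below applies the textbook Viterbi algorithm to a tiny left-to-right model. The state count, frame count, function name and all scores are assumptions chosen for the example; none of this is Speereo's actual algorithm.

```c
#define N_STATES 3   /* hypothetical: states of one small left-to-right HMM */
#define N_FRAMES 4   /* hypothetical: number of feature frames to decode    */

/* Textbook Viterbi search in the log domain.
   emit[t][s] is the log-likelihood of frame t in state s (in a real system
   this would come from the Gaussian mixtures). From state s the model may
   stay in s (log_stay) or advance to s+1 (log_next). The path must start in
   state 0 and end in the last state. Writes the best state per frame into
   path[] and returns the log score of that path. */
double viterbi_left_to_right(double emit[N_FRAMES][N_STATES],
                             double log_stay, double log_next,
                             int path[N_FRAMES])
{
    double score[N_FRAMES][N_STATES];
    int back[N_FRAMES][N_STATES];
    const double NEG = -1e30;   /* stands in for log(0) */
    int t, s;

    /* Initialization: only state 0 is allowed at the first frame. */
    for (s = 0; s < N_STATES; s++) {
        score[0][s] = (s == 0) ? emit[0][0] : NEG;
        back[0][s] = 0;
    }

    /* Recursion: keep the better of "stay" and "advance" for each state. */
    for (t = 1; t < N_FRAMES; t++) {
        for (s = 0; s < N_STATES; s++) {
            double stay = score[t - 1][s] + log_stay;
            double move = (s > 0) ? score[t - 1][s - 1] + log_next : NEG;
            if (move > stay) {
                score[t][s] = move + emit[t][s];
                back[t][s] = s - 1;
            } else {
                score[t][s] = stay + emit[t][s];
                back[t][s] = s;
            }
        }
    }

    /* Backtrace from the final state. */
    s = N_STATES - 1;
    for (t = N_FRAMES - 1; t >= 0; t--) {
        path[t] = s;
        s = back[t][s];
    }
    return score[N_FRAMES - 1][N_STATES - 1];
}
```

A real recognizer would run this search over networks of phone models and prune unlikely paths; the sketch only shows the core recursion.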
  • 15. Real-Time Transcriber. Converts written English words and phrases into a form suitable for recognition. Unlimited dictionary. Out-of-vocabulary problem solved. Recognition of first and last names. Recognition of geographic names.
  • 16. Accuracy. Test 1: long phrase recognition. Test conditions: sample of 1680 utterances, 626 unique phrases; language: English. Recognition accuracy: 99.9%. Test 2: short word recognition. Test conditions: numerical vocabulary database (including indistinctly pronounced words), 11 unique words. English: recognition accuracy 99.2%. Russian: recognition accuracy 98.5%.
  • 17. Noise Robustness. Test 3: accuracy dependence on noise level. SNR (dB): 0, 5, 10, 15, 20, Clear. Accuracy (%): 98.2, 98.4, 98.3, 98.6, 98.7, 99.2. Speereo Speech Engine demonstrates high noise robustness.
  • 18. Speereo Speech Engine in a Car. Test 4: long phrase recognition in noisy surroundings. Test conditions: sample of 1632 utterances, 626 unique phrases; noise sample: moving vehicle with windows rolled down; language: English. Recognition accuracy: 97.6%. Thanks to special algorithms, Speereo Recognition Engine demonstrates good robustness in a car.
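To put the percentages from Test 1 and Test 4 into absolute terms, the following C sketch converts an accuracy figure and a sample size into an approximate count of misrecognized utterances. The function names are hypothetical, and the results are rounded estimates derived from the published percentages, not counts reported in the deck.

```c
#include <stdio.h>

/* Approximate number of misrecognized utterances implied by an accuracy
   figure. Accuracy is passed in tenths of a percent (99.9% -> 999) so the
   calculation stays in integer arithmetic; the result is rounded to the
   nearest whole utterance. */
unsigned implied_errors(unsigned utterances, unsigned accuracy_permille)
{
    unsigned miss_permille = 1000u - accuracy_permille;
    return (utterances * miss_permille + 500u) / 1000u;
}

/* Prints the estimates for Test 1 (quiet) and Test 4 (in a car). */
void print_implied_errors(void)
{
    printf("Test 1: about %u errors in 1680 utterances\n",
           implied_errors(1680u, 999u));
    printf("Test 4: about %u errors in 1632 utterances\n",
           implied_errors(1632u, 976u));
}
```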
  • 19. Comparison of Recognition Systems. Number of mistakes in tests 1 and 2 (lower is better). [Bar chart: mistakes on phrases and on digits, axis 0 to 80, for Philips, Microsoft, IBM and Speereo.] Products tested: Philips FreeSpeech 2000, Microsoft Speech Recognition Engine 4.0, IBM ViaVoice 7.0, Speereo Speech Engine 2.0.
  • 20. Speereo Speech Recognition Technology Features. High-accuracy speech recognition. Speaker independence. Large vocabulary (>100,000 words). Short latency. Noise robustness. Excellent compatibility. Ease of use.
  • 21. CPU and Memory Requirements. Speereo Speech Engine currently supports a wide variety of processors, such as SHx, TMPR39XX, NEC VR4122, MIPS, ARM and XScale. It runs on CPUs delivering 40 MIPS or more (80 MIPS recommended) with 700 KB of memory or more.
  • 22. Speereo Speech Recognition SDK. Simple API that requires no experience in speech technology development. Supports Windows Mobile, Symbian, Java, other platforms and embedded devices with no OS. For Windows Mobile and Symbian, ready-made audio input/output is provided, so there is no need to program audio I/O. Supports smartphones based on Series 60, UIQ and Windows Mobile, and mobile devices with J2ME.
  • 23. Speereo Speech Engine, Windows CE Version. [Diagram labels: Speech commands pronounced by user; Audio Input-Output; Speereo Speech Engine; List of speech commands; Application 1, Application 2, ..., Application N]
  • 24. Use of Speereo Speech Engine (SE). Operation of SE can be divided into two major stages: 1. The application defines the operating mode of SE and, if necessary, sends the list of speech commands to SE. 2. When the user pronounces a phrase (command), SE determines the most probable phrase from the list of received speech commands and sends its ID to the application. The developer does not need to detect the moment a phrase is spoken; all that is needed is to process the Speereo Speech Engine message that contains the ID of the command spoken by the user.
  • 25. Recognition modes. SE currently implements three recognition modes: 1. Recognition of phrases whose words are known to SE and included in the vocabulary. 2. Recognition of phrases containing words unknown to SE (mostly personal names, etc.); unknown words are transcribed automatically. 3. Recognition of numbers from 1 to 31; a special mode improves the recognition accuracy of ordinal numbers.
  • 26. Speereo Speech Engine Initialization. To use the speech interface in an application, the developer must register that application with Speereo Speech Engine by calling the AddRegisterApplication function. The function prototype is: UINT AddRegisterApplication (HWND hWnd), where hWnd is the handle of the developer's application window that receives messages from SE.
  • 27. Speech Commands List Creation. The speech command list is built by calling the AddPhrase function once per speech command: void AddPhrase (LPCTSTR pszText, DWORD dwId), where pszText is the speech command in orthographic form and dwId is the identifier that SE returns when that command is spoken.
  • 28. Response receipt from SE. The WM_SRT_ACCEPTHYPO message carries the identifier of the recognized speech command in its wParam parameter. SE sends the message to the application window whose hWnd was passed to AddRegisterApplication. Example: case WM_SRT_ACCEPTHYPO: MakeHypo (wParam); return TRUE; Here MakeHypo is the developer's own function that implements the behavior of the speech commands.
  • 29. Defining Speech Commands. Example: AddPhrase (_T("Open Window"), ID_OPEN_WINDOW); AddPhrase (_T("Close Window"), ID_CLOSE_WINDOW); This passes two speech commands ("Open Window" and "Close Window") to SE with the identifiers ID_OPEN_WINDOW and ID_CLOSE_WINDOW respectively.
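One way the developer-supplied MakeHypo could dispatch on these identifiers is sketched below in C. The numeric ID values, the int return value and the window_is_open helper are assumptions made for illustration; the deck only names MakeHypo and the two constants.

```c
#include <stdio.h>

/* Hypothetical identifier values: the deck names the constants but does not
   give their values. */
enum { ID_OPEN_WINDOW = 1, ID_CLOSE_WINDOW = 2 };

/* Toy application state that the speech commands act on. */
static int g_window_open = 0;

/* One possible MakeHypo: maps the recognized command ID (the wParam of
   WM_SRT_ACCEPTHYPO) to an action. Returns 1 if the ID was handled and
   0 for an unknown ID. */
int MakeHypo(unsigned int commandId)
{
    switch (commandId) {
    case ID_OPEN_WINDOW:
        g_window_open = 1;
        puts("window opened");
        return 1;
    case ID_CLOSE_WINDOW:
        g_window_open = 0;
        puts("window closed");
        return 1;
    default:
        return 0;   /* not one of the commands passed to AddPhrase */
    }
}

/* Lets the rest of the application query the state. */
int window_is_open(void)
{
    return g_window_open;
}
```

Keeping all command handling in one switch makes it easy to see that every identifier passed to AddPhrase has a matching action.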
  • 30. It's That Simple! To build a speech interface into an application using Speereo Speech Engine, take three simple steps: 1. Initialize Speereo Speech Engine. 2. Define the list of speech commands. 3. Define the application's reaction to speech commands.
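The three steps can be walked through end to end with a mock engine. The C sketch below is not the Speereo SDK: AddRegisterApplication and AddPhrase are re-implemented here as stand-ins with the prototypes given in the deck, the Win32 types are replaced by simple typedefs, the value of WM_SRT_ACCEPTHYPO and the ID constants are invented, and SimulateUtterance uses exact text matching in place of speech recognition. The deck wraps phrase literals in _T(); plain string literals are used here to keep the sketch portable.

```c
#include <string.h>

/* Stand-ins so the sketch builds without the Speereo SDK or Win32 headers. */
typedef void *HWND;
typedef unsigned int UINT;
typedef unsigned long DWORD;
typedef const char *LPCTSTR;
#define WM_SRT_ACCEPTHYPO 0x8001u   /* hypothetical message number */
#define MAX_PHRASES 16

enum { ID_OPEN_WINDOW = 1, ID_CLOSE_WINDOW = 2 };   /* hypothetical values */

typedef struct { const char *text; DWORD id; } Phrase;

static HWND   g_app_window;            /* window registered in step 1  */
static Phrase g_phrases[MAX_PHRASES];  /* command list built in step 2 */
static int    g_phrase_count;
static DWORD  g_last_command;          /* set by the application in step 3 */

/* Mock engine: remembers which window should receive messages. */
UINT AddRegisterApplication(HWND hWnd)
{
    g_app_window = hWnd;
    return 0;   /* the deck does not document the return value */
}

/* Mock engine: stores one phrase together with its identifier. */
void AddPhrase(LPCTSTR pszText, DWORD dwId)
{
    if (g_phrase_count < MAX_PHRASES) {
        g_phrases[g_phrase_count].text = pszText;
        g_phrases[g_phrase_count].id = dwId;
        g_phrase_count++;
    }
}

/* Application side: stand-in for the window procedure (step 3). */
int AppWindowProc(HWND hWnd, UINT msg, DWORD wParam)
{
    (void)hWnd;
    if (msg == WM_SRT_ACCEPTHYPO) {
        g_last_command = wParam;   /* a real app would act on the ID here */
        return 1;
    }
    return 0;
}

/* Mock recognizer: exact text match instead of real speech recognition.
   Returns 1 if a registered phrase matched and the message was delivered. */
int SimulateUtterance(const char *spoken)
{
    int i;
    for (i = 0; i < g_phrase_count; i++) {
        if (strcmp(g_phrases[i].text, spoken) == 0)
            return AppWindowProc(g_app_window, WM_SRT_ACCEPTHYPO,
                                 g_phrases[i].id);
    }
    return 0;
}

/* The three steps from the slide, in order. Returns the last command ID. */
DWORD RunDemo(void)
{
    static int dummy_window;                     /* stands in for a real HWND */
    AddRegisterApplication(&dummy_window);       /* 1. initialize the engine  */
    AddPhrase("Open Window", ID_OPEN_WINDOW);    /* 2. define the commands    */
    AddPhrase("Close Window", ID_CLOSE_WINDOW);
    SimulateUtterance("Close Window");           /* 3. reaction: AppWindowProc */
    return g_last_command;
}
```

With the real SDK, the mock functions would be replaced by the library's own and SimulateUtterance would disappear, since the engine itself posts the message when the user speaks.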
  • 31. Speereo Speech Engine Additional Features. 1. Microphone and speaker controls. 2. Ability to interact with several applications simultaneously. 3. Ability to record sound and voice signals via the microphone with real-time compression. 4. Ability to play sound and voice signals to the user through the speaker. 5. Selection of the speech signal detection mode (continuous monitoring of the speech signal, or recognition started by a key press).
  • 32. Speereo Speech Engine Implementation Possibilities. Home appliances. Consumer electronics (audio/video systems). Computer hardware and software (all operations). Portable devices (mobile phones, smartphones). Voice mail systems. Other embedded devices. Using Speereo Speech Interface can greatly contribute to the functionality, accessibility and innovative appeal of any product by making it fully interactive, easy to control, and therefore more productive and enjoyable.
  • 33. Example 1: operating a phonebook. Instead of selecting from the menu (Menu, Names, Search "Samantha", Call), the feature can be accessed by one short phrase: "Call Samantha". Send a voice message via e-mail/MMS: say "Send E-mail" or "Send MMS". You will be prompted to give the name of the recipient. When the name is spoken, the system finds it in the database and offers to send a voice message.
  • 34. Example 2: Mobile Voice Interface. A voice interface for mobile services is in high demand in the mobile community. Services: weather, maps, dictionaries, ticket booking, exchange rates, information, e-commerce, humor.
  • 35. Example 3: GPS Voice Control. Speech menu. Search for P.O.I. Map navigation. Route indication.
  • 36. Speereo Voice Translator. Speaking in a foreign language? Nothing could be simpler! Speereo Voice Translator is an innovative mobile phrase book that understands a spoken phrase in English (even when pronounced with a strong accent) and immediately reads back the same phrase in Arabic, Chinese (Traditional or Simplified), Danish, English, Finnish, French, German, Italian, Korean, Polish, Russian, Spanish or Turkish.
  • 37. Speereo Voice Organizer. Manage your personal information, send e-mails and set your schedule using only voice commands with our Stylus Free Concept: Speereo Voice Organizer! Free your hands and keep using your mobile device: the application will find and dial numbers, write e-mails and remind you of your appointments, all by voice command!
  • 38. Use our unique skills! A speech interface brings a new level of user convenience. We have the knowledge needed for the successful implementation of speech technology. "A voice-operated scheduler is a very good idea and Speereo has made it an impressive and enjoyable reality. Perhaps the best thing about Voice Organizer is that you can access all of its features with one hand. One touch of your Pocket PC's record button and your voice does all the work: switching between days, week and month views of your events; adding new events; or adding vocal notes to your phone contacts." (John Mierau, "Voice Recognition Programs for the Pocket PC", Pocket PC Magazine, November 2002, Vol. 5 No. 5)
  • 39. Speech Synthesizer Types (TTS). Whole-word TTS [diagram labels: Text, Words DB, Phrases compiler, Speech]. Phonemic TTS [diagram labels: Text, Transcriber, Phones, Prosody, Phones DB, Phrases compiler, Speech].
  • 40. TTS Requirements. Whole-word TTS: vocabulary predefined at the system development stage (up to 2-3 thousand words). CPU from 40 MIPS, RAM from 0.5 MB. Requires a narrator to pronounce every word of the vocabulary. Phonemic TTS: large dictionaries possible (over 100 thousand words). CPU from 80 MIPS, RAM from 2 MB. Does not require tuning to a dictionary.
  • 41. TTS Language Support. Whole-word TTS: any language may be used. A narrator is needed to create the word database. Development time is 1-2 weeks, depending on the dictionary. Phonemic TTS: English, Spanish, German and Italian are currently supported. Developing a new language takes 3 months.
  • 42. Speech Compression Algorithms. Speech signal, 16 bit / 8 kHz: 1 minute takes 960 KB of memory. ADPCM (Adaptive Differential Pulse Code Modulation) records only the difference between samples and adjusts the coding scale dynamically: 1 minute takes 240 KB of memory. Any sound signal can be compressed.
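The two storage figures on this slide follow from simple arithmetic, shown in the C sketch below. The function name is hypothetical. The 4 bits per sample used for ADPCM is an assumption: the deck does not state the code width, but 4-bit codes are common for ADPCM and match the 4:1 ratio between 960 KB and 240 KB. KB is taken as 1000 bytes, which is what reproduces the slide's numbers.

```c
/* Bytes needed to store one minute of mono audio at the given sample rate
   and sample width. */
unsigned long bytes_per_minute(unsigned long sample_rate_hz,
                               unsigned long bits_per_sample)
{
    return sample_rate_hz * bits_per_sample / 8ul * 60ul;
}

/* Uncompressed 16-bit PCM at 8 kHz: 8000 * 16 / 8 * 60 = 960000 bytes.
   ADPCM with assumed 4-bit codes:   8000 *  4 / 8 * 60 = 240000 bytes. */
```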
  • 43. Speech Compression: Special Algorithms. Exploiting features of the speech signal allows higher compression ratios. GSM compression: 1 minute takes about 100 KB of memory; optimal for speech signals only. Speereo advanced compression: 1 minute takes about 10.25 KB of memory; more than 1.5 hours of speech can be recorded in 1 MB.
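The per-minute sizes can be turned into average bit rates and recording times with the small C sketch below. The function names are hypothetical, and KB and MB are assumed to mean 1000 and 1,000,000 bytes; with binary units the results shift by a few percent but the conclusions are the same.

```c
/* Average bit rate in bits per second for a codec that stores one minute
   of speech in bytes_per_min bytes. */
double bitrate_bps(double bytes_per_min)
{
    return bytes_per_min * 8.0 / 60.0;
}

/* Minutes of speech that fit into storage_bytes of memory. */
double minutes_per_storage(double storage_bytes, double bytes_per_min)
{
    return storage_bytes / bytes_per_min;
}

/* GSM at about 100 KB/min:        100000 * 8 / 60, roughly 13.3 kbit/s.
   Speereo at about 10.25 KB/min:  10250 * 8 / 60, roughly 1.37 kbit/s.
   1 MB at 10.25 KB/min: 1000000 / 10250, roughly 97.6 minutes, which is
   consistent with the slide's "more than 1.5 hours". */
```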
  • 44. Speereo Advanced Compression. Real-time compression and decompression with the Speereo algorithm require a processor delivering 60 MIPS and 200 KB of memory. Real-time decompression alone requires a processor delivering 40 MIPS and 200 KB of memory.
  • 45. Speereo Compression Algorithms Usage. Playback of preinstalled voice commands on mobile and embedded devices (decompression only). Creation of user voice commands on a PC and their subsequent transfer to mobile and embedded devices (compression on the desktop PC, decompression on the embedded device). Recording and playback of users' commands on mobile and embedded devices (compression and decompression on the embedded device).
  • 46. Conclusion. Speereo Speech Technology for embedded devices: Automatic Speech Recognition (ASR): from 40 MIPS (80 MIPS recommended), from 700 KB of memory. Speech synthesizer (TTS): from 40 (80) MIPS, from 500 KB (2 MB) of memory. Speech signal compression: from 40 MIPS, from 200 KB of memory.
  • 47. Speereo Speech Technology: technology that understands your language. Questions? Comments? Speereo Software UK, www.speereo.com. Konstantin Lamin, CEO (lamin@speereo.com); Oleg Maleev, CTO, VP of R&D (maleev@speereo.com); Daniel Ischenko, VP of Business Development (d_ischenko@speereo.com).