Speech Technology
     Overview
Presented by
Amr Medhat

               Computer Engineering Department
                               Cairo University
                                    22-10-2005
Speech… Why??


The easiest way of communication for
            human beings
Speech… How??
[Diagram: Sender, Message (Signal + … Protocol), Receiver; Channel; Noise]
Computer Analogy

[Diagram: human speech and its computer analogue]
 Speech Production  ~ Speech Synthesis (TTS):    text -> speech
 Speech Perception  ~ Speech Recognition (ASR):  speech -> text
Recognition Made Easy
             I bought a boat.
             ‫افرنقعوا أيها المتكأكئين‬
                gute Nacht
[Diagram: Feature Extraction -> Decoder (Search); Grammar, Lexicon, Phone Models]
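
The decoder-as-search idea can be sketched in a few lines. This is a toy illustration only, not how a real recognizer is built: the phone labels, the part-of-speech tags, the five-rule grammar and the mismatch-count "acoustic cost" are all assumptions made up for the example, and the utterance is assumed to be already segmented into words. Real decoders use probabilistic scores and a Viterbi or beam search instead of brute-force enumeration.

```python
from itertools import product

# Hypothetical lexicon: word -> (phone sequence, part-of-speech tag).
LEXICON = {
    "I": (("ay",), "PRON"),
    "bought": (("b", "ao", "t"), "VERB"),
    "boat": (("b", "ow", "t"), "NOUN"),
    "a": (("ax",), "DET"),
}

# Hypothetical finite-state grammar: the allowed tag-to-tag transitions.
GRAMMAR = {
    ("<s>", "PRON"), ("PRON", "VERB"), ("VERB", "DET"),
    ("DET", "NOUN"), ("NOUN", "</s>"),
}


def acoustic_cost(observed, reference):
    """Count mismatched phones; None if the lengths differ (no alignment here)."""
    if len(observed) != len(reference):
        return None
    return sum(o != r for o, r in zip(observed, reference))


def candidates(observed, max_cost=1):
    """Lexicon words whose phones are within max_cost mismatches of the input."""
    out = []
    for word, (phones, _tag) in LEXICON.items():
        cost = acoustic_cost(observed, phones)
        if cost is not None and cost <= max_cost:
            out.append((word, cost))
    return out


def grammatical(words):
    """True if the tag sequence of the words is accepted by GRAMMAR."""
    tags = ["<s>"] + [LEXICON[w][1] for w in words] + ["</s>"]
    return all(pair in GRAMMAR for pair in zip(tags, tags[1:]))


def decode(segments):
    """Search all candidate word sequences; keep the cheapest grammatical one."""
    best = None
    for combo in product(*(candidates(seg) for seg in segments)):
        words = [w for w, _ in combo]
        cost = sum(c for _, c in combo)
        if grammatical(words) and (best is None or cost < best[1]):
            best = (words, cost)
    return best


# "o" stands for a vowel the front end could not resolve, so the second and
# fourth segments match "bought" and "boat" equally well.
heard = [("ay",), ("b", "o", "t"), ("ax",), ("b", "o", "t")]
result = decode(heard)
```

Here the phone models alone cannot separate the two similar words; only the grammar decides that the verb comes first and the noun last.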
Recognizer Characteristics
 Discrete words / continuous speech
 Read / spontaneous speech
 Speaker dependent / independent
 Small / large vocabulary
 Finite state / context sensitive language model
What to study
 Phonetics and Phonology (Linguistics)
 Speech Signal Processing (DSP)
 Pattern Recognition (AI)
     Hidden Markov Models ( )
     Artificial Neural Networks
     Hybrid ANN - HMM
Phonetics
   Phonetics: study of the production, perception,
    and physical properties of speech sounds
   Phonology: describes the way sounds function
    within a given language and how they are
    combined and organized
   Phoneme: The smallest phonetic unit in a
    language that is capable of conveying a
    distinction in meaning
   E.g.
     boat-bought,   car-jar, ‫نشاط-شمس ,أرض-أحمد‬
Speech Signal Processing
   Sampling
     Rate: e.g. 16 kHz
     Sample size: e.g. 16 bits
   Format: PCM (.wav files)
   Time or Frequency domain features?
   Spectrogram: represents the time-varying
    spectrum of a signal. (x: time, y: frequency, intensity)
   Can’t represent features?:
     Filter Banks, LPCs, MFCCs
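
To make the sampling parameters concrete, here is a minimal sketch using only the Python standard library. It synthesizes a test tone (the 440 Hz tone and 0.1 s duration are arbitrary assumptions), stores it as 16 kHz, 16-bit mono PCM in an in-memory .wav container, and reads it back.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16_000  # samples per second (16 kHz)
SAMPLE_WIDTH = 2      # bytes per sample (16 bits)


def sine_pcm(freq_hz, seconds, amplitude=0.5):
    """Return 16-bit little-endian PCM bytes for a sine tone."""
    n = int(SAMPLE_RATE * seconds)
    peak = int(amplitude * 32767)
    samples = [
        int(peak * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    ]
    return struct.pack("<%dh" % n, *samples)


def write_wav(pcm_bytes):
    """Wrap raw PCM bytes in a mono WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm_bytes)
    return buf.getvalue()


def read_wav(wav_bytes):
    """Return (sample rate, bytes per sample, list of integer samples)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        n = w.getnframes()
        raw = w.readframes(n)
        return w.getframerate(), w.getsampwidth(), list(struct.unpack("<%dh" % n, raw))


wav = write_wav(sine_pcm(440.0, 0.1))
rate, width, samples = read_wav(wav)
bytes_per_second = rate * width  # storage cost of one second of mono audio
```

At these settings one second of mono speech takes 16,000 x 2 = 32,000 bytes before any compression.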
Spectrogram




Waveform and Spectrogram of the word: "phonetician"
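
A spectrogram such as the one pictured can be computed by cutting the signal into short overlapping frames and taking the magnitude spectrum of each. The sketch below is illustrative only: it uses a naive O(N^2) DFT instead of an FFT to stay dependency-free, and the Hamming window, the 64-sample frame, the 32-sample hop and the 2 kHz test tone are all assumptions chosen for the example.

```python
import cmath
import math

SAMPLE_RATE = 16_000
FRAME_LEN = 64  # samples per analysis frame (4 ms at 16 kHz)
HOP = 32        # frame shift in samples (50% overlap)


def hamming(n):
    """Hamming window of length n, used to taper each frame."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 of one frame (naive DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_len=FRAME_LEN, hop=HOP):
    """Return a list of frames; each frame is a list of bin magnitudes."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append(dft_magnitudes([s * w for s, w in zip(chunk, window)]))
    return frames


# A pure 2 kHz tone should light up a single frequency row.
tone = [math.sin(2 * math.pi * 2000.0 * i / SAMPLE_RATE) for i in range(512)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
```

Each inner list is one column of the picture (time runs along the outer list, frequency along the inner one, and the magnitude is the intensity).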
HMM
   What is a model?
   The coins example




   Parameter estimation: Baum-Welch
   Decoding: Viterbi P (O | λ)
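
The coins example can be written down as a small HMM. The sketch below is a minimal illustration with made-up numbers: the two hidden coins ("fair" and "biased"), their start, transition and emission probabilities are all assumptions, not values from the talk. It shows the forward algorithm, which computes P(O | λ), and Viterbi decoding, which finds the single most likely hidden state sequence; Baum-Welch parameter estimation is not shown.

```python
STATES = ("fair", "biased")

# Hypothetical model parameters (λ): which hidden coin is tossed, how often the
# tosser switches coins, and how each coin lands.
START = {"fair": 0.5, "biased": 0.5}
TRANS = {
    "fair": {"fair": 0.9, "biased": 0.1},
    "biased": {"fair": 0.1, "biased": 0.9},
}
EMIT = {
    "fair": {"H": 0.5, "T": 0.5},
    "biased": {"H": 0.9, "T": 0.1},
}


def forward(obs):
    """Evaluation: P(O | λ), summed over all hidden state paths."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[p] * TRANS[p][s] for p in STATES) * EMIT[s][o]
            for s in STATES
        }
    return sum(alpha.values())


def viterbi(obs):
    """Decoding: the most likely hidden path and its joint probability."""
    delta = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    backptrs = []
    for o in obs[1:]:
        new_delta, ptr = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: delta[p] * TRANS[p][s])
            ptr[s] = best_prev
            new_delta[s] = delta[best_prev] * TRANS[best_prev][s] * EMIT[s][o]
        delta = new_delta
        backptrs.append(ptr)
    last = max(STATES, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):  # follow the back-pointers to the start
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


path, prob = viterbi("HHHHHH")
likelihood = forward("HHHHHH")
```

With these numbers a long run of heads is best explained by the biased coin throughout, and the forward likelihood is never smaller than the probability of the single best path.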
Tools
   Audio Editing
     Cool Edit ( )
     Gold Wave
     Sound Forge
   ASR
       HTK ( )
       MATLAB
       Microsoft SAPI SDK
       Java Speech API
       ISIP ASR Toolkit
       Torch (Machine learning tool)
Technologies and applications
   Speech Recognition
     Dictation
     Call centers & IVR systems
     Command and control

   Speech Verification: Pronunciation teaching
   Speaker Recognition: Security
   Speech Synthesis
     Reading for the blind
     Telephone inquiries
Can Image Processing Help?
 Audio Visual Speech Recognition
 Spectrogram Reading
 Spectrogram Filtering
 The vOICe: seeing with sound


Editor's Notes

  1. Why do we need speech technology? Why develop computer technologies that tackle human speech? Speech is the easiest way for people to communicate, so why not communicate with computers the same way? It would be really great. Do you remember the definition of AI? Solving problems that humans do better. A very young child can speak, hear and understand you, yet cannot read, write or even do simple calculations. That is why we need speech technology.
  2. Like any communication system, the speech communication process comprises a message that needs to be carried from sender to receiver through a channel. The sender turns the message in his brain into signals to the speech apparatus, which set a particular configuration of the vocal cords, throat, tongue, lips and lungs. Passing through all of these, the air is shaped into compressions and rarefactions with a particular vibration, which travel through the air to the other party. The receiver collects these signals at the outer ear; they pass through the eardrum and the bones of the middle ear (hammer, anvil and stirrup) to the cochlea in the inner ear, where they are converted into nerve signals. The brain then searches for their significance until it arrives at the meaning of the message and understands it. Of course, the medium also carries other signals that spread through the air, such as the sound of the fan, cars, students outside, humming, etc. All this is called noise: the channel does not carry only the sender's signal; it carries many signals combined into one complex signal, and the receiver first does some processing to filter the noise out. But even if all this happens, will you be able to understand the incoming signal after all this processing and filtering? Imagine that I am talking in Japanese and you understand only German! Or will the air conditioner understand the signal transmitted by a TV remote control? So the message is not just a signal; it is a signal + a communication protocol agreed upon between sender and receiver. In speech, then, the message == signal + language.
  3. Our focus is mainly on ASR. Note: besides the microphone and speaker, the computer's sound card, with its A/D and D/A converters, plays the role of the ear and mouth (the physical part of speech processing). Note: the microphone converts acoustic pressure (acoustic compressions and rarefactions) into an analog electrical signal; the speakers perform the opposite operation.
  4. After the audience hears the three sentences from you (without displaying them), ask them what they understood from each utterance. You won't understand the third sentence, assuming you know only English (neither Arabic nor German): your ear will notice a strange sound (ch, خ) that it cannot perceive. In the second sentence (assuming you know Arabic) your ear can perceive every pronounced sound (you have what are called phone models in the sounds database in your brain), and by intuition you can get the sentence structure (an imperative verb), since you have the language grammar in your brain too. But you could not understand the sentence, because you have no entries for the words you heard in your dictionary (the word lexicon in your brain). For the first sentence (assuming you know English) the uttered sounds are fine, and so are the words, but two of the words have almost the same pronunciation. You could only just tell them apart with the aid of the language grammar, which told you that the first word is a verb while the second is a noun. From this example it becomes clear that speech perception is a search process that the brain performs in a fraction of a moment, trying to find the appropriate match for the heard utterance in a large knowledge base made up of language sounds + word dictionary + language grammar. From this follows the structure of a speech recognition engine.
  5. Read speech: speech that is read aloud (the words are expected and anticipated before they are uttered). Spontaneous speech: unplanned, unanticipated speech. Speaker-dependent: the engine needs to build a special profile for every user and be trained on that user's voice and way of speaking before it can run properly and give acceptable results. Finite-state language model: a small number of sentences of limited scope, such as telephone numbers. Context-sensitive language model: unlimited in scope and dependent on the context of the speech; needs a complicated NLP system.
  6. Phonology answers the question: what sounds exist in this language? Phonetics answers the question: what are the properties of these sounds? Phonetics is the study of the sounds of languages from three basic points of view: their production in the vocal organs (articulatory phonetics), their physical properties (acoustic phonetics), and their effect on the ear (auditory phonetics).
  7. When a child starts learning and sees a dog, he asks you what it is, and you tell him it is a dog. After that, when he sees a donkey or a cat, he points to it and says it is a dog; you tell him no, this is a donkey and this is a cat. Later he points to your cat and says it is a cat; you tell him no, it is not just a cat, it is my cat, and its name is Poosy. This is the idea of a model. First the child built a model in his mind in which any animal (a four-legged creature) is a dog. Then he narrowed his model to dogs, donkeys and cats; then he narrowed it again to Poosy the cat. The same idea applies to a mathematical model. Depending on the size and nature of your system, you choose the level at which to build your models. If your system only recognizes one of three sentences, you might build just one HMM per sentence. If the system searches a dictionary of 10 words, build an HMM for each word. If it searches combinations of words in different orders, narrow your models to the level of sub-words, tri-phones, mono-phones, or even allophones, according to the system size and the search tree size and depth the system can bear. Note that the number of states in your model is a function of the model size you choose (i.e. a function of the feature vector, or in other words of the duration of the unit of utterance you build the model for, usually ranging from a whole word to a sub-phone).