Research Issues in Speech Processing




                    Dr. M. Sabarimalai Manikandan
                        msm.sabari@gmail.com
Speech Production: the source-filter model
Speech signal conveys the information contained in the spoken word
         Speech is a highly non-stationary signal
         Short segments of speech (20 to 30 ms) can be treated as quasi-stationary
         Most acoustical energy is in the frequency range of 100-6000 Hz




        Vocal tract transfer function can be modeled by an all-pole filter
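
As a rough illustration of the all-pole model, the sketch below estimates linear-prediction coefficients for one short frame using the autocorrelation method and the Levinson-Durbin recursion. The function names, the predictor order and the synthetic test frame are assumptions made for this example and do not come from the slides.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    Assumed model: x[n] ~ sum_k a[k] * x[n - k], so the all-pole filter is
    H(z) = G / (1 - sum_k a[k] * z^-k).
    Returns (coefficients, residual_energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc(frame, order):
    """Linear-prediction coefficients of one frame (hypothetical helper)."""
    return levinson_durbin(autocorrelation(frame, order), order)


if __name__ == "__main__":
    # Synthetic frame: impulse response of the one-pole filter 1 / (1 - 0.9 z^-1).
    frame = [0.9 ** n for n in range(200)]
    coeffs, residual = lpc(frame, order=2)
    print(coeffs, residual)
```

On real speech the frame would normally be windowed first, and the order is chosen according to the sampling rate; both choices are left out of this sketch.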
Speech Processing Tasks


Speech recognition (recognizing lexical content)
Speech synthesis (text-to-speech)
Speaker recognition (recognizing who is speaking)
Speech understanding and vocal dialog
Speech coding (data rate reduction)
Speech enhancement (Noise reduction)
Speech transmission (noise free communication)
Voice conversion
Speech Processing
Speech measurements
       Short-time energy (STE)
       Zero crossing rate (ZCR)
       Autocorrelation (AC)
       Pitch period or frequency
       Formants
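
The autocorrelation and pitch measurements listed above are often combined: the lag at which the short-time autocorrelation peaks gives an estimate of the pitch period. The following is a minimal sketch under stated assumptions; the search range (60 to 400 Hz), the voicing threshold and the function name are illustrative choices, not values from the slides.

```python
import math


def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0,
                   voicing_threshold=0.3):
    """Return a pitch estimate in Hz, or None if the frame looks unvoiced or silent."""
    n = len(frame)
    energy = sum(x * x for x in frame)  # autocorrelation at lag 0
    if energy == 0.0:
        return None
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(n - 1, int(sample_rate / f_min))
    best_lag, best_value = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # A weak autocorrelation peak relative to the energy suggests no clear periodicity.
    if best_lag is None or best_value / energy < voicing_threshold:
        return None
    return sample_rate / best_lag


if __name__ == "__main__":
    # Synthetic 100 Hz tone sampled at 8 kHz, 50 ms long.
    tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(400)]
    print(estimate_pitch(tone, 8000))
```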

Speech signal components
       Speech-Silence or Non-speech
       Voiced speech-Unvoiced speech
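
A minimal sketch of how short-time energy and zero crossing rate can separate silence, voiced and unvoiced frames. The frame length, hop size and both thresholds are hypothetical values that would need tuning on real recordings.

```python
def frame_signal(samples, frame_len, hop):
    """Split a signal into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return float(sum(x * x for x in frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def label_frame(frame, energy_floor, zcr_split):
    """Crude three-way decision: low energy means silence, high ZCR means unvoiced."""
    if short_time_energy(frame) < energy_floor:
        return "silence"
    if zero_crossing_rate(frame) > zcr_split:
        return "unvoiced"
    return "voiced"
```

At an 8 kHz sampling rate, a 25 ms frame corresponds to `frame_len=200`; a hop of half the frame length is a common choice.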
Speech Processing
Speech representations or models
       Temporal features
          •   Low energy rate
          •   Zero crossing rate (ZCR)
          •   4Hz modulation energy
          •   Pitch contour

       Spectral features
           •    Spectral Centroid (sharpness)
           •    Spectral Flux (rate of change)
           •    Spectral Roll-Off (spectral shape)
           •    Spectral Flatness (deviation of the spectral form)
       Linear Predictive Coefficients (LPC)
       Cepstral coefficients
       Mel Frequency Cepstral Coefficients (MFCC): human auditory system
       Harmonic features: sinusoidal harmonic modelling
       Perceptual features: model of the human hearing process
       First order derivative (DELTA)
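
The four spectral features listed above can be computed from the magnitude spectrum of a frame. The sketch below uses a naive DFT to stay dependency-free; the function names and the 85% roll-off fraction are assumptions for illustration, and definitions vary slightly between toolkits.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) transform, fine for short frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectral_centroid(mag, sample_rate, n_fft):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mag)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n_fft * m for k, m in enumerate(mag)) / total


def spectral_rolloff(mag, sample_rate, n_fft, fraction=0.85):
    """Frequency below which the given fraction of the total magnitude lies."""
    target = fraction * sum(mag)
    running = 0.0
    for k, m in enumerate(mag):
        running += m
        if running >= target:
            return k * sample_rate / n_fft
    return (len(mag) - 1) * sample_rate / n_fft


def spectral_flatness(mag, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum (1.0 for a flat spectrum)."""
    power = [m * m + eps for m in mag]
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    return geometric / (sum(power) / len(power))


def spectral_flux(prev_mag, mag):
    """Euclidean distance between the magnitude spectra of consecutive frames."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(prev_mag, mag)))
```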
Elements of the speech signal
Phonemes: the smallest units of speech sounds
       Vowels and Consonants
       ~12 to 21 different vowel sounds used in the English language

       Consonants involve rapid and sometimes subtle changes in sound
              according to the manner of articulation:
                   •    plosive (p, b, t, etc.)
                   •    fricative (f, s, sh, etc.)
                   •    nasal (m, n, ng)
                   •    liquid (r, l) and
                   •    semivowel (w, y)

       Consonants are more independent of language than vowels are.

Syllable: one or more phonemes

Word: one or more syllables
Automatic Speech Recognition
There are two uses for speech recognition systems:

    Dictation: translation of the spoken word into written text
    Computer Control: control of the computer, and software
    applications by speaking commands

    Speaker dependent system: to operate for a single speaker
    Speaker independent system: to operate for any speaker
    of a particular type
    Speaker adaptive system: to adapt its operation to the
    characteristics of new speakers

    The size of vocabulary affects the complexity, processing
    requirements and the accuracy of the system
Speech Recognition: Applications

Automatic translation
Vehicle navigation systems
Human-computer interaction
Content-based spoken audio search
Home automation
Pronunciation evaluation
Robotics
Video games
Transcription of speech into mobile text messages
People with disabilities
Speech Recognition System

Sampling of speech

Acoustic signal processing:
   •     Linear Prediction Cepstral Coefficients (LPCC)
   •     Mel Frequency Cepstral Coefficients (MFCC)
   •     Perceptual Linear Prediction Cepstral Coefficients (PLPCC)

Recognition of phonemes, groups of phonemes and words:
   •    Dynamic Time Warping (DTW)
   •    Hidden Markov models (HMMs)
   •    Gaussian mixture models (GMMs)
   •    Neural Networks (NNs)
   •    Expert systems and combinations of techniques
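
Of the recognition techniques listed above, dynamic time warping is the simplest to show in a few lines: it aligns two feature sequences of different lengths and returns the accumulated distance along the best alignment path. This is a minimal sketch assuming each frame is a short list of floats (for example MFCCs); the template dictionary and function names are hypothetical.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Accumulated Euclidean distance along the best monotonic alignment path."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            # Allowed moves: advance in seq_a only, in seq_b only, or in both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[rows][cols]


def recognize(features, templates):
    """Return the word whose stored template is closest to the input under DTW."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))


if __name__ == "__main__":
    templates = {"yes": [[1.0], [2.0], [3.0]], "no": [[3.0], [2.0], [1.0]]}
    print(recognize([[1.0], [1.0], [2.0], [3.0]], templates))
```

Template matching of this kind suits small vocabularies and speaker-dependent systems; larger tasks usually rely on the statistical models in the same list.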
Automatic Speaker Recognition
Speaker recognition: the process of automatically recognizing who is
speaking by using the speaker-specific information included in speech
sounds

Speaker identity: physiological and behavioral characteristics of the speech
production model of an individual speaker
         the spectral envelope (vocal tract characteristics)
         the supra-segmental features (voice source characteristics) of
         speech

Applications:
    •    banking over a telephone network
    •    telephone shopping and database access services
    •    voice dialing and mail
    •     information and reservation services
    •    security control for confidential information
    •    forensics and surveillance applications
Speaker Recognition
Speaker identification: the process of determining which registered speaker
provides input speech sounds

Figure: speaker identification. Input speech -> feature extraction -> similarity against the reference template or model of each registered speaker (#1, #2, ..., #N) -> maximum selection -> identification result (speaker ID)
Speaker Recognition
Speaker verification: the process of accepting or rejecting the
identity claim of a speaker.
Figure: speaker verification. Input speech -> feature extraction -> similarity against the reference template or model of the claimed speaker (#M) -> decision against a threshold -> verification result (accept/reject)




         Open Set and Closed Set Recognition

         Text-dependent and Text-independent Recognition
                 •   Vector quantization
                 •   Gaussian mixture models (GMM)
                 •   Dynamic time warping (DTW)
                 •   Hidden Markov model (HMM)
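
The two decisions described above differ only in the last step: identification selects the registered speaker with the maximum similarity, while verification compares the similarity for the claimed speaker against a threshold. The sketch below assumes, for brevity, a single diagonal-covariance Gaussian per speaker as a stand-in for a full GMM; the model format, function names and threshold are illustrative.

```python
import math


def fit_diag_gaussian(vectors, var_floor=1e-6):
    """Estimate a per-dimension mean and variance from enrollment feature vectors."""
    dim, n = len(vectors[0]), len(vectors)
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, var_floor)
           for d in range(dim)]
    return mean, var


def avg_log_likelihood(vectors, model):
    """Similarity score: mean log-likelihood of the frames under one speaker model."""
    mean, var = model
    total = 0.0
    for v in vectors:
        for x, m, s2 in zip(v, mean, var):
            total += -0.5 * (math.log(2 * math.pi * s2) + (x - m) ** 2 / s2)
    return total / len(vectors)


def identify(vectors, models):
    """Closed-set identification: pick the registered speaker with the highest score."""
    return max(models, key=lambda name: avg_log_likelihood(vectors, models[name]))


def verify(vectors, claimed_model, threshold):
    """Verification: accept the identity claim only if the score reaches the threshold."""
    return avg_log_likelihood(vectors, claimed_model) >= threshold
```

For open-set identification, the best score from `identify` would additionally be checked against a threshold so that unregistered speakers can be rejected.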
Text-to-Speech (TTS) System
    Synthesis of Speech for effective human machine communications
                     reading email messages
                     call center help desks and customer care
                     announcement machines



Figure: TTS pipeline. Raw or tagged text -> text analysis (document structure detection, text normalization, linguistic analysis) -> phonetic analysis (homograph disambiguation, grapheme-to-phoneme conversion) -> prosodic analysis (pitch, duration) -> speech synthesis (voice rendering) -> synthetic speech




              Synthetic speech should be intelligible and natural
Speech Synthesis

Text-to-speech (TTS) synthesis systems
       Approach
       TTS system performance measure
          • Synthetic Speech Intelligibility
          • Synthetic speech naturalness

Speech Intelligibility Tests
      Segmental level analysis
          • the Rhyme Test
          • the Modified Rhyme Test
          • the Diagnostic Rhyme Test
      Supra-segmental analysis
          • the Harvard Psychoacoustic Sentences (HPS)
          • the Haskins syntactic sentences
Speech Coding (Compression)
Speech Coding for efficient transmission and storage of speech
           narrowband and broadband wired telephony
           cellular communications
           Voice over IP (VoIP) to utilize the Internet
           Telephone answering machines
           IVR systems
           Prerecorded messages
Speech-Assisted Translation Corrector System

 Objective: develop a speech-assisted translation corrector (SATC)
 system that produces a grammatically correct sentence from a
 sentence output by machine translation
Figure: SATC system. Input sentence (example: "He came here") -> multilingual machine translation -> translated sentence with grammatical errors -> speech-assisted translation corrector system -> grammatically correct sentence (text, speech, storage)
Translator: a speech signal is produced from the words in the translated sentence.



“An MT system is correct and complete if it can analyze all of the grammatical structures
encountered in the source language, and it can generate all of the grammatical structures
necessary in the target language translation.”
8/25/2011                                                                                    16
SATC System: Requirements and Challenging Tasks

   Creation of large-scale, rich multilingual speech databases is a crucial
 task for research and development in language and speech technology

            Indian languages
            speakers (10 Males and 10 Females)
            age groups ( <20, 15-40, >40)
            audio format: 16-bit stereo, and sampling rate of 44.1 kHz
            annotation and assessment of speech databases


   Development of multilingual text to speech interface

   Development of spoken word matching module

   Development of speech signal processing (SSP) tools



Major Problems in Speech Processing
Acoustic variability: the same phonemes pronounced in
different contexts will have different acoustic realization
(coarticulation effect)

The signal is different when speech is uttered in various
environments:
       noise
       reverberation
       different types of microphones.

Speaking variability: when the same speaker speaks normally,
shouts, whispers, uses a creaky voice, or has a cold

Speaker variability: different speakers have different
timbres and different speaking habits
Major Problems in Speech Processing
Linguistic variability: the same sentence can be pronounced
in many different ways, using many different words,
synonyms, and many different syntactic structures and
prosodic schemes

Phonetic variability: due to the different possible
pronunciations of the same words by speakers having
different regional accents

Lombard effect: noise modifies the utterance of the words (as
people tend to speak louder)
Major Problems in Speech Processing
Continuous speech:
   words are connected together (not separated by pauses or
   silences).

   It is difficult to find the start and end points of words

   The production of each phoneme is affected by the
   production of surrounding phonemes

   The start and end of words are affected by the preceding
   and following words

   The rate of speech varies (fast speech tends to be harder to recognize)
Speech processing

  • 1. Research Issues in Speech Processing Dr. M. Sabarimalai Manikandan msm.sabari@gmail.com
  • 2. Speech Production: the source-filter model Speech signal conveys the information contained in the spoken word highly non-stationary signal Short segments of speech (20 to 30 ms ) acoustical energy is in the frequency range of 100-6000 Hz Vocal tract transfer function can be modeled by an all-pole filter
  • 3. Speech Processing Tasks Speech recognition (recognizing lexical content) Speech synthesis (Text-to speech) Speaker recognition (recognizing who is speaking) Speech understanding and vocal dialog Speech coding (data rate deduction) Speech enhancement (Noise reduction) Speech transmission (noise free communication) Voice conversion
  • 4. Speech Processing Speech measurements Short-time energy (STE) Zero crossing rate (ZCR) Autocorrelation (AC) Pitch period or frequency Formants Speech signal components Speech-Silence or Non-speech Voiced speech-Unvoiced speech
  • 5. Speech Processing
      Speech representations or models
      Temporal features
      • Low energy rate
      • Zero crossing rate (ZCR)
      • 4 Hz modulation energy
      • Pitch contour
      Spectral features
      • Spectral centroid (sharpness)
      • Spectral flux (rate of change)
      • Spectral roll-off (spectral shape)
      • Spectral flatness (deviation of the spectral form)
      Linear Predictive Coefficients (LPC)
      Cepstral coefficients
      Mel Frequency Cepstral Coefficients (MFCC): human auditory system
      Harmonic features: sinusoidal harmonic modelling
      Perceptual features: model of the human hearing process
      First order derivative (DELTA)
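The four spectral features named on this slide can be sketched as follows. Definitions vary between toolkits (magnitude versus power weighting, different roll-off fractions), so the choices below, including the 85% roll-off fraction, the small `eps` floor, the naive DFT, and all function names, are assumptions rather than the author's definitions.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def spectral_centroid(mag, fs, n_fft):
    """Magnitude-weighted mean frequency in Hz."""
    return sum(k * fs / n_fft * m for k, m in enumerate(mag)) / sum(mag)

def spectral_rolloff(mag, fs, n_fft, fraction=0.85):
    """Frequency below which `fraction` of the total magnitude lies."""
    threshold = fraction * sum(mag)
    cumulative = 0.0
    for k, m in enumerate(mag):
        cumulative += m
        if cumulative >= threshold:
            return k * fs / n_fft
    return (len(mag) - 1) * fs / n_fft

def spectral_flatness(mag, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum (0 = tonal, 1 = flat)."""
    power = [m * m + eps for m in mag]
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    return geometric / (sum(power) / len(power))

def spectral_flux(mag_prev, mag_cur):
    """Squared spectral change between two consecutive frames."""
    return sum((c - p) ** 2 for p, c in zip(mag_prev, mag_cur))

fs, n_fft = 8000, 64
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(n_fft)]  # 1 kHz test tone
mag = magnitude_spectrum(tone)
centroid = spectral_centroid(mag, fs, n_fft)
rolloff = spectral_rolloff(mag, fs, n_fft)
flatness = spectral_flatness(mag)
flux = spectral_flux(mag, mag)
```

For this pure tone the centroid and roll-off both sit at the tone frequency and the flatness is close to zero. A noise-like frame would push the flatness toward one.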
  • 6. Elements of the speech signal
      Phonemes: the smallest units of speech sounds
      Vowels and consonants
      • ~12 to 21 different vowel sounds are used in the English language.
      • Consonants involve rapid and sometimes subtle changes in sound. According to the manner of articulation they are:
          plosive (p, b, t, etc.)
          fricative (f, s, sh, etc.)
          nasal (m, n, ng)
          liquid (r, l)
          semivowel (w, y)
      • Consonants are more independent of language than vowels are.
      Syllable: one or more phonemes
      Word: one or more syllables
  • 7. Automatic Speech Recognition
      There are two uses for speech recognition systems:
      • Dictation: translation of the spoken word into written text
      • Computer control: control of the computer and software applications by speaking commands
      Speaker-dependent system: operates for a single speaker
      Speaker-independent system: operates for any speaker of a particular type
      Speaker-adaptive system: adapts its operation to the characteristics of new speakers
      The size of the vocabulary affects the complexity, processing requirements and accuracy of the system.
  • 8. Speech Recognition: Applications
      • Automatic translation
      • Vehicle navigation systems
      • Human-computer interaction
      • Content-based spoken audio search
      • Home automation
      • Pronunciation evaluation
      • Robotics
      • Video games
      • Transcription of speech into mobile text messages
      • People with disabilities
  • 9. Speech Recognition System
      Sampling of speech
      Acoustic signal processing:
      • Linear Prediction Cepstral Coefficients (LPCC)
      • Mel Frequency Cepstral Coefficients (MFCC)
      • Perceptual Linear Prediction Cepstral Coefficients (PLPCC)
      Recognition of phonemes, groups of phonemes and words:
      • Dynamic Time Warping (DTW)
      • Hidden Markov models (HMMs)
      • Gaussian mixture models (GMMs)
      • Neural Networks (NNs)
      • Expert systems and combinations of techniques
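Of the recognition techniques listed on this slide, Dynamic Time Warping is the easiest to show compactly. The sketch below aligns two feature sequences of different lengths and returns the accumulated cost of the best alignment. It uses scalar features and an absolute-difference local distance as simplifying assumptions. A recognizer would compare frames of LPCC or MFCC vectors instead, and the function name is hypothetical.

```python
def dtw_distance(seq_a, seq_b, dist=lambda x, y: abs(x - y)):
    """Accumulated cost of the best monotonic alignment between two sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cost of aligning the first i items of seq_a
    # with the first j items of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # seq_a advances alone
                                     cost[i][j - 1],      # seq_b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

template = [1.0, 2.0, 3.0]
slow_utterance = [1.0, 2.0, 2.0, 3.0]   # same pattern, spoken more slowly
same_pattern_cost = dtw_distance(template, slow_utterance)
different_cost = dtw_distance([0.0, 0.0], [1.0, 1.0])
```

Because the warping path may repeat frames, the slower utterance matches the template at zero cost, which is the property that makes DTW tolerant of differences in speaking rate.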
  • 10. Automatic Speaker Recognition
      Speaker recognition: the process of automatically recognizing who is speaking by using the speaker-specific information included in speech sounds
      Speaker identity: physiological and behavioral characteristics of the speech production model of an individual speaker
      • the spectral envelope (vocal tract characteristics)
      • the supra-segmental features (voice source characteristics) of speech
      Applications:
      • banking over a telephone network
      • telephone shopping and database access services
      • voice dialing and mail
      • information and reservation services
      • security control for confidential information
      • forensics and surveillance applications
  • 11. Speaker Recognition
      Speaker identification: the process of determining which registered speaker provides the input speech sounds
      [Block diagram: input speech → feature extraction → similarity against the reference template or model of each speaker (#1, #2, ..., #N) → maximum selection → identification result (speaker ID)]
  • 12. Speaker Recognition
      Speaker verification: the process of accepting or rejecting the identity claim of a speaker
      [Block diagram: input speech → feature extraction → similarity against the reference template or model of the claimed speaker (#M) → decision against a threshold → verification result (accept/reject)]
      Open-set and closed-set recognition
      Text-dependent and text-independent recognition
      • Vector quantization
      • Gaussian mixture models (GMM)
      • Dynamic time warping (DTW)
      • Hidden Markov model (HMM)
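Identification (slide 11) and verification (slide 12) share the same feature extraction and similarity scoring and differ only in the decision rule: maximum selection over all registered speakers versus a threshold test against one claimed speaker. The sketch below makes that difference concrete. Cosine similarity, the three-dimensional feature vectors, the 0.9 threshold and all names are illustrative assumptions. The slides list VQ, GMM, DTW and HMM as the actual modelling choices.

```python
import math

def cosine_similarity(u, v):
    """Similarity score between two feature vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def identify(features, models):
    """Identification: return the registered speaker with the highest similarity."""
    return max(models, key=lambda speaker_id: cosine_similarity(features, models[speaker_id]))

def verify(features, claimed_model, threshold):
    """Verification: accept the identity claim only if similarity reaches the threshold."""
    return cosine_similarity(features, claimed_model) >= threshold

# Hypothetical reference templates for two registered speakers.
models = {
    "speaker_1": [1.0, 0.0, 0.2],
    "speaker_2": [0.1, 1.0, 0.3],
}
utterance = [0.9, 0.1, 0.2]   # hypothetical features extracted from input speech

best_match = identify(utterance, models)
accepted = verify(utterance, models["speaker_1"], threshold=0.9)
rejected = verify(utterance, models["speaker_2"], threshold=0.9)
```

In a closed-set task the maximum-selection result is always one of the registered speakers. An open-set system would add a threshold test on the best score so that unknown speakers can be rejected.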
  • 13. Text-to-Speech (TTS) System
      Synthesis of speech for effective human-machine communications
      • reading email messages
      • call center help desks and customer care
      • announcement machines
      [Block diagram: raw or tagged text → text analysis (document structure detection, text normalization, linguistic analysis) → phonetic analysis (homograph disambiguation, grapheme-to-phoneme conversion) → prosodic analysis (pitch, duration) → speech synthesis (voice rendering) → synthetic speech]
      Synthetic speech should be intelligible and natural.
  • 14. Speech Synthesis
      Text-to-speech (TTS) synthesis systems
      Approach
      TTS system performance measure
      • Synthetic speech intelligibility
      • Synthetic speech naturalness
      Speech intelligibility tests
      Segmental level analysis
      • the Rhyme Test
      • the Modified Rhyme Test
      • the Diagnostic Rhyme Test
      Supra-segmental analysis
      • the Harvard Psychoacoustic Sentences (HPS)
      • the Haskins syntactic sentences
  • 15. Speech Coding (Compression)
      Speech coding for efficient transmission and storage of speech
      • narrowband and broadband wired telephony
      • cellular communications
      • Voice over IP (VoIP) to utilize the Internet
      • Telephone answering machines
      • IVR systems
      • Prerecorded messages
  • 16. Speech-Assisted Translation Corrector System
      Objective: develop a speech-assisted translation corrector (SATC) system that provides a grammatically correct sentence for a translated sentence produced by machine translation
      [Block diagram: input sentence → multilingual machine translation system → translated sentence with grammatical errors → speech-assisted translation corrector → grammatically correct sentence. Side elements: text ("He came here"), speech storage, translator]
      The speech signal is produced from the words in the translated sentence.
      “A MT system is correct and complete if it can analyze all of the grammatical structures encountered in the source language, and it can generate all of the grammatical structures necessary in the target language translation.”
      8/25/2011
  • 17. SATC System: Requirements and Challenging Tasks
      Creation of large-scale, rich multilingual speech databases is a crucial task for research and development in language and speech technology
      • Indian languages
      • speakers (10 males and 10 females)
      • age groups (<20, 15-40, >40)
      • audio format: 16-bit stereo, sampling rate of 44.1 kHz
      • annotation and assessment of speech databases
      Development of a multilingual text-to-speech interface
      Development of a spoken word matching module
      Development of speech signal processing (SSP) tools
  • 18. Major Problems in Speech Processing
      Acoustic variability: the same phonemes pronounced in different contexts will have different acoustic realizations (coarticulation effect). The signal is also different when speech is uttered in various environments:
      • noise
      • reverberation
      • different types of microphones
      Speaking variability: the same speaker may speak normally, shout, whisper, use a creaky voice, or have a cold
      Speaker variability: different speakers have different timbres and different speaking habits
  • 19. Major Problems in Speech Processing
      Linguistic variability: the same sentence can be pronounced in many different ways, using many different words, synonyms, and many different syntactic structures and prosodic schemes
      Phonetic variability: due to the different possible pronunciations of the same words by speakers having different regional accents
      Lombard effect: noise modifies the utterance of the words (as people tend to speak louder)
  • 20. Major Problems in Speech Processing
      Continuous speech: words are connected together (not separated by pauses or silences).
      • It is difficult to find the start and end points of words.
      • The production of each phoneme is affected by the production of surrounding phonemes.
      • The start and end of words are affected by the preceding and following words.
      • The rate of speech also matters (fast speech tends to be harder to recognize).
  • 21. References
      • M. Honda, NTT CS Laboratories, Speech synthesis technology based on speech production mechanism, How to observe and mimic speech production by human, Journal of the Acoustical Society of Japan, Vol. 55, No. 11, pp. 777-782, 1999.
      • S. Saito and K. Nakata, Fundamentals of Speech Signal Processing, 1981.
      • M. Honda, H. Gomi, T. Ito and A. Fujino, NTT CS Laboratories, Mechanism of articulatory cooperated movements in speech production, Proceedings of Autumn Meeting of the Acoustical Society of Japan, Vol. 1, pp. 283-286, 2001.
      • T. Kaburagi and M. Honda, NTT CS Laboratories, “A model of articulator trajectory formation based on the motor tasks of vocal-tract shapes,” J. Acoust. Soc. Am., Vol. 99, pp. 3154-3170, 1996.
      • S. Suzuki, T. Okadome and M. Honda, NTT CS Laboratories, “Determination of articulatory positions from speech acoustics by applying dynamic articulatory constraints,” Proc. ICSLP98, pp. 2251-2254, 1998.
      • C. Benoit and M. Grice, The SUS test: a method for the assessment of text-to-speech intelligibility using Semantically Unpredictable Sentences, Speech Communication, Vol. 18, pp. 381-392.