Keyword Spotting in Continuous Speech

   by Jesse Sampermans (502400)
Overview


1.   Hypothesis
2.   Historic Overview
3.   Human Speech Organ
4.   Phonetics & Speech Perception
5.   Telephone Speech Coding & Compression
6.   Speech Enhancement
7.   Speech Recognition Engine
8.   Speech Analytics
9.   Conclusion
1. Hypothesis



“Is it possible, with today’s known technology,
 to automatically trigger a recording device
       with a random word in a sentence
            over a telephone line?”
2. Historic Overview

• Early Days (1700 - 1900)

 First artificial speech synthesizer
  - late 1700’s: Russian professor Christian Kratzenstein
  - Resonant tube attached to pipe organ

 Enhancement
  - mid 1800’s: Charles Wheatstone
  - Replace tubes with leather resonators

 [Figure: Wheatstone Resonator]

 1881: Graphophone (Alexander Graham Bell)
  - Dictation purposes

 1939 World’s Fair: VODER (Homer Dudley)
  - Based on Wheatstone Resonator
  - Electrical & Mechanical Parts

• First Speech Recognizers (1950 - 1980)

   - Digit Recognition System based on speech formants
 Vs.
   - 10 syllable recognizer
   - Dynamic Time Warping

 Commercialization 1960’s

 Office Automation
  - Voice typewriter
  - Trained databases
 Vs.
 Telecom Automation
  - Keyword Spotting
  - Large Audience

• Modern evolutions (1980 - ...)
  - Hidden Markov Models
  - CMU “Sphinx” = commercial success
  - DARPA (Defense Advanced Research Projects Agency) investments
  - Battle Management
3. Human Speech Organ


           - Lungs: pump air
           - Larynx (Vocal Folds)
           - Articulators (Tongue, Lips, ...)
4. Phonetics & Speech Perception

• Phonetics
  - Phoneme: smallest unit of human speech
  - Study originated in India around 500 BC (about 2,500 years ago)
  - IPA (International Phonetic Alphabet)
  - About 44 phonemes in American English

• Speech Perception
  - Acoustic Cues
      Voice Onset Time: Unaspirated plosives (near 0 ms)
                        Aspirated plosives (> 30 ms)
                        Voiced plosives (< 0 ms)

  - Speech Segmentation
       Identifying boundaries between words (lexical) or phonemes
       (phonetic)
          [k] in “kit” and “caught”   and   [i] in “kit” and “kick”
  - Categorical Perception
      Identifying words from different speakers
      Categorize phonemes in brain
      Only native speakers
  - Variations in speech
       Phonetic environment can alter the sound of a phoneme

                [o] in “Bob”   and   [u] in “vulture”

      Speed of speech
          Fast → Shorter vowels, less pronounced stops, bad articulation

      Speaker identity
          - Gender and age differences
          - Vocal cord size and hormone levels
          - Place of birth
5. Telephone Speech Coding & Compression

• Early days:    Analog
  - Speech converted to control voltage in the phone
  - Passed through copper lines → crosstalk


• 1980’s - present day:       Digital
  - Main advantages: Longer distance / greater speed / less carrier noise
  - Use of optical fiber lines → no crosstalk

• Now:     Mobile Phones
  - GSM: Speech
  - UMTS: Data

  - Frequency content of 3100 Hz (300 - 3400 Hz)
  - Compressed full-rate (13 kbit/s) or half-rate (5.6 kbit/s) with 8 kHz sampling rate


• Technique:      Linear Predictive Coding (LPC)
  - Formants (vocal tract resonances) are estimated and removed from speech
  - What is left = residue (buzz) → described by its intensity and frequency
  - Formants are synthesized again in the receiver’s cellphone

  - Of great interest for speech recognition
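
The LPC idea above can be sketched in a few lines. This is only a minimal illustration, assuming NumPy and the textbook autocorrelation method; it is not the GSM codec itself. The function names (`lpc_coefficients`, `inverse_filter`, `synthesize`), the predictor order of 10 and the synthetic test frame are assumptions made for the example, not taken from the slides.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Estimate predictor coefficients a_1..a_p (autocorrelation method)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation of the frame at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    # Normal equations R a = r[1:], with R the Toeplitz matrix of the lags
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def inverse_filter(frame, a):
    """Sender side: remove the predictable (resonant) part.
    e[n] = x[n] - sum_k a_k * x[n-k]"""
    frame = np.asarray(frame, dtype=float)
    e = frame.copy()
    for k, ak in enumerate(a, start=1):
        e[k:] -= ak * frame[:-k]
    return e

def synthesize(residue, a):
    """Receiver side: put the resonances back.
    x[n] = e[n] + sum_k a_k * x[n-k]"""
    x = np.zeros(len(residue))
    for n in range(len(residue)):
        past = sum(ak * x[n - k] for k, ak in enumerate(a, start=1) if n >= k)
        x[n] = residue[n] + past
    return x

# Hypothetical 30 ms frame at 8 kHz: two "formant-like" tones plus a little noise
rng = np.random.default_rng(0)
t = np.arange(240) / 8000.0
frame = (np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
         + 0.01 * rng.standard_normal(240))

a = lpc_coefficients(frame, order=10)   # describes the resonances
residue = inverse_filter(frame, a)      # what is left after removing them
rebuilt = synthesize(residue, a)        # receiver rebuilds the frame
```

In a real codec the residue is not sent sample by sample but in a compact coded form, which is where the bit-rate saving comes from. The same coefficients are also a compact description of the spectral envelope, which is why LPC is of interest for speech recognition.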
6. Speech Enhancement

• Pre-Filtering
  - Frequency based

  - Filter banks
          - Commonly known as an equalizer
          - Used adaptively to suppress unwanted frequencies
          - Boost low-end lost due to telephone coding
          - Improve audibility

• Noise-Filtering
  - Spectral Subtraction
         - Simple and effective
         - Subtracts an estimate of the noise amplitude spectrum
         - “Underwater” effect if overused
  - Wiener Filtering
        - Invented in 1940’s by Norbert Wiener
        - Uses Fourier transform to detect noise
        - Stationary (non-adaptive)
        - Uses deconvolution to remove noise
  - Signal Subspace approach
         - Represents noise and original signal in “layers”
         - Assigns vectors to high and low amplitudes
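
Of the three noise filters, spectral subtraction is the easiest to sketch. The example below is a minimal illustration assuming NumPy, 50% overlapping Hann-windowed frames, and a noise estimate taken from a noise-only stretch of the recording. The function names, the frame length of 256 and the `over_subtraction` parameter are assumptions made for the example, not taken from the slides.

```python
import numpy as np

def _frames(signal, frame_len):
    """Yield (start, windowed frame) pairs with 50% overlap."""
    hop = frame_len // 2
    # Periodic Hann window: copies shifted by half a frame sum to exactly 1
    window = np.hanning(frame_len + 1)[:-1]
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield start, signal[start:start + frame_len] * window

def estimate_noise_magnitude(noise_only, frame_len=256):
    """Average magnitude spectrum of a stretch that contains only noise."""
    noise_only = np.asarray(noise_only, dtype=float)
    mags = [np.abs(np.fft.rfft(f)) for _, f in _frames(noise_only, frame_len)]
    return np.mean(mags, axis=0)

def spectral_subtraction(noisy, noise_mag, frame_len=256, over_subtraction=1.0):
    """Subtract the noise magnitude per frequency bin, keep the noisy phase."""
    noisy = np.asarray(noisy, dtype=float)
    out = np.zeros(len(noisy))
    for start, frame in _frames(noisy, frame_len):
        spectrum = np.fft.rfft(frame)
        mag, phase = np.abs(spectrum), np.angle(spectrum)
        # Negative magnitudes make no sense, so clip at zero
        clean_mag = np.maximum(mag - over_subtraction * noise_mag, 0.0)
        clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len)
        out[start:start + frame_len] += clean   # overlap-add
    return out

# Hypothetical usage: a tone buried in white noise, 8 kHz sampling rate
rng = np.random.default_rng(1)
t = np.arange(8000) / 8000.0
noise = 0.3 * rng.standard_normal(8000)
noisy = np.sin(2 * np.pi * 440 * t) + noise
noise_mag = estimate_noise_magnitude(noise[:2048])
enhanced = spectral_subtraction(noisy, noise_mag)
```

Raising `over_subtraction` above 1 removes more noise but also leaves more isolated spectral peaks behind, which is one source of the "underwater" effect mentioned above.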

• Spectral Restoration
 - Fixes dropouts in the signal.
 - Works on a small scale
 - Adds filtered full band noise in the gaps
 - Listener perceives the signal as whole

 - Bad results with SREs (speech recognition engines)
 - Most SREs can fill the gap in a different way
7. Speech Recognition Engine

• Dynamic Time Warping (DTW)
 - Mostly used in the early days
 - Fast & simple but not accurate with complex speech

 - Measures similarity between sequences that differ in timing or speed
 - e.g. A video is played twice, once fast and once slow. A DTW-based
 algorithm will see that it is the same video

 - Compares speech to a speech database
 - Needs training most of the time
 - Does not use phonemes
 - Uses interval-based vectors.
 - Vector taken at the wrong time = bad representation
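
The time-warping idea above can be sketched in a few lines. This is a minimal illustration, not the engine described in the slides: the function name `dtw_distance` and the toy number sequences (standing in for feature vectors) are assumptions made for the example.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # A step may stretch either sequence or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical data: the same "utterance" at two speeds, plus a different one.
fast = [1, 3, 4, 3, 1]
slow = [1, 1, 3, 3, 4, 4, 3, 3, 1, 1]
other = [4, 1, 1, 4, 1]

print(dtw_distance(fast, slow))   # 0.0: same shape, different speed
print(dtw_distance(fast, other))  # greater than 0: different shape
```

The fast and slow versions align at zero cost because DTW is free to repeat samples, which is the "same video at two speeds" behaviour described above.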
7. Speech Recognition Engine

• Statistically Based Speech Recognition
 Hidden Markov Models

 - Heart and soul of statistically based SREs
 - Allows use by people with different accents / dialects

 - Markov Model: “predict” the future by knowing the current state
 - Hidden Markov Model: “predict” the hidden current state from what is observed and what may follow

 - What may follow = grammar file
 - Statistically rules out possibilities as the word progresses
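
How an HMM rules out possibilities step by step can be sketched with Viterbi decoding, a standard HMM algorithm that the slides do not name. Everything concrete here is an assumption for illustration: the two states, the "loud"/"quiet" observations and all probabilities are made up.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Trace back from the most likely final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Hypothetical toy model: hidden sound classes, observed loudness.
states = ("vowel", "consonant")
start_p = {"vowel": 0.5, "consonant": 0.5}
trans_p = {
    "vowel": {"vowel": 0.3, "consonant": 0.7},
    "consonant": {"vowel": 0.7, "consonant": 0.3},
}
emit_p = {
    "vowel": {"loud": 0.9, "quiet": 0.1},
    "consonant": {"loud": 0.2, "quiet": 0.8},
}

print(viterbi(["quiet", "loud", "quiet"], states, start_p, trans_p, emit_p))
# ['consonant', 'vowel', 'consonant']
```

At each time step only the best path into each state survives, so unlikely alternatives are dropped as the input progresses.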
7. Speech Recognition Engine

• Statistically Based Speech Recognition
  Acoustic Model

  - Gathers statistical information for the HMM
  - Does this by analyzing a speech corpus (read or continuous)
  - Different corpora exist (by language, gender, frequency range)

  - ISIP Switchboard corpus: 240 h of speech, 500 talkers, telephone quality
7. Speech Recognition Engine

• Statistically Based Speech Recognition
  Language Model

  - Tries to predict the next word
  - Uses a grammar file
  - E.g. “Phone Steve Young; Phone Young; Phone Steve; Phone Young Steve”

  - Multiple grammar files can be combined to predict entire sentences
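
A grammar file constraining the next word can be sketched as a plain list of allowed phrases. This is only an illustration using the example phrases above: the list format and the helper name `next_words` are assumptions, not a real grammar-file syntax.

```python
# Hypothetical grammar: one allowed phrase per entry.
GRAMMAR = [
    "phone steve young",
    "phone young",
    "phone steve",
    "phone young steve",
]


def next_words(prefix, grammar=GRAMMAR):
    """Return the words the grammar allows after the given list of words."""
    allowed = set()
    for phrase in grammar:
        words = phrase.split()
        # The phrase must start with the prefix and have a word left over.
        if words[:len(prefix)] == prefix and len(words) > len(prefix):
            allowed.add(words[len(prefix)])
    return allowed


print(next_words([]))                  # {'phone'}
print(next_words(["phone", "steve"]))  # {'young'}
print(next_words(["call"]))            # set(): not in the grammar
```

After "phone" only "steve" or "young" can follow, so the recognizer has far fewer candidates to score than an open vocabulary would give it.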
8. Speech Analytics


 - Separate engine
 - Analyze gender, age, identity and topic discussed


• Audio Mining
 - Analyzes audio as soon as it enters the signal
 - Useful with background noise
 - Matches source to a speech database

 e.g.: Emotion detection in customer service calls
       Music recognition software (“Shazam”, “Soundhound”)
8. Speech Analytics

• Keyword Spotting
  2 kinds:

  Isolated word
       - clearly enforced breaks
       - non-spontaneous
       - the user knows they are talking to an SRE

  Unconstrained spotting
     - continuous speech KWS
     - difficult due to speech segmentation
8. Speech Analytics

• Keyword Spotting
    2 methods
      - filler method (garbage method): the entire string of speech is analyzed,
                                        including excess words (= garbage)

      - sliding model: interval-based analysis
                       uses Hidden Markov Models & a grammar file
                       resource intensive
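
The interval-based idea behind the sliding model can be sketched as a window moved over the incoming signal and scored at every position. Note the substitution: the slides score with Hidden Markov Models and a grammar file, while this sketch uses a simple template distance as a stand-in. The function name `spot_keyword`, the threshold and the toy numbers are assumptions.

```python
def spot_keyword(stream, template, threshold):
    """Return start indices where a window of the stream is close to the template."""
    width = len(template)
    hits = []
    for start in range(len(stream) - width + 1):
        window = stream[start:start + width]
        # Mean absolute difference stands in for a real model score.
        score = sum(abs(x - y) for x, y in zip(window, template)) / width
        if score <= threshold:
            hits.append(start)
    return hits


# Hypothetical feature stream containing the keyword twice (once slightly distorted).
stream = [0, 0, 1, 5, 9, 5, 1, 0, 0, 2, 5, 8, 5, 1, 0]
template = [1, 5, 9, 5, 1]

print(spot_keyword(stream, template, threshold=0.5))  # [2, 9]
```

Every window position is scored, which is why this approach becomes resource intensive once the cheap distance is replaced by a full model evaluation.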
9. Conclusion


“Is it possible, with today’s known technology,
 to automatically trigger a recording device
       with a random word in a sentence
            over a telephone line?”

                  Answer:

                   YES
9. Conclusion

- Keyword spotting algorithm based on a statistically based SRE
- Appropriate acoustic model
- ISIP Switchboard speech corpus: telephone compressed source

- Grammar file? → Maybe, but it will be big
- Normal speech corpus? → A lot of pre-filtering / might not be successful
- LPC? → artifacts in output due to 2x LPC filtering
Q&A

Jesse Sampermans Research Project Presentation

  • 1. Continuous Speech Keyword Spotting In by Jesse Sampermans (502400)
  • 2.
  • 6. Overview 1. Hypothesis 2. Historic Overview 3. Human Speech Organ
  • 7. Overview 1. Hypothesis 2. Historic Overview 3. Human Speech Organ 4. Phonetics & Speech Perception
  • 8. Overview 1. Hypothesis 2. Historic Overview 3. Human Speech Organ 4. Phonetics & Speech Perception 5. Telephone Speech Coding & Compression
  • 9. Overview 1. Hypothesis 2. Historic Overview 3. Human Speech Organ 4. Phonetics & Speech Perception 5. Telephone Speech Coding & Compression 6. Speech Enhancement
  • 10. Overview 1. Hypothesis 2. Historic Overview 3. Human Speech Organ 4. Phonetics & Speech Perception 5. Telephone Speech Coding & Compression 6. Speech Enhancement 7. Speech Recognition Engine
  • 11. Overview 1. Hypothesis 2. Historic Overview 3. Human Speech Organ 4. Phonetics & Speech Perception 5. Telephone Speech Coding & Compression 6. Speech Enhancement 7. Speech Recognition Engine 8. Speech Analytics
  • 12. Overview 1. Hypothesis 2. Historic Overview 3. Human Speech Organ 4. Phonetics & Speech Perception 5. Telephone Speech Coding & Compression 6. Speech Enhancement 7. Speech Recognition Engine 8. Speech Analytics 9. Conclusion
  • 13. Overview 1. Hypothesis 2. Historic Overview 3. Human Speech Organ 4. Phonetics & Speech Perception 5. Telephone Speech Coding & Compression 6. Speech Enhancement 7. Speech Recognition Engine 8. Speech Analytics 9. Conclusion
  • 14. 1.Overview Hypothesis 2. Historic Overview 3. Human Speech Organ 4. Phonetics & Speech Perception 5. Telephone Speech Coding & Compression 6. Speech Enhancement 7. Speech Recognition Engine 8. Speech Analytics 9. Conclusion
  • 16. 1. Hypothesis “Is it possible, with today’s known technology, to automatically trigger a recording device with a random word in a sentence over a telephone line?”
  • 17. Overview 1. Hypothesis 2. Historic Overview 3. Human Speech Organ 4. Phonetics & Speech Perception 5. Telephone Speech Coding & Compression 6. Speech Enhancement 7. Speech Recognition Engine 8. Speech Analytics 9. Conclusion
  • 18. Overview 1. Hypothesis 2. Historic Overview 3. Human Speech Organ 4. Phonetics & Speech Perception 5. Telephone Speech Coding & Compression 6. Speech Enhancement 7. Speech Recognition Engine 8. Speech Analytics 9. Conclusion
  • 19. Overview 2. Historic Overview 1. Hypothesis 2. Historic Overview 3. Human Speech Organ 4. Phonetics & Speech Perception 5. Telephone Speech Coding & Compression 6. Speech Enhancement 7. Speech Recognition Engine 8. Speech Analytics 9. Conclusion
  • 27. 2. Historic Overview • Early Days (1700 - 1900) First artificial speech synthesizer - late 1700’s: Christian Kratzenstein, a German-born professor in Copenhagen (often described as Russian) - Resonant tubes attached to organ pipes Enhancement - mid 1800’s: Charles Wheatstone - Replaced tubes with leather resonators
  • 32. 2. Historic Overview • Early Days (1700 - 1900) 1881: Graphophone - Alexander Graham Bell - Dictation purposes
  • 37. 2. Historic Overview • Early Days (1700 - 1900) 1939 World’s Fair: VODER - Homer Dudley - Based on Wheatstone Resonator - Electrical & Mechanical Parts
  • 42. 2. Historic Overview • First Speech Recognizers (1950 - 1980) - Digit Recognition System based on speech formants Vs. - 10 syllable recognizer - Dynamic Time Warping
  • 51. 2. Historic Overview • First Speech Recognizers (1950 - 1980) Commercialization 1960’s: Office Automation - Voice typewriter - Trained databases Vs. Telecom Automation - Keyword Spotting - Large Audience
  • 57. 2. Historic Overview • Modern evolutions (1980 - ...) - Hidden Markov Models - CMU “Sphinx” = commercial success - DARPA (Defense Advanced Research Projects Agency) investments - Battle Management
  • 65. 3. Human Speech Organ - Lungs: pump air - Larynx (Vocal Folds) - Articulators (Tongue, Lips, ...)
  • 75. 4. Phonetics & Speech Perception • Phonetics - Phoneme: smallest distinctive unit of human speech - Study originated in India about 2,500 years ago (c. 500 BC) - IPA (International Phonetic Alphabet) - About 44 phonemes in American English (the count varies with dialect and analysis)
  • 88. 4. Phonetics & Speech Perception • Speech Perception - Acoustic Cues: Voice Onset Time: Unaspirated plosives (near 0 ms), Aspirated plosives (> 30 ms), Voiced plosives (< 0 ms) - Speech Segmentation: Identifying boundaries between words (lexical) or phonemes (phonetic), e.g. [k] in “kit” and “caught” and [i] in “kit” and “kick” - Categorical Perception: Identifying words from different speakers; Categorize phonemes in brain; Only native speakers
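The voice onset time thresholds on this slide can be read as a simple decision rule. The sketch below is only an illustration: the function name `classify_plosive` is hypothetical, and treating everything between 0 ms and 30 ms as "unaspirated" is an assumption, since the slide only says "near 0 ms".

```python
def classify_plosive(vot_ms: float) -> str:
    """Classify a plosive from its voice onset time (VOT) in milliseconds.

    Thresholds follow the slide: voiced < 0 ms, aspirated > 30 ms.
    Assumption: the whole 0..30 ms range counts as unaspirated.
    """
    if vot_ms < 0:
        return "voiced"       # voicing starts before the release burst
    if vot_ms > 30:
        return "aspirated"    # long lag between release and voicing
    return "unaspirated"      # short lag, near 0 ms
```

Real recognizers do not use hard thresholds like this; the cue is combined statistically with other acoustic evidence.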
  • 98. 4. Phonetics & Speech Perception • Speech Perception - Variations in speech: Phonetic environment can alter the sound of a phoneme, e.g. [o] in “Bob” and [u] in “vulture”; Speed of speech: Fast → Shorter vowels, less pronounced stops, bad articulation; Speaker identity - Gender and age differences - Vocal cord size and hormone levels - Place of birth
  • 108. 5. Telephone Speech Coding & Compression • Early days: Analog - Speech converted to control voltage in the phone - Passed through copper lines → crosstalk • 1980’s - present day: Digital - Main advantages: Longer distance / greater speed / less carrier noise - Use of Optic Fiber lines → no crosstalk
  • 119. 5. Telephone Speech Coding & Compression • Now: Mobile Phones - GSM: Speech - UMTS: Data - Frequency content of about 3100 Hz (roughly 300 - 3400 Hz) - Compressed full-rate (13 kbit/s) or half-rate (5.6 kbit/s, sometimes quoted as 6.5 kbit/s) at an 8 kHz sample rate • Technique: Linear Predictive Coding (LPC) - Formants (resonances of the vocal tract) are estimated and filtered out of the speech - What is left = residual (excitation) signal → encoded compactly and sent with the filter coefficients - Formants are synthesized again in the receiver’s cellphone - Of great interest for speech recognition
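The LPC idea on this slide (estimate the formant filter, strip it from the speech, rebuild it at the receiver) can be sketched in a few lines. This is a minimal textbook sketch using the autocorrelation method with the Levinson-Durbin recursion, not the actual GSM codec; all function names are illustrative, and no quantization of the residual or the coefficients is shown.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one speech frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[0..order-1] with x[n] ~ sum(a[k] * x[n-1-k])."""
    a = [0.0] * (order + 1)          # a[0] is unused, a[1..order] are taps
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k           # remaining prediction error energy
    return a[1:], err


def lpc_residual(x, a):
    """Analysis: remove the predictable (formant) part, keep the residual."""
    return [x[n] - sum(a[k] * x[n - 1 - k]
                       for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]


def lpc_synthesize(residual, a):
    """Synthesis: run the residual back through the formant filter."""
    y = []
    for n in range(len(residual)):
        pred = sum(a[k] * y[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        y.append(residual[n] + pred)
    return y
```

A sender would compute `a` and the residual per frame, transmit compact versions of both, and the receiver would call `lpc_synthesize`. Without quantization the round trip reproduces the input up to rounding error.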
  • 130. 6. Speech Enhancement • Pre-Filtering - Frequency based - Filter banks - Commonly known as an equalizer - Used adaptively to suppress unwanted frequencies - Boost low-end lost due to telephone coding - Improve audibility
  • 144. 6. Speech Enhancement • Noise-Filtering - Spectral Subtraction - Simple and effective - Subtracts an estimate of the noise amplitude spectrum from the signal - “Underwater” effect if overused - Wiener Filtering - Invented in 1940’s by Norbert Wiener - Uses Fourier transform to detect noise - Assumes stationary signal and noise (non-adaptive) - Uses deconvolution to remove noise - Signal Subspace approach - Splits the noisy signal into a signal subspace and a noise subspace (“layers”) - Keeps the high-energy signal components and discards the low-energy noise components
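The first two noise filters on this slide can be sketched per frequency bin. The code below assumes the magnitude (or power) spectra of one frame have already been computed elsewhere, and that a noise estimate is available, for example from a speech-free stretch of the call. The function names and the `over_subtraction` and `floor` parameters are illustrative choices, not taken from the presentation.

```python
def spectral_subtract(noisy_mag, noise_mag, over_subtraction=1.0, floor=0.02):
    """Subtract a noise magnitude estimate from each frequency bin.

    A small spectral floor keeps bins from going negative; pushing
    over_subtraction too high gives the "underwater" artifact.
    """
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        s = y - over_subtraction * n
        cleaned.append(max(s, floor * y))
    return cleaned


def wiener_gain(noisy_power, noise_power):
    """Per-bin Wiener-style gain: estimated speech power / noisy power."""
    gains = []
    for py, pn in zip(noisy_power, noise_power):
        if py <= 0.0:
            gains.append(0.0)
        else:
            gains.append(max(py - pn, 0.0) / py)
    return gains
```

In both cases the cleaned magnitudes would be recombined with the original phase and transformed back to the time domain.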
  • 152. 6. Speech Enhancement • Spectral Restoration - Fixes dropouts in the signal. - Works on a small scale - Adds filtered full band noise in the gaps - Listener perceives the signal as whole - Bad results with SREs - Most SREs can fill the gap in a different way
  • 166. 7. Speech Recognition Engine • Dynamic Time Warping (DTW) - Mostly used in the early days - Fast & simple but not accurate with complex speech - Measures the similarity of two sequences that differ in timing or speed - e.g. A video is played twice, once fast and once slow. A DTW-based algorithm will see that it is the same video - Compares speech to a speech database - Needs training most of the time - Does not use phonemes - Uses interval-based vectors - Vector taken at the wrong time = bad representation
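The "same video played fast and slow" example can be made concrete with the classic DTW recurrence. This is a minimal sketch on one-dimensional sequences; a real recognizer would compare feature vectors per interval rather than single numbers, and the function name is illustrative.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b waits
                                 cost[i][j - 1],      # b advances, a waits
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

For a template `[1, 2, 3, 2, 1]` and a slowed-down copy `[1, 1, 2, 2, 3, 3, 2, 2, 1, 1]` the distance is zero, while a different shape such as `[3, 2, 1, 2, 3]` gives a positive distance.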
  • 175. 7. Speech Recognition Engine • Statistically Based Speech Recognition Hidden Markov Models - Heart and soul of statistically based SREs - Allows use by people with different accents / dialects - Markov Model: “predict” the next state by knowing the current state - Hidden Markov Model: the state itself is hidden; “predict” it from the observed sound and from what can follow - What can follow = grammar file - Statistically rules out possibilities as the word progresses
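How an HMM "rules out possibilities as the word progresses" is usually shown with the Viterbi algorithm, which finds the most likely hidden state sequence for a series of observations. The sketch below uses a toy model with invented states, symbols, and probabilities; none of these values come from the presentation.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden state path, probability of that path)."""
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best previous state to have come from when landing in s.
            prev, score = max(
                ((p, best[-1][p] * trans_p[p][s]) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (assumed values): two hidden sounds "A" and "B",
# each usually producing its own acoustic symbol "x" or "y".
STATES = ["A", "B"]
START = {"A": 0.5, "B": 0.5}
TRANS = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.3, "B": 0.7}}
EMIT = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}}
```

With the observation sequence `["x", "x", "y", "y"]` the decoder settles on the path A, A, B, B, because every alternative has a lower joint probability.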
  • 181. 7. Speech Recognition Engine • Statistically Based Speech Recognition Acoustic Model - Gathers statistical information for the HMM - Does this by analyzing a speech corpus (read or continuous) - Different corpora exist (per language, gender, frequency range) - ISIP Switchboard corpus: 240h of speech, 500 talkers, telephone quality
  • 187. 7. Speech Recognition Engine • Statistically Based Speech Recognition Language Model - Tries to predict the next word - Uses a grammar file - E.g. “Phone Steve Young; Phone Young; Phone Steve; Phone Young Steve” - Multiple grammars can be combined to predict entire sentences
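The grammar example on this slide can be turned into a small lookup that tells the recognizer which words may come next. This is only a sketch of the idea: the list-of-phrases representation, the `allowed_next_words` helper, and the `<end>` marker are assumptions for illustration, not the format of any particular engine's grammar file.

```python
# The four phrases from the slide, lower-cased.
GRAMMAR = [
    "phone steve young",
    "phone young",
    "phone steve",
    "phone young steve",
]


def allowed_next_words(grammar, prefix):
    """Words the grammar permits after the words recognized so far.

    "<end>" means the phrase may stop here.
    """
    prefix = [w.lower() for w in prefix]
    options = set()
    for phrase in grammar:
        words = phrase.split()
        if words[:len(prefix)] == prefix:
            if len(words) > len(prefix):
                options.add(words[len(prefix)])
            else:
                options.add("<end>")
    return options
```

After hearing "Phone", only "steve" and "young" remain as candidates, so the engine never has to score any other vocabulary word at that position.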
  • 199. 8. Speech Analytics - Separate engine - Analyze gender, age, identity and topic discussed • Audio Mining - Analyzes audio as soon as it enters the system - Useful with background noise - Matches source to a speech database, e.g.: Emotion detection with customer services; Music recognition software (“Shazam”, “SoundHound”)
  • 201-209. 8. Speech Analytics • Keyword Spotting, 2 kinds:
      Isolated word
          - clearly enforced breaks
          - non-spontaneous
          - user knows he is talking to an SRE
      Unconstrained spotting
          - continuous speech KWS
          - difficult due to speech segmentation
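Why enforced breaks make the isolated-word case easier can be illustrated with a simple energy-based segmenter. This is a minimal sketch, not the method used in the research: the function name `segment_by_silence`, the threshold and the toy frame energies are all assumptions made for illustration.

```python
def segment_by_silence(frame_energy, silence_threshold, min_pause):
    """Split frame energies into (start, end) word segments; end is exclusive.

    A word ends once at least `min_pause` consecutive frames fall below
    `silence_threshold`.
    """
    segments = []
    start = None  # index where the current word began
    quiet = 0     # consecutive quiet frames seen inside the current word
    for i, energy in enumerate(frame_energy):
        if energy >= silence_threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause:
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # The signal ended while a word was still open.
        segments.append((start, len(frame_energy) - quiet))
    return segments


# Two words separated by a clear pause (isolated-word style).
isolated = [0, 5, 6, 5, 0, 0, 0, 7, 8, 0, 0, 0]
# The same two words run together, as in continuous speech.
continuous = [0, 5, 6, 5, 1, 7, 8, 6, 0, 0, 0, 0]

isolated_segments = segment_by_silence(isolated, silence_threshold=2, min_pause=2)
continuous_segments = segment_by_silence(continuous, silence_threshold=2, min_pause=2)
print(isolated_segments)    # [(1, 4), (7, 9)]
print(continuous_segments)  # [(1, 8)]
```

With pauses the two words come out as separate segments; without them they merge into one, which is the segmentation problem that continuous-speech spotting has to solve in another way.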
  • 210-216. 8. Speech Analytics • Keyword Spotting, 2 methods:
      - filler method (garbage method): entire string of speech is analyzed, excess words too (= garbage)
      - sliding model: interval-based analysis
          uses Hidden Markov Models & grammar file
          resource intensive
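The idea of scoring intervals of speech against a keyword model and a filler (garbage) model can be sketched as a toy likelihood-ratio test. The sketch assumes per-frame log-likelihoods are already available from some acoustic model that is not shown here; the function name `spot_keyword`, the window length and the threshold are hypothetical, and a real HMM-based spotter would be considerably more involved.

```python
def spot_keyword(keyword_ll, filler_ll, window, threshold):
    """Return start indices of windows where the keyword model beats the filler.

    keyword_ll, filler_ll: per-frame log-likelihoods of equal length, assumed
    to come from an acoustic model outside this sketch.
    """
    if len(keyword_ll) != len(filler_ll):
        raise ValueError("score sequences must have the same length")
    hits = []
    for start in range(len(keyword_ll) - window + 1):
        end = start + window
        # Average log-likelihood ratio over the interval.
        ratio = sum(
            k - f for k, f in zip(keyword_ll[start:end], filler_ll[start:end])
        ) / window
        if ratio > threshold:
            hits.append(start)
    return hits


# Toy scores: the keyword model fits frames 3..5 much better than the filler.
keyword_ll = [-9.0, -9.0, -9.0, -2.0, -2.0, -2.0, -9.0, -9.0]
filler_ll = [-5.0] * 8

hits = spot_keyword(keyword_ll, filler_ll, window=3, threshold=2.0)
print(hits)  # [3]
```

Every window is scored, including those that contain only filler speech, which gives a feel for why exhaustive interval-based analysis is resource intensive.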
  • 219. 9. Conclusion
  • 221-223. 9. Conclusion “Is it possible, with today’s known technology, to automatically trigger a recording device with a random word in a sentence over a telephone line?” Answer: YES
  • 225-230. 9. Conclusion
      - Keyword spotting algorithm based on a statistically based SRE
      - Appropriate acoustic model
      - ISIP Switchboard speech corpus: telephone-compressed source
      - Grammar file? → Maybe, but it will be big
      - Normal speech corpus? → A lot of pre-filtering / might not be successful
      - LPC? → artifacts in output due to 2x LPC filtering
  • 232. Q&A

Editor's Notes

  2-11. - Complicated research
        - Start with core of research
        - Research question
  16. - Ask opinion
      - Idea: CSI
  23-31. - Early days: focus on Voice Reproduction
         - Produce Vowels
         - Different tube size/shape -> Different vowel
  32. - Reed: represents vocal folds
      - Resonator changed by hand: Different vowels
  33-36. - Major breakthrough
         - Originally used for dictation
         - Aimed at office market
  37-41. - Oscillator or Noise as source
         - Frequency -> controlled by foot pedal
         - Hands -> control band-pass filters
         - Idea -> Importance of signal spectrum for speech representation
  42-47. - Computer popularity
         - Bell: digit recognition; only works on separated digits; used formant frequencies to detect and compare
  48-57. - 1960s -> Commercialization
         - IBM: Office market
         - AT&T: previously Bell Labs
           Telecom automation
           Help desks
           Keyword spotting: natural
  58-63. - Rise of statistically based SRE
         - Andrey Markov
         - CMU: Used HMM
         - DARPA: US government organization
  70-73. - Lungs: Pump air = fuel
         - Larynx: houses vocal folds
         - Vocal folds: no muscle
         - Comparable to reed in clarinet
         - Closed: speech
         - Open: breathing
         - Gap?: big / small
         - Articulators: Shaping / Syllables
  80-85. - Lowest level of speech that still forms a contrast between sounds
         - Notated in IPA
         - Used in the acoustic model of an SRE
  86-97. - AC: Defines start & end points of phonemes (VOT)
           Study articulation
           Unaspirated: regular speech
           Aspirated: my Macboo[k] is broken
           Voiced: [G]od, [B]ob
         - SS: difficult.
           [k] is pronounced differently
           “How to recognize speech”
           “How to wreck a nice beach”
         - CP: different pronunciations
           “sheep” and “cheap”
  98-106. - Psychological or Phonetic
          - VCsize: Female & child = small
            Male = big = more air
  113-118. - Fully Analog
           - Manual switchers
  119-128. Cellphone: Dr. Martin Cooper
           - GSM: Global System for Mobile Communications
           - UMTS: Universal Mobile Telecommunications System
           - LPC: remove buzz from the voice
             Signals carried digitally
             LPC data added in receiving phone
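The note above mentions LPC only in passing. As background, here is a minimal sketch of how LPC coefficients are commonly estimated, using the autocorrelation method with the Levinson-Durbin recursion. It is a generic textbook formulation, not the codec used in GSM and not something taken from the slides; the function names and the toy signal are assumptions.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc_coefficients(signal, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, error) with a[0] == 1, where the prediction residual is
    e[n] = sum(a[j] * x[n - j] for j in range(order + 1)).
    """
    r = autocorrelation(signal, order)
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0:
            break  # silent or perfectly predicted signal: nothing left to model
        # Reflection coefficient for model order i.
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / error
        previous = a[:]
        for j in range(1, i):
            a[j] = previous[j] + k * previous[i - j]
        a[i] = k
        error *= 1.0 - k * k
    return a, error


# Toy signal: impulse response of the one-pole filter x[n] = 0.9 * x[n - 1].
signal = [0.9 ** n for n in range(200)]
coeffs, residual_energy = lpc_coefficients(signal, order=2)
print(coeffs)  # close to [1.0, -0.9, 0.0]
```

The recovered coefficients describe the filter that shaped the signal, and only they plus a compact excitation need to be transmitted, which is the general idea behind LPC-based speech coding.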
  135-141. - FB: Filter telephone mid frequency
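Filtering to the telephone mid band can be sketched with a windowed-sinc FIR band-pass filter. The notes do not give cut-off values, so the conventional 300-3400 Hz narrowband telephone band is assumed here, along with a 16 kHz sample rate and 201 taps; all function names are hypothetical.

```python
import math


def bandpass_taps(low_hz, high_hz, sample_rate, num_taps=201):
    """Design a windowed-sinc FIR band-pass filter (Hamming window)."""
    mid = (num_taps - 1) / 2

    def lowpass(cutoff_hz, n):
        # Ideal low-pass impulse response sampled at tap n.
        fc = cutoff_hz / sample_rate
        x = n - mid
        if x == 0:
            return 2 * fc
        return math.sin(2 * math.pi * fc * x) / (math.pi * x)

    taps = []
    for n in range(num_taps):
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        # Band-pass = wide low-pass minus narrow low-pass.
        taps.append(window * (lowpass(high_hz, n) - lowpass(low_hz, n)))
    return taps


def apply_fir(signal, taps):
    """Filter `signal`, returning only the fully overlapped output samples."""
    k = len(taps)
    return [
        sum(taps[j] * signal[i + k - 1 - j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def tone(freq_hz, sample_rate, num_samples):
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(num_samples)
    ]


SAMPLE_RATE = 16000
taps = bandpass_taps(300, 3400, SAMPLE_RATE)
# RMS level of three test tones after filtering: hum, speech band, hiss.
levels = {
    freq: rms(apply_fir(tone(freq, SAMPLE_RATE, 1000), taps))
    for freq in (50, 1000, 6000)
}
print(levels)
```

The 1000 Hz tone should pass almost unchanged, while the 50 Hz and 6000 Hz tones fall outside the assumed band and are strongly attenuated.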
  142-154. After filtering -> Noise filtering
           - 3 methods
           - SS: Used in Ozone
           - Wiener = Linear, non-adaptive
           - Signal Subspace = layers
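Two of the noise-filtering methods named above can be illustrated per frequency bin. The sketch assumes that "SS" stands for spectral subtraction, that power spectra of the noisy signal and a noise estimate are already available, and that the function names and toy values are purely illustrative; the signal subspace method is not covered.

```python
def spectral_subtraction(noisy_power, noise_power, floor=0.0):
    """Subtract the estimated noise power from each frequency bin."""
    return [max(y - n, floor) for y, n in zip(noisy_power, noise_power)]


def wiener_gain(noisy_power, noise_power):
    """Per-bin Wiener gain S / (S + N), with S estimated as max(Y - N, 0)."""
    gains = []
    for y, n in zip(noisy_power, noise_power):
        speech = max(y - n, 0.0)
        total = speech + n
        gains.append(speech / total if total > 0 else 0.0)
    return gains


# Toy power spectra: strong speech in the first bins, noise only in the last.
noisy = [10.0, 4.0, 1.0, 0.5]
noise = [1.0, 1.0, 1.0, 1.0]

cleaned = spectral_subtraction(noisy, noise)
gains = wiener_gain(noisy, noise)
print(cleaned)  # [9.0, 3.0, 0.0, 0.0]
print(gains)    # [0.9, 0.75, 0.0, 0.0]
```

Bins dominated by speech keep most of their energy, while bins at or below the noise estimate are suppressed to the floor.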