16. 1. Hypothesis
“Is it possible, with today’s known technology,
to automatically trigger a recording device
with a random word in a sentence
over a telephone line?”
23. 2. Historic Overview
• Early Days (1700 - 1900)
First artificial speech synthesizer
- late 1700’s: Russian professor Christian Kratzenstein
- Resonant tube attached to a pipe organ
Enhancement
- mid 1800’s: Charles Wheatstone
- Replaced the tubes with leather resonators
36. 2. Historic Overview
• Early Days (1700 - 1900)
1939 World’s Fair: VODER
Homer Dudley
- Based on the Wheatstone resonator
- Electrical & mechanical parts
40. 2. Historic Overview
• First Speech Recognizers (1950 - 1980)
- Digit Recognition System based on speech formants
vs.
- 10-syllable recognizer using Dynamic Time Warping
72. 4. Phonetics & Speech Perception
• Phonetics
- Studies the smallest units of human speech (phonemes)
- Originated in India around 2500 BC
- IPA (International Phonetic Alphabet)
- 44 phonemes in American English
91. 4. Phonetics & Speech Perception
• Speech Perception
- Variations in speech
Phonetic environment can alter the sound of a phoneme
[o] in “Bob” and [u] in “vulture”
Speed of speech
Fast → shorter vowels, less pronounced stops, poor articulation
Speaker identity
- Gender and age differences
- Vocal cord size and hormone levels
- Place of birth
104. 5. Telephone Speech Coding & Compression
• Early days: Analog
- Speech converted to a control voltage in the phone
- Passed through copper lines → crosstalk
• 1980’s - present day: Digital
- Main advantages: longer distances / greater speed / less carrier noise
- Use of optical fiber lines → no crosstalk
112. 5. Telephone Speech Coding & Compression
• Now: Mobile Phones
- GSM: Speech
- UMTS: Data
- Frequency content of 3100 Hz (the 300 Hz - 3400 Hz telephone band)
- Compressed full-rate (13 kbit/s) or half-rate (6.5 kbit/s) at an 8 kHz sample rate
• Technique: Linear Predictive Coding (LPC)
- Formants (resonances of the human vocal tract) are removed from the speech
- What is left ≈ a simple residual → digitized with a Fourier transform
- Formants are synthesized again in the receiver’s cellphone
- Of great interest for speech recognition
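The LPC idea above can be sketched in a few lines: fit predictor coefficients from the signal’s autocorrelation (Levinson-Durbin), then keep only the small residual that a coder would transmit. This is a minimal NumPy sketch on an invented synthetic AR(2) signal, not the actual GSM codec.

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients via the autocorrelation method (Levinson-Durbin)."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                             # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k                         # remaining prediction error
    return a, err

# Demo on a synthetic AR(2) signal (invented for illustration):
# x[t] = 0.75*x[t-1] - 0.5*x[t-2] + noise, so A(z) has a1 = -0.75, a2 = 0.5
rng = np.random.default_rng(0)
noise = rng.standard_normal(20000)
x = np.zeros(20000)
for t in range(2, 20000):
    x[t] = 0.75 * x[t - 1] - 0.5 * x[t - 2] + noise[t]
a, err = lpc(x, order=2)
residual = np.convolve(x, a)[: len(x)]  # the flattened signal a coder would transmit
```

The residual has much less variance than the speech-like signal itself, which is exactly why transmitting residual plus coefficients is cheaper than transmitting the waveform.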
127. 6. Speech Enhancement
• Pre-Filtering
- Frequency based
- Filter banks
- Commonly known as an equalizer
- Used adaptively to suppress unwanted frequencies
- Boosts the low end lost due to telephone coding
- Improves audibility
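A filter bank of this kind can be sketched as per-band gains applied in the frequency domain. The band edges and gains below are invented for illustration (boost a low band, cut everything above the telephone band); a real adaptive pre-filter would choose them from the signal.

```python
import numpy as np

def filterbank_eq(x, sr, bands, gains):
    """Crude frequency-domain filter bank: one gain per frequency band."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    for (lo, hi), g in zip(bands, gains):
        X[(freqs >= lo) & (freqs < hi)] *= g   # scale all bins in this band
    return np.fft.irfft(X, n=len(x))

# Demo: boost the low end lost to telephone coding, cut hiss above 3400 Hz
sr = 8000
t = np.arange(sr) / sr
x = (np.sin(2 * np.pi * 200 * t)
     + 0.5 * np.sin(2 * np.pi * 3000 * t)
     + 0.3 * np.sin(2 * np.pi * 3600 * t))
y = filterbank_eq(x, sr, bands=[(0, 300), (300, 3400), (3400, 4000)],
                  gains=[2.0, 1.0, 0.0])
```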
135. 6. Speech Enhancement
• Noise-Filtering
- Spectral Subtraction
- Simple and effective
- Uses the amplitude of the noise
- “Underwater” effect if overused
- Wiener Filtering
- Invented in the 1940’s by Norbert Wiener
- Uses the Fourier transform to detect noise
- Stationary (non-adaptive)
- Uses deconvolution to remove the noise
- Signal Subspace approach
- Represents noise and the original signal in “layers”
- Assigns vectors to high and low amplitudes
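Spectral subtraction, the simplest of the three, fits in a dozen lines: estimate the noise magnitude spectrum from a speech-free stretch, subtract it from each frame’s magnitude, floor at zero, and resynthesize with the noisy phase. A minimal non-overlapping sketch on invented synthetic data (a tone in white noise); real systems use overlapping windows.

```python
import numpy as np

def spectral_subtract(x, noise_ref, frame=256, alpha=1.0):
    """Magnitude spectral subtraction, frame by frame (no overlap, for brevity)."""
    noise_mag = np.abs(np.fft.rfft(noise_ref[:frame]))   # noise amplitude estimate
    out = np.zeros(len(x))
    for start in range(0, len(x) - frame + 1, frame):
        X = np.fft.rfft(x[start:start + frame])
        # subtract the noise magnitude, floor at zero, keep the noisy phase
        mag = np.maximum(np.abs(X) - alpha * noise_mag, 0.0)
        out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(X)), n=frame)
    return out

# Demo: a 500 Hz tone in white noise, plus a noise-only reference segment
rng = np.random.default_rng(1)
frame = 256
t = np.arange(4 * frame) / 8000.0
noisy = np.sin(2 * np.pi * 500.0 * t) + 0.3 * rng.standard_normal(len(t))
noise_ref = 0.3 * rng.standard_normal(frame)
cleaned = spectral_subtract(noisy, noise_ref, frame=frame)
```

Raising `alpha` subtracts more aggressively; push it too far and the flooring leaves isolated spectral peaks, which is exactly the “underwater” (musical noise) artifact named above.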
148. 6. Speech Enhancement
• Spectral Restoration
- Fixes dropouts in the signal
- Works on a small scale
- Adds filtered full-band noise in the gaps
- Listener perceives the signal as a whole
- Poor results with SREs
- Most SREs can fill the gap in a different way
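One way to read “filtered full-band noise in the gaps” is: shape white noise with the magnitude spectrum of the audio just before the dropout and splice it in. This is only a sketch of that interpretation, on invented data, not a specific published restoration algorithm.

```python
import numpy as np

def fill_dropout(x, start, end, seed=0):
    """Fill a dropout [start:end) with noise shaped like the audio before it."""
    n = end - start
    ref = x[max(0, start - n):start]            # the frame preceding the gap
    env = np.abs(np.fft.rfft(ref, n=n))         # its magnitude spectrum
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(len(env)))
    patch = np.fft.irfft(env * phase, n=n)      # noise with the same spectral shape
    out = x.copy()
    out[start:end] = patch
    return out

# Demo: a tone with a 200-sample dropout
t = np.arange(2048) / 8000.0
damaged = np.sin(2 * np.pi * 440.0 * t)
damaged[1000:1200] = 0.0
restored = fill_dropout(damaged, 1000, 1200)
```

A listener tends to perceive such a patch as continuous audio, but an SRE sees spectrally plausible yet phonetically meaningless content, which matches the “poor results with SREs” point above.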
158. 7. Speech Recognition Engine
• Dynamic Time Warping (DTW)
- Mostly used in the early days
- Fast & simple, but not accurate with complex speech
- Measures similarities in time and speed
- e.g. a video is played twice, once fast and once slow; a DTW-based algorithm will see that it is the same video
- Compares speech to a speech database
- Needs training most of the time
- Does not use phonemes
- Uses interval-based vectors
- Vector taken at the wrong time = bad representation
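The classic DTW recurrence is short enough to show whole: build a cumulative-cost matrix where each cell extends the cheapest of its three predecessors. The sequences below are toy stand-ins for the “same video at two speeds” example.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)   # cumulative cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, diagonal match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A ramp, the same ramp played half as fast, and an unrelated constant sequence
fast = np.linspace(0.0, 1.0, 10)
slow = np.linspace(0.0, 1.0, 20)
other = np.ones(20)
```

`dtw_distance(fast, slow)` stays small despite the length mismatch, while `dtw_distance(fast, other)` does not, which is the whole point of warping the time axis.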
170. 7. Speech Recognition Engine
• Statistically Based Speech Recognition
Hidden Markov Models
- Heart and soul of statistically based SREs
- Allows use by people with different accents / dialects
- Markov Model: “predict” the future by knowing the current state
- Hidden Markov Model: infer the hidden current state from what can be observed
- The “future” = grammar file
- Statistically rules out possibilities as the word progresses
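The “rules out possibilities as the word progresses” step is the Viterbi algorithm: at each time step, keep only the most probable way of reaching each hidden state. A minimal sketch; the two states and all probabilities are invented for illustration (they are the standard textbook toy HMM, not a real acoustic model).

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete observation sequence."""
    T, n_states = len(obs), len(start_p)
    V = np.zeros((T, n_states))                 # best log-probability per state
    back = np.zeros((T, n_states), dtype=int)   # where that best path came from
    V[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(n_states):
            scores = V[t - 1] + np.log(trans_p[:, s])
            back[t, s] = int(np.argmax(scores))
            V[t, s] = scores[back[t, s]] + np.log(emit_p[s, obs[t]])
    path = [int(np.argmax(V[-1]))]              # backtrack from the best end state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical 2-state model with 3 possible observation symbols
start_p = np.array([0.6, 0.4])
trans_p = np.array([[0.7, 0.3],
                    [0.4, 0.6]])
emit_p = np.array([[0.5, 0.4, 0.1],
                   [0.1, 0.3, 0.6]])
path = viterbi([0, 1, 2], start_p, trans_p, emit_p)
```

In an SRE the hidden states would be phoneme (sub-)states and the observations acoustic feature vectors, but the pruning logic is the same.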
177. 7. Speech Recognition Engine
• Statistically Based Speech Recognition
Acoustic Model
- Gathers statistical information for the HMM
- Does this by analyzing a speech corpus (read or continuous)
- Different corpora (language, gender, frequency range)
- ISIP Switchboard corpus: 240 h of speech from 500 talkers, telephone quality
183. 7. Speech Recognition Engine
• Statistically Based Speech Recognition
Language Model
- Tries to predict the next word
- Uses a grammar file
- e.g. “Phone Steve Young; Phone Young; Phone Steve; Phone Young Steve”
- Multiple grammar files can be combined to predict entire sentences
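The grammar-file idea can be sketched directly with the example from the slide: given the words heard so far, the model returns which words the grammar allows next, and everything else is ruled out. The helper name is ours; real grammar formats (e.g. finite-state grammars) are richer than this semicolon-separated toy.

```python
def next_words(grammar, prefix):
    """Words the grammar allows next, given the part of the sentence heard so far."""
    sentences = [rule.strip().split() for rule in grammar.split(";")]
    heard = prefix.split()
    options = set()
    for s in sentences:
        # the rule must start with what was heard and still have a word left
        if s[: len(heard)] == heard and len(s) > len(heard):
            options.add(s[len(heard)])
    return sorted(options)

# The grammar file from the slide
grammar = "Phone Steve Young; Phone Young; Phone Steve; Phone Young Steve"
```

After hearing “Phone”, only “Steve” or “Young” remain possible; after “Phone Steve”, only “Young”. This is the statistical narrowing the HMM exploits.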
193. 8. Speech Analytics
- Separate engine
- Analyzes gender, age, identity and the topic discussed
• Audio Mining
- Analyzes the audio as soon as it enters the system
- Useful with background noise
- Matches the source to a speech database
e.g.: emotion detection in customer services
Music recognition software (“Shazam”, “Soundhound”)
206. 8. Speech Analytics
• Keyword Spotting
2 kinds:
Isolated word
- clearly enforced breaks
- non-spontaneous
- the user knows they are talking to an SRE
Unconstrained spotting
- continuous-speech KWS
- difficult due to speech segmentation
221. 9. Conclusion
“Is it possible, with today’s known technology,
to automatically trigger a recording device
with a random word in a sentence
over a telephone line?”
Answer:
YES
226. 9. Conclusion
- Keyword spotting algorithm based on a statistically based SRE
- Appropriate acoustic model
- ISIP Switchboard speech corpus: telephone-compressed source
- Grammar file? → Maybe, but it will be big
- Normal speech corpus? → A lot of pre-filtering / might not be successful
- LPC? → artifacts in the output due to 2x LPC filtering
- Complicated research\n\n- Start with core of research\n- Research question\n
- Complicated research\n\n- Start with core of research\n- Research question\n
- Complicated research\n\n- Start with core of research\n- Research question\n
- Complicated research\n\n- Start with core of research\n- Research question\n
- Complicated research\n\n- Start with core of research\n- Research question\n
- Complicated research\n\n- Start with core of research\n- Research question\n
- Complicated research\n\n- Start with core of research\n- Research question\n
- Complicated research\n\n- Start with core of research\n- Research question\n
- Complicated research\n\n- Start with core of research\n- Research question\n
- Complicated research\n\n- Start with core of research\n- Research question\n
\n
\n
\n
\n
- Ask opinion\n\n- Idea: CSI\n
\n
\n
\n
\n
\n
\n
- Early days: focus on Voice Reproduction\n\n- Produce Vowels\n- Different tube size/shape -> Differen vowel\n\n
- Early days: focus on Voice Reproduction\n\n- Produce Vowels\n- Different tube size/shape -> Differen vowel\n\n
- Early days: focus on Voice Reproduction\n\n- Produce Vowels\n- Different tube size/shape -> Differen vowel\n\n
- Early days: focus on Voice Reproduction\n\n- Produce Vowels\n- Different tube size/shape -> Differen vowel\n\n
- Early days: focus on Voice Reproduction\n\n- Produce Vowels\n- Different tube size/shape -> Differen vowel\n\n
- Early days: focus on Voice Reproduction\n\n- Produce Vowels\n- Different tube size/shape -> Differen vowel\n\n
- Early days: focus on Voice Reproduction\n\n- Produce Vowels\n- Different tube size/shape -> Differen vowel\n\n
- Early days: focus on Voice Reproduction\n\n- Produce Vowels\n- Different tube size/shape -> Differen vowel\n\n
- Early days: focus on Voice Reproduction\n\n- Produce Vowels\n- Different tube size/shape -> Differen vowel\n\n
- Reed: represents vocal folds\n- Resonator changed by hand: Different vowels\n
- Major breakthrough\n\n- Originally used for dictation\n\n- Aimed at office-market\n
- Major breakthrough\n\n- Originally used for dictation\n\n- Aimed at office-market\n
- Major breakthrough\n\n- Originally used for dictation\n\n- Aimed at office-market\n
- Major breakthrough\n\n- Originally used for dictation\n\n- Aimed at office-market\n
- Oscillator or Noise as source\n\n- Frequency -> controlled by foot pedal\n- Hands -> control band pass filters\n\n- Idea -> Importance of signal spectrum for speech representation\n
- Oscillator or Noise as source\n\n- Frequency -> controlled by foot pedal\n- Hands -> control band pass filters\n\n- Idea -> Importance of signal spectrum for speech representation\n
- Oscillator or Noise as source\n\n- Frequency -> controlled by foot pedal\n- Hands -> control band pass filters\n\n- Idea -> Importance of signal spectrum for speech representation\n
- Oscillator or Noise as source\n\n- Frequency -> controlled by foot pedal\n- Hands -> control band pass filters\n\n- Idea -> Importance of signal spectrum for speech representation\n
- Oscillator or Noise as source\n\n- Frequency -> controlled by foot pedal\n- Hands -> control band pass filters\n\n- Idea -> Importance of signal spectrum for speech representation\n
- Computer popularity\n\n- BELL: Digit: \nonly works separated\nused formant frequencies to detect and compare\n\n
- Computer popularity\n\n- BELL: Digit: \nonly works separated\nused formant frequencies to detect and compare\n\n
- Computer popularity\n\n- BELL: Digit: \nonly works separated\nused formant frequencies to detect and compare\n\n
- Computer popularity\n\n- BELL: Digit: \nonly works separated\nused formant frequencies to detect and compare\n\n
- Computer popularity\n\n- BELL: Digit: \nonly works separated\nused formant frequencies to detect and compare\n\n
- Computer popularity\n\n- BELL: Digit: \nonly works separated\nused formant frequencies to detect and compare\n\n
- Rise of statistically base SRE\n\n- Andrej Markov\n\n- CMU: Used HMM\n\n- DARPA: USA gov organization\n\n
- Rise of statistically base SRE\n\n- Andrej Markov\n\n- CMU: Used HMM\n\n- DARPA: USA gov organization\n\n
- Rise of statistically base SRE\n\n- Andrej Markov\n\n- CMU: Used HMM\n\n- DARPA: USA gov organization\n\n
- Rise of statistically base SRE\n\n- Andrej Markov\n\n- CMU: Used HMM\n\n- DARPA: USA gov organization\n\n
- Rise of statistically base SRE\n\n- Andrej Markov\n\n- CMU: Used HMM\n\n- DARPA: USA gov organization\n\n
- Rise of statistically base SRE\n\n- Andrej Markov\n\n- CMU: Used HMM\n\n- DARPA: USA gov organization\n\n
\n
\n
\n
\n
\n
\n
- Lungs: Pump air = fuel\n\n- Larynx: houses vocal folds\n- Vocal folds: no muscle\n- Comp to Reed in clarinet\n- Closed: speech\n- Open: breathing\n- Gap?: big/ small\n\n- Articulators: Shaping / Syllables\n
- Lungs: Pump air = fuel\n\n- Larynx: houses vocal folds\n- Vocal folds: no muscle\n- Comp to Reed in clarinet\n- Closed: speech\n- Open: breathing\n- Gap?: big/ small\n\n- Articulators: Shaping / Syllables\n
- Lungs: Pump air = fuel\n\n- Larynx: houses vocal folds\n- Vocal folds: no muscle\n- Comp to Reed in clarinet\n- Closed: speech\n- Open: breathing\n- Gap?: big/ small\n\n- Articulators: Shaping / Syllables\n
- Lungs: Pump air = fuel\n\n- Larynx: houses vocal folds\n- Vocal folds: no muscle\n- Comp to Reed in clarinet\n- Closed: speech\n- Open: breathing\n- Gap?: big/ small\n\n- Articulators: Shaping / Syllables\n
\n
\n
\n
\n
\n
\n
- Lowest level of speech which still form a contrast between sounds\n\n- Noted in an IPA\n\n- Used in the acoustic model of an SRE\n\n\n\n
- Lowest level of speech which still form a contrast between sounds\n\n- Noted in an IPA\n\n- Used in the acoustic model of an SRE\n\n\n\n
- Lowest level of speech which still form a contrast between sounds\n\n- Noted in an IPA\n\n- Used in the acoustic model of an SRE\n\n\n\n
- Lowest level of speech which still form a contrast between sounds\n\n- Noted in an IPA\n\n- Used in the acoustic model of an SRE\n\n\n\n
- Lowest level of speech which still form a contrast between sounds\n\n- Noted in an IPA\n\n- Used in the acoustic model of an SRE\n\n\n\n
- Lowest level of speech which still form a contrast between sounds\n\n- Noted in an IPA\n\n- Used in the acoustic model of an SRE\n\n\n\n
- AC: Defines start & end points of phonemes (VOT)\nStudy articulation\n\nUnaspirated: regular speech\nAspirated: my Macboo[k] is broken\nVoiced: [G]od, [B]ob\n\n- SS: difficult. \n[k] is pronounced different \n“How to recognize speech”\n“How to wreck a nice beach”\n\n- CP: different pronunciations\n“sheep” and “cheap”\n
- AC: Defines start & end points of phonemes (VOT)\nStudy articulation\n\nUnaspirated: regular speech\nAspirated: my Macboo[k] is broken\nVoiced: [G]od, [B]ob\n\n- SS: difficult. \n[k] is pronounced different \n“How to recognize speech”\n“How to wreck a nice beach”\n\n- CP: different pronunciations\n“sheep” and “cheap”\n
- AC: Defines start & end points of phonemes (VOT)\nStudy articulation\n\nUnaspirated: regular speech\nAspirated: my Macboo[k] is broken\nVoiced: [G]od, [B]ob\n\n- SS: difficult. \n[k] is pronounced different \n“How to recognize speech”\n“How to wreck a nice beach”\n\n- CP: different pronunciations\n“sheep” and “cheap”\n
- AC: Defines start & end points of phonemes (VOT)\nStudy articulation\n\nUnaspirated: regular speech\nAspirated: my Macboo[k] is broken\nVoiced: [G]od, [B]ob\n\n- SS: difficult. \n[k] is pronounced different \n“How to recognize speech”\n“How to wreck a nice beach”\n\n- CP: different pronunciations\n“sheep” and “cheap”\n
- AC: Defines start & end points of phonemes (VOT)\nStudy articulation\n\nUnaspirated: regular speech\nAspirated: my Macboo[k] is broken\nVoiced: [G]od, [B]ob\n\n- SS: difficult. \n[k] is pronounced different \n“How to recognize speech”\n“How to wreck a nice beach”\n\n- CP: different pronunciations\n“sheep” and “cheap”\n
- Psychological or phonetic

- Vocal cord size (VC size): female & child = small
  male = large = more air
- Fully analog
- Manual switchboards (human operators)
Cellphone: Dr. Martin Cooper

- GSM: Global System for Mobile Communications
- UMTS: Universal Mobile Telecommunications System

- LPC (linear predictive coding): removes the "buzz" (excitation) from the voice
  Signals are carried digitally
  The LPC data is used to resynthesize the voice in the receiving phone
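The LPC step can be sketched as follows: a short speech frame is modeled as an all-pole filter whose coefficients are found from the frame's autocorrelation via the Levinson-Durbin recursion, and a codec transmits those coefficients plus excitation parameters instead of raw samples. A minimal pure-Python sketch; the synthetic frame, 8 kHz rate, and order 10 are illustrative, not GSM's actual configuration.

```python
# Minimal LPC sketch: estimate all-pole predictor coefficients for one
# frame via autocorrelation + Levinson-Durbin. Illustrative only; real
# codecs (GSM, UMTS) add quantization, excitation coding, and framing.
import math

def autocorr(x, max_lag):
    # Autocorrelation r[0..max_lag] of the frame
    return [sum(x[i] * x[i + k] for i in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a[1..order], residual energy) so that the predictor is
    x[n] ~ sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    e = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / e                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        e *= (1.0 - k * k)               # prediction error shrinks
    return a[1:], e

# Synthetic voiced-like frame: decaying 200 Hz resonance, 8 kHz sampling
fs = 8000
frame = [math.exp(-i / 400.0) * math.sin(2 * math.pi * 200 * i / fs)
         for i in range(240)]
r = autocorr(frame, 10)
coeffs, err = levinson_durbin(r, 10)
```

Only `coeffs` (and a description of the excitation) would need to cross the channel; the receiving phone runs the inverse filter to resynthesize the frame.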
- FB: band-pass filter to the telephone mid-frequency range (≈300–3400 Hz)
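The band-limiting step can be sketched with a windowed-sinc FIR band-pass filter. The 300–3400 Hz band is the classic telephone passband; the tap count and the test tones below are illustrative.

```python
# Sketch of the "FB" step: band-pass a signal to the telephone band
# (~300-3400 Hz) with a windowed-sinc FIR filter. Pure Python, 8 kHz
# sampling assumed; tap count is illustrative.
import math

def bandpass_fir(lo_hz, hi_hz, fs, num_taps=101):
    """Band-pass = low-pass at hi_hz minus low-pass at lo_hz."""
    def lowpass(fc):
        m = num_taps - 1
        h = []
        for n in range(num_taps):
            x = n - m / 2
            # windowed-sinc kernel (Hamming window)
            val = 2 * fc / fs if x == 0 else \
                math.sin(2 * math.pi * fc * x / fs) / (math.pi * x)
            val *= 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
            h.append(val)
        return h
    return [a - b for a, b in zip(lowpass(hi_hz), lowpass(lo_hz))]

def convolve(x, h):
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 8000
h = bandpass_fir(300, 3400, fs)
t = [i / fs for i in range(2000)]
low_tone = [math.sin(2 * math.pi * 100 * ti) for ti in t]    # outside band
mid_tone = [math.sin(2 * math.pi * 1000 * ti) for ti in t]   # inside band
r_low, r_mid = rms(convolve(low_tone, h)), rms(convolve(mid_tone, h))
```

A 100 Hz tone falls outside the passband and is strongly attenuated, while a 1000 Hz tone passes nearly unchanged, which mimics what the telephone channel itself does to speech.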
After band-limiting -> noise filtering

- 3 methods:

- SS (spectral subtraction): used in Ozone
- Wiener filter: linear, non-adaptive
- Signal subspace: splits the signal into layers (subspaces)
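Of the three, spectral subtraction is the easiest to sketch: subtract an estimated noise magnitude spectrum from the noisy spectrum per frequency bin, flooring the result so magnitudes never go negative. The bin values, `alpha`, and `floor` below are illustrative.

```python
# Minimal spectral-subtraction sketch: per-bin magnitude subtraction
# with a spectral floor. Spectra are plain lists of bin magnitudes
# (in practice these come from an FFT of a windowed frame).
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, floor=0.02):
    out = []
    for y, n in zip(noisy_mag, noise_mag):
        s = y - alpha * n              # subtract noise estimate
        out.append(max(s, floor * y))  # floor avoids negative magnitudes
    return out

# Illustrative spectra: bin 2 carries signal, the rest is mostly noise
noisy = [1.0, 0.5, 3.0, 0.4]
noise = [0.4, 0.45, 0.5, 0.45]
clean = spectral_subtract(noisy, noise)
```

The floor parameter trades residual noise against "musical noise" artifacts; a Wiener filter instead scales each bin by an estimated signal-to-noise gain, and subspace methods project the frame onto a low-rank signal subspace.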