SlideShare a Scribd company logo
1 of 64
Download to read offline
9-10 August, 2002 Computational Audition 1
Fixed Point Representations forFixed Point Representations for
Very High-Quality Speech andVery High-Quality Speech and
Sound Modification SystemSound Modification System
Hideki KawaharaHideki Kawahara
Wakayama University, JapanWakayama University, Japan
9-10 August, 2002 Computational Audition 2
SummarySummary
nn Functional (computational after Marr)Functional (computational after Marr)
approach is important and productive.approach is important and productive.
nn Fixed points provide feature values asFixed points provide feature values as
well as their reliability indices.well as their reliability indices.
–– UsingUsing within channelwithin channel informationinformation
nn Fixed point concept may provide clue toFixed point concept may provide clue to
integrate Fourier based concept andintegrate Fourier based concept and
wavelet-wavelet-MellinMellin transform based concept.transform based concept.
Reference systemReference system
““LocalLocal””
center of gravitycenter of gravity
9-10 August, 2002 Computational Audition 3
original
STRAIGHT: demoSTRAIGHT: demo
9-10 August, 2002 Computational Audition 4
STRAIGHT demo: morphingSTRAIGHT demo: morphing
neutral angry
interpolationextrapolation extrapolation
Word: /hai/ (“Yes” in Japanese)
9-10 August, 2002 Computational Audition 5
BackgroundBackground
nn ““Auditory BrainAuditory Brain”” Project by CRESTProject by CREST
–– Short term goal: speech processing systemsShort term goal: speech processing systems
based on functional models of auditory functions.based on functional models of auditory functions.
–– STRAIGHT: a very high-quality speechSTRAIGHT: a very high-quality speech
manipulation systemmanipulation system
–– Fixed point based algorithmsFixed point based algorithms
(alternative way of dimensional reduction of(alternative way of dimensional reduction of
auditory representations)auditory representations)
–– Long term goal:Long term goal: ““computationalcomputational”” theorirdtheorird ofof
auditionaudition
nn Frustrations in ways how auditory modelsFrustrations in ways how auditory models
are used in ASR and how speech processingare used in ASR and how speech processing
systems are evaluated.systems are evaluated.
9-10 August, 2002 Computational Audition 6
““Auditory BrainAuditory Brain”” ProjectProject
nn To develop a very high quality speechTo develop a very high quality speech
and/or sound manipulation system basedand/or sound manipulation system based
on perceptually relevant parameters andon perceptually relevant parameters and
it does not preserve phase/waveformit does not preserve phase/waveform
information.information.
–– ?? Are distance based quality measures relevant???? Are distance based quality measures relevant??
–– ?? Why does periodic sound sounds smoother and?? Why does periodic sound sounds smoother and
richer (in Auditory Fovea)??richer (in Auditory Fovea)??
–– ?? Is it relevant to test highly nonlinear speech?? Is it relevant to test highly nonlinear speech
perception using elementary sounds ??perception using elementary sounds ??
9-10 August, 2002 Computational Audition 7
Why high quality?Why high quality?
nn Ecological approach for investigatingEcological approach for investigating
highly nonlinear system, Humanhighly nonlinear system, Human
Not necessarily be predictable
from elementary test signals
Necessary to use
ecologically valid stimuli
Naturalness
9-10 August, 2002 Computational Audition 8
Hans Moravec: Robot, 2000, Oxford
xbox
iMac
PlayStation-3
Key issue: compatibilityKey issue: compatibility
Background figure is removed.
Please visit Hans Moravec’s page for
the original figure.
Faster than exponential growth in computing power
(Chapter 3: Power and Presence, Page 60)
http://www.frc.ri.cmu.edu/~hpm/book98/
9-10 August, 2002 Computational Audition 9
““Auditory BrainAuditory Brain”” ProjectProject
nn Computational theories of speech/auditoryComputational theories of speech/auditory
perceptionperception
–– ecological constraints on evolutionecological constraints on evolution
–– It cannot be ad hoc.It cannot be ad hoc.
»» When there is an elegant and reasonable algorithmWhen there is an elegant and reasonable algorithm
and it does not violate ecological (biological andand it does not violate ecological (biological and
environmental) constraints, there is no reason toenvironmental) constraints, there is no reason to
deny that the algorithm shares the commondeny that the algorithm shares the common
underlying principles with our auditory system.underlying principles with our auditory system.
9-10 August, 2002 Computational Audition 10
““Auditory BrainAuditory Brain”” ProjectProject
nn Computational theories of speech/auditoryComputational theories of speech/auditory
perceptionperception
–– Periodicity: time-frequency sampling gridPeriodicity: time-frequency sampling grid
–– Periodicity: stable reference point for wavelet-Periodicity: stable reference point for wavelet-
MellinMellin transformtransform
–– Log-linear frequency axisLog-linear frequency axis
»» Wavelet-Wavelet-MellinMellin transform: shape and sizetransform: shape and size
–– Why two ears? ICAWhy two ears? ICA
–– Long term correlation (structure)Long term correlation (structure)
»» ASR, musicASR, music
9-10 August, 2002 Computational Audition 11
STRAIGHT a core technologySTRAIGHT a core technology
nn Conceptually simple architectureConceptually simple architecture
–– Channel VOCODERChannel VOCODER
–– Source filter modelSource filter model
nn Graded parameters (Graded parameters (vsvs binary decision)binary decision)
–– Sensitivity analysisSensitivity analysis
–– MorphingMorphing
nn Reliability / TransparencyReliability / Transparency
–– No post-processingNo post-processing
–– Weakly constrained modelWeakly constrained model
9-10 August, 2002 Computational Audition 12
Structure of STRAIGHT
STRAIGHT: architectureSTRAIGHT: architecture
9-10 August, 2002 Computational Audition 13
STRAIGHT a core technologySTRAIGHT a core technology
nn Conceptually simple architectureConceptually simple architecture
–– Channel VOCODERChannel VOCODER
–– Source filter modelSource filter model
nn Graded parameters (Graded parameters (vsvs binary decision)binary decision)
–– Sensitivity analysisSensitivity analysis
–– MorphingMorphing
nn Reliability / TransparencyReliability / Transparency
–– No post-processingNo post-processing
–– Weakly constrained modelWeakly constrained model
9-10 August, 2002 Computational Audition 14
Structure of STRAIGHT
Spectral envelope estimation
STRAIGHT: structureSTRAIGHT: structure
9-10 August, 2002 Computational Audition 15
Weakly constrained spectralWeakly constrained spectral
envelope estimationenvelope estimation
waveform
Time window
Interferences
in the time domain
Interferences in the
frequency domain
9-10 August, 2002 Computational Audition 16
Weakly constrained spectralWeakly constrained spectral
envelope estimationenvelope estimation
Reduction of edge discontinuity
Reduction of periodicity interference
smoothing by spline basisComposite window
9-10 August, 2002 Computational Audition 17
Time-frequency smoothing
(
Time-frequency smoothing
(current implementation)current implementation)
F0 synchronous
Gaussian window
complimentary
time window
reduced interference
spectrum
F0 synchronous
Gaussian window
complimentary
time window
9-10 August, 2002 Computational Audition 18
Compensation of over-Compensation of over-
smoothingsmoothing
9-10 August, 2002 Computational Audition 19
Compensation of over-smoothingCompensation of over-smoothing
9-10 August, 2002 Computational Audition 20
Weakly constrained spectralWeakly constrained spectral
envelope estimationenvelope estimation
9-10 August, 2002 Computational Audition 21
Weakly constrained spectralWeakly constrained spectral
envelope estimationenvelope estimation
9-10 August, 2002 Computational Audition 22
Fixed point based algorithmsFixed point based algorithms
nn Fixed points in the frequency domain:
→
Fixed points in the frequency domain:
→ F0 extractionF0 extraction
nn Fixed points in the time domain:
→
Fixed points in the time domain:
→ Excitation extractionExcitation extraction
9-10 August, 2002 Computational Audition 23
Fixed point of mappingFixed point of mapping
fixed point
y
x
y=f(x)
* Instantaneous frequency
of a filter output around
a sinusoidal component
* Energy centroid
of a windowed signal
around an event
Examples
9-10 August, 2002 Computational Audition 24
Averaging and fixed pointAveraging and fixed point
nn Prominent componentProminent component
Window
locations
Average of windowed
value
Fixed point
background
9-10 August, 2002 Computational Audition 25
Averaging and fixed pointAveraging and fixed point
nn Prominent componentProminent component
Window
locations
Average of windowed
value
Fixed point
background
Parameters
(position,slope,[level])
9-10 August, 2002 Computational Audition 26
Structure of STRAIGHT
F0 estimation
STRAIGHT: structureSTRAIGHT: structure
9-10 August, 2002 Computational Audition 27
window selection for reliablewindow selection for reliable
representation of mappingrepresentation of mapping
Refinement of Fo synchronous windows
9-10 August, 2002 Computational Audition 28
Window with harmonicWindow with harmonic
cancellationcancellation
9-10 August, 2002 Computational Audition 29
Fixed-point-based sinusoidalFixed-point-based sinusoidal
components extractioncomponents extraction
9-10 August, 2002 Computational Audition 30
Fixed-point-based sinusoidalFixed-point-based sinusoidal
frequency and C/N estimationfrequency and C/N estimation
C/N information enablesC/N information enables
optimum F0 estimation based onoptimum F0 estimation based on
multiple harmonic componentsmultiple harmonic components
9-10 August, 2002 Computational Audition 31
Approximate estimation of C/NApproximate estimation of C/N
9-10 August, 2002 Computational Audition 32
Reliable built-in mechanismReliable built-in mechanism
for fundamental component selectionfor fundamental component selection
linearlinear filter
arrangement
log-linearlog-linear filter
arrangement
mapping filter output
9-10 August, 2002 Computational Audition 33
Fixed points on C/N mapFixed points on C/N map
Fundamental
component
9-10 August, 2002 Computational Audition 34
F0 evaluation based on EGGF0 evaluation based on EGG
gross error
W/O:0.72%
with:0.32%
female
9-10 August, 2002 Computational Audition 35
Graded sourceGraded source
InformationInformation
nn Fixed point basedFixed point based
FoFo extractionextraction
(with C/N map)(with C/N map)
F0 trajectoriesF0 trajectories
(resolution: 1/F0)(resolution: 1/F0)
C/N for each fixed point
Graded aperiodicity information
Is also extracted
9-10 August, 2002 Computational Audition 36
Fixed points in the time domainFixed points in the time domain
nn How to define auditory temporal eventsHow to define auditory temporal events
–– Localized energyLocalized energy centroidcentroid
Alternative representation
9-10 August, 2002 Computational Audition 37
Fixed points in the time domainFixed points in the time domain
Squared whitened signal Energy centroid
Gaussian
window
Amount of energy
concentration
Speech
waveform
9-10 August, 2002 Computational Audition 38
waveform
Energy
centrold
Window center
Fixed points
Fixed point based event detectionFixed point based event detection
9-10 August, 2002 Computational Audition 39
Mean time
duration
Definition of event in the time domainDefinition of event in the time domain
Event location
Windowed whitened signal
9-10 August, 2002 Computational Audition 40
Windowed event location andWindowed event location and
the original event locationthe original event location
Gaussian window
Approximation of envelope
Windowed location
Original
location
Window location
9-10 August, 2002 Computational Audition 41
Slope at fixed point
duration
Window
parameter
Duration can be estimated fromDuration can be estimated from
the geometrical parameter atthe geometrical parameter at
the fixed pointthe fixed point
9-10 August, 2002 Computational Audition 42
Equivalence between the time domainEquivalence between the time domain
definition and the frequency domaindefinition and the frequency domain
definitiondefinition
waveform
Time domain definition
Frequency domain definition
Group delay
9-10 August, 2002 Computational Audition 43
Inverse problem:Inverse problem:
Where is the excitation?Where is the excitation?
Minimum phase
response
Event as the
energy centroid
Excitation
(impulse)
compensation
9-10 August, 2002 Computational Audition 44
Equivalence in definitionsEquivalence in definitions
nn Frequency domain definition of the event locationFrequency domain definition of the event location
nn Assuming causalityAssuming causality
Group delay
9-10 August, 2002 Computational Audition 45
Group delay of a minimum phaseGroup delay of a minimum phase
responseresponse
を介した計算Cepstrum
9-10 August, 2002 Computational Audition 46
Compensation based onCompensation based on
minimum phase group delayminimum phase group delay
Observed group delay
Causal group delay
Compensated event location
Compensated event duration
9-10 August, 2002 Computational Audition 47
example
Observed group delay
Minimum phase
group delay
Compensated group delay
9-10 August, 2002 Computational Audition 48
Excitation estimation based on fixedExcitation estimation based on fixed
point based event detectionpoint based event detection
Event based concentration Excitation based concentration
Energy centroid
Compensated
group delay
excitation
Vocal fold closure
9-10 August, 2002 Computational Audition 49
Excitation extraction accuracyExcitation extraction accuracy
Standard
deviation
9-10 August, 2002 Computational Audition 50
Estimated
excitation
Speech
waveform
Multiple resolution display ofMultiple resolution display of
events (fixed points)events (fixed points)
9-10 August, 2002 Computational Audition 51
Multiple resolution display ofMultiple resolution display of
events (fixed points)events (fixed points)
demo
Fixed points
due to one
excitation
aligns on a
straight line
9-10 August, 2002 Computational Audition 52
Phase map of wavelet transformPhase map of wavelet transform
9-10 August, 2002 Computational Audition 53
Instantaneous frequency basedInstantaneous frequency based
fixed pointsfixed points
9-10 August, 2002 Computational Audition 54
Instantaneous frequency basedInstantaneous frequency based
fixed pointsfixed points
9-10 August, 2002 Computational Audition 55
Group delay based fixed pointsGroup delay based fixed points
9-10 August, 2002 Computational Audition 56
Group delay based fixed pointsGroup delay based fixed points
9-10 August, 2002 Computational Audition 57
Structure of STRAIGHT
Source attribute control
STRAIGHT: structureSTRAIGHT: structure
9-10 August, 2002 Computational Audition 58
Group delay manipulated
mixed-mode excitation source
Group delay manipulated
mixed-mode excitation source
group delay asymmetry
impulse response
..provides continuous coverage
from pulse train to random noise
9-10 August, 2002 Computational Audition 59
SummarySummary
nn Functional (computational after Marr)Functional (computational after Marr)
approach is important and productive.approach is important and productive.
nn Fixed points provide feature values asFixed points provide feature values as
well as their reliability indices.well as their reliability indices.
–– UsingUsing within channelwithin channel informationinformation
nn Fixed point concept may provide clue toFixed point concept may provide clue to
integrate Fourier based concept andintegrate Fourier based concept and
wavelet-wavelet-MellinMellin transform based concept.transform based concept.
9-10 August, 2002 Computational Audition 60
ColleaguesColleagues
nn Haruhiro KatayoseHaruhiro Katayose, Toshio, Toshio IrinoIrino,, TakanobuTakanobu NishiuraNishiura
(Wakayama(Wakayama UnivUniv.).)
nn MinoruMinoru TsuzakiTsuzaki, Hideki, Hideki IwasawaIwasawa (ATR)(ATR)
nn ParhamParham ZolfaghariZolfaghari (NTT)(NTT)
nn Kiyohiro ShikanoKiyohiro Shikano, Hiroshi, Hiroshi SaruwatariSaruwatari (NAIST)(NAIST)
nn Fumitada ItakuraFumitada Itakura, Kazuya Takeda, Shoji, Kazuya Takeda, Shoji kajitakajita, Hideki, Hideki
BannoBanno (CIAIR, Nagoya(CIAIR, Nagoya UnivUniv.).)
nn MasatoMasato AkagiAkagi, Masashi, Masashi UnokiUnoki (JAIST)(JAIST)
nn Seiichi Nakagawa (Seiichi Nakagawa (ToyohashiToyohashi Inst. Tech)Inst. Tech)
nn ShigekiShigeki SagayamaSagayama, Nobuaki, Nobuaki MinematsuMinematsu ((UnivUniv. Tokyo). Tokyo)
nn DianeDiane KewleyKewley-Port (Indiana-Port (Indiana UnivUniv. USA). USA)
nn Osamu Fujimura (Ohio stateOsamu Fujimura (Ohio state UnivUniv. USA). USA)
nn Alain deAlain de CheveignCheveignéé (IRCAM, France)(IRCAM, France)
nn Roy D. Patterson (CNBH, UK)Roy D. Patterson (CNBH, UK)
9-10 August, 2002 Computational Audition 61
ReferencesReferences
nn Hideki Kawahara,Hideki Kawahara, IkuyoIkuyo Masuda-Masuda-KatsuseKatsuse and Alain deand Alain de CheveigneCheveigne: Restructuring: Restructuring
speech representations using a pitch-adaptive time-frequency smoothing and anspeech representations using a pitch-adaptive time-frequency smoothing and an
instantaneous-frequency-based F0 extraction: Possible role of ainstantaneous-frequency-based F0 extraction: Possible role of a reptitivereptitive
structure in sounds, Speech Communication, 27, pp.187-207 (1999).structure in sounds, Speech Communication, 27, pp.187-207 (1999).
nn Hideki Kawahara,Hideki Kawahara, Haruhiro KatayoseHaruhiro Katayose, Alain de, Alain de CheveigneCheveigne, Roy D. Patterson:, Roy D. Patterson:
Fixed Point Analysis of Frequency to Instantaneous Frequency Mapping forFixed Point Analysis of Frequency to Instantaneous Frequency Mapping for
Accurate Estimation of F0 and Periodicity , Proc. EUROSPEECH'99, Volume 6,Accurate Estimation of F0 and Periodicity , Proc. EUROSPEECH'99, Volume 6,
Page 2781-2784 (1999).Page 2781-2784 (1999).
nn Hideki Kawahara, YoshinoriHideki Kawahara, Yoshinori AtakeAtake and Parhamand Parham ZolfaghariZolfaghari: Accurate vocal event: Accurate vocal event
detection method based on a fixed-point to weighted average group delay,detection method based on a fixed-point to weighted average group delay,
ICSLP-2000, Beijing, pp.664-667 2000.ICSLP-2000, Beijing, pp.664-667 2000.
nn H. Kawahara and PH. Kawahara and P ZolfaghariZolfaghari: Systematic F0 glitches around vowel nasal: Systematic F0 glitches around vowel nasal
transitions, EUROSPEECH'2001, pp.2459-2462, 2001.transitions, EUROSPEECH'2001, pp.2459-2462, 2001.
nn H. Kawahara, JoH. Kawahara, Jo EstillEstill and O. Fujimura:and O. Fujimura: AperiodicityAperiodicity extraction and control usingextraction and control using
mixed mode excitation and group delay manipulation for a high quality speechmixed mode excitation and group delay manipulation for a high quality speech
analysis, modification and synthesis system STRAIGHT, MAVEBA 2001,analysis, modification and synthesis system STRAIGHT, MAVEBA 2001,
Sept.13-15,Sept.13-15, FirentzeFirentze Italy, 2001.Italy, 2001.
nn H. Kawahara and H.H. Kawahara and H. KatayoseKatayose: Scat generation research program based on: Scat generation research program based on
STRAIGHT, a high-quality speech analysis, modification and synthesis system,STRAIGHT, a high-quality speech analysis, modification and synthesis system,
J. IPSJ, 43, 2, pp.208-218 2002. (in Japanese)J. IPSJ, 43, 2, pp.208-218 2002. (in Japanese)
9-10 August, 2002 Computational Audition 62
For computationalFor computational ““AuditionAudition””
seed#1 seed#2
F0 trajectory andF0 trajectory and
frequency axis modificationfrequency axis modification
Parts preparationParts preparation
Mixing and level adjustmentMixing and level adjustment
9-10 August, 2002 Computational Audition 63
Nonlinear time warping basedNonlinear time warping based
on phase of the F0 componenton phase of the F0 component
(FM pulse train)(FM pulse train)
without time warpingwithout time warping with time warpingwith time warping
9-10 August, 2002 Computational Audition 64
Nonlinear time warping basedNonlinear time warping based
on phase of the F0 componenton phase of the F0 component
(vowel sequence /(vowel sequence /aiueoaiueo/)/)
without time warpingwithout time warping with time warpingwith time warping

More Related Content

Similar to Straight

Geometric Approach to Spectral Substraction
Geometric Approach to Spectral SubstractionGeometric Approach to Spectral Substraction
Geometric Approach to Spectral Substractionkeerthi thallam
 
Anvita Ncvpripg 2008 Presentation
Anvita Ncvpripg 2008 PresentationAnvita Ncvpripg 2008 Presentation
Anvita Ncvpripg 2008 Presentationguest6e7a1b1
 
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...ijtsrd
 
Wavemeter for VisibleNIRDiGangi2006.pdf
Wavemeter for VisibleNIRDiGangi2006.pdfWavemeter for VisibleNIRDiGangi2006.pdf
Wavemeter for VisibleNIRDiGangi2006.pdfennio12
 
MLConf2013: Teaching Computer to Listen to Music
MLConf2013: Teaching Computer to Listen to MusicMLConf2013: Teaching Computer to Listen to Music
MLConf2013: Teaching Computer to Listen to MusicEric Battenberg
 
Ml conf2013 teaching_computers_share
Ml conf2013 teaching_computers_shareMl conf2013 teaching_computers_share
Ml conf2013 teaching_computers_shareMLconf
 
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATION
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATIONHUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATION
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATIONIRJET Journal
 
Analog for all_preview
Analog for all_previewAnalog for all_preview
Analog for all_previewSahyogeeTech
 
A time domain clean approach for the identification of acoustic moving source...
A time domain clean approach for the identification of acoustic moving source...A time domain clean approach for the identification of acoustic moving source...
A time domain clean approach for the identification of acoustic moving source...André Moreira
 
Thesis presentation Slides Ph.D. Aliouat Ahcen
Thesis presentation Slides Ph.D. Aliouat AhcenThesis presentation Slides Ph.D. Aliouat Ahcen
Thesis presentation Slides Ph.D. Aliouat AhcenAhcen ALIOUAT
 
FORECASTING MUSIC GENRE (RNN - LSTM)
FORECASTING MUSIC GENRE (RNN - LSTM)FORECASTING MUSIC GENRE (RNN - LSTM)
FORECASTING MUSIC GENRE (RNN - LSTM)IRJET Journal
 
Effective Audio Storage and Retrieval in Infrastructure less Environment over...
Effective Audio Storage and Retrieval in Infrastructure less Environment over...Effective Audio Storage and Retrieval in Infrastructure less Environment over...
Effective Audio Storage and Retrieval in Infrastructure less Environment over...IRJET Journal
 
TAROT2013 Testing School - Gilles Perrouin presentation
TAROT2013 Testing School -  Gilles Perrouin presentationTAROT2013 Testing School -  Gilles Perrouin presentation
TAROT2013 Testing School - Gilles Perrouin presentationHenry Muccini
 
Freespaceoptics
FreespaceopticsFreespaceoptics
FreespaceopticsKIIMS
 
SoundField UPM-1 Review
SoundField UPM-1 ReviewSoundField UPM-1 Review
SoundField UPM-1 ReviewRadikal Ltd.
 

Similar to Straight (20)

Dfma
DfmaDfma
Dfma
 
Geometric Approach to Spectral Substraction
Geometric Approach to Spectral SubstractionGeometric Approach to Spectral Substraction
Geometric Approach to Spectral Substraction
 
Anvita Ncvpripg 2008 Presentation
Anvita Ncvpripg 2008 PresentationAnvita Ncvpripg 2008 Presentation
Anvita Ncvpripg 2008 Presentation
 
Ch17
Ch17Ch17
Ch17
 
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
 
Wavemeter for VisibleNIRDiGangi2006.pdf
Wavemeter for VisibleNIRDiGangi2006.pdfWavemeter for VisibleNIRDiGangi2006.pdf
Wavemeter for VisibleNIRDiGangi2006.pdf
 
MLConf2013: Teaching Computer to Listen to Music
MLConf2013: Teaching Computer to Listen to MusicMLConf2013: Teaching Computer to Listen to Music
MLConf2013: Teaching Computer to Listen to Music
 
Ml conf2013 teaching_computers_share
Ml conf2013 teaching_computers_shareMl conf2013 teaching_computers_share
Ml conf2013 teaching_computers_share
 
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATION
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATIONHUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATION
HUFFMAN CODING ALGORITHM BASED ADAPTIVE NOISE CANCELLATION
 
Analog for all_preview
Analog for all_previewAnalog for all_preview
Analog for all_preview
 
A time domain clean approach for the identification of acoustic moving source...
A time domain clean approach for the identification of acoustic moving source...A time domain clean approach for the identification of acoustic moving source...
A time domain clean approach for the identification of acoustic moving source...
 
Thesis presentation Slides Ph.D. Aliouat Ahcen
Thesis presentation Slides Ph.D. Aliouat AhcenThesis presentation Slides Ph.D. Aliouat Ahcen
Thesis presentation Slides Ph.D. Aliouat Ahcen
 
D tosi cleo_etch
D tosi cleo_etchD tosi cleo_etch
D tosi cleo_etch
 
FORECASTING MUSIC GENRE (RNN - LSTM)
FORECASTING MUSIC GENRE (RNN - LSTM)FORECASTING MUSIC GENRE (RNN - LSTM)
FORECASTING MUSIC GENRE (RNN - LSTM)
 
Effective Audio Storage and Retrieval in Infrastructure less Environment over...
Effective Audio Storage and Retrieval in Infrastructure less Environment over...Effective Audio Storage and Retrieval in Infrastructure less Environment over...
Effective Audio Storage and Retrieval in Infrastructure less Environment over...
 
TAROT2013 Testing School - Gilles Perrouin presentation
TAROT2013 Testing School -  Gilles Perrouin presentationTAROT2013 Testing School -  Gilles Perrouin presentation
TAROT2013 Testing School - Gilles Perrouin presentation
 
Freespaceoptics
FreespaceopticsFreespaceoptics
Freespaceoptics
 
Reduction of audio acoustic in Audio-visual transceiving with single port
Reduction of audio acoustic in Audio-visual transceiving with single portReduction of audio acoustic in Audio-visual transceiving with single port
Reduction of audio acoustic in Audio-visual transceiving with single port
 
A novel approach to record sound
A novel approach to record soundA novel approach to record sound
A novel approach to record sound
 
SoundField UPM-1 Review
SoundField UPM-1 ReviewSoundField UPM-1 Review
SoundField UPM-1 Review
 

More from melvincabatuan

More from melvincabatuan (19)

2 Info Theory
2 Info Theory2 Info Theory
2 Info Theory
 
5 Info Theory
5 Info Theory5 Info Theory
5 Info Theory
 
3 Info Theory
3 Info Theory3 Info Theory
3 Info Theory
 
6 Info Theory
6 Info Theory6 Info Theory
6 Info Theory
 
1 Info Theory
1 Info Theory1 Info Theory
1 Info Theory
 
4 Info Theory
4 Info Theory4 Info Theory
4 Info Theory
 
Dyspan Sdr Cr Tutorial 10 25 Rev02
Dyspan Sdr Cr Tutorial 10 25 Rev02Dyspan Sdr Cr Tutorial 10 25 Rev02
Dyspan Sdr Cr Tutorial 10 25 Rev02
 
Linear Algebra
Linear AlgebraLinear Algebra
Linear Algebra
 
(Ofdm)
(Ofdm)(Ofdm)
(Ofdm)
 
Cover
CoverCover
Cover
 
Straight
StraightStraight
Straight
 
1 1040 Henry Nsma May 2008 V3
1 1040 Henry Nsma May 2008 V31 1040 Henry Nsma May 2008 V3
1 1040 Henry Nsma May 2008 V3
 
Meixia Tao Introduction To Wireless Communications And Recent Advances
Meixia Tao Introduction To Wireless Communications And Recent AdvancesMeixia Tao Introduction To Wireless Communications And Recent Advances
Meixia Tao Introduction To Wireless Communications And Recent Advances
 
Cs702 Anm A Ds M
Cs702 Anm A Ds MCs702 Anm A Ds M
Cs702 Anm A Ds M
 
Cognitive Radio Standardisation In Europe Etsi
Cognitive Radio Standardisation In Europe EtsiCognitive Radio Standardisation In Europe Etsi
Cognitive Radio Standardisation In Europe Etsi
 
Course Development Template
Course Development TemplateCourse Development Template
Course Development Template
 
Air Interface Club Lra Fading Channels
Air Interface Club Lra Fading ChannelsAir Interface Club Lra Fading Channels
Air Interface Club Lra Fading Channels
 
1 1040 Henry Nsma May 2008 V3
1 1040 Henry Nsma May 2008 V31 1040 Henry Nsma May 2008 V3
1 1040 Henry Nsma May 2008 V3
 
Pajek
PajekPajek
Pajek
 

Recently uploaded

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 

Recently uploaded (20)

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 

Straight

  • 1. 9-10 August, 2002 Computational Audition 1 Fixed Point Representations forFixed Point Representations for Very High-Quality Speech andVery High-Quality Speech and Sound Modification SystemSound Modification System Hideki KawaharaHideki Kawahara Wakayama University, JapanWakayama University, Japan
  • 2. 9-10 August, 2002 Computational Audition 2 SummarySummary nn Functional (computational after Marr)Functional (computational after Marr) approach is important and productive.approach is important and productive. nn Fixed points provide feature values asFixed points provide feature values as well as their reliability indices.well as their reliability indices. –– UsingUsing within channelwithin channel informationinformation nn Fixed point concept may provide clue toFixed point concept may provide clue to integrate Fourier based concept andintegrate Fourier based concept and wavelet-wavelet-MellinMellin transform based concept.transform based concept. Reference systemReference system ““LocalLocal”” center of gravitycenter of gravity
  • 3. 9-10 August, 2002 Computational Audition 3 original STRAIGHT: demoSTRAIGHT: demo
  • 4. 9-10 August, 2002 Computational Audition 4 STRAIGHT demo: morphingSTRAIGHT demo: morphing neutral angry interpolationextrapolation extrapolation Word: /hai/ (“Yes” in Japanese)
  • 5. 9-10 August, 2002 Computational Audition 5 BackgroundBackground nn ““Auditory BrainAuditory Brain”” Project by CRESTProject by CREST –– Short term goal: speech processing systemsShort term goal: speech processing systems based on functional models of auditory functions.based on functional models of auditory functions. –– STRAIGHT: a very high-quality speechSTRAIGHT: a very high-quality speech manipulation systemmanipulation system –– Fixed point based algorithmsFixed point based algorithms (alternative way of dimensional reduction of(alternative way of dimensional reduction of auditory representations)auditory representations) –– Long term goal:Long term goal: ““computationalcomputational”” theorirdtheorird ofof auditionaudition nn Frustrations in ways how auditory modelsFrustrations in ways how auditory models are used in ASR and how speech processingare used in ASR and how speech processing systems are evaluated.systems are evaluated.
  • 6. 9-10 August, 2002 Computational Audition 6 ““Auditory BrainAuditory Brain”” ProjectProject nn To develop a very high quality speechTo develop a very high quality speech and/or sound manipulation system basedand/or sound manipulation system based on perceptually relevant parameters andon perceptually relevant parameters and it does not preserve phase/waveformit does not preserve phase/waveform information.information. –– ?? Are distance based quality measures relevant???? Are distance based quality measures relevant?? –– ?? Why does periodic sound sounds smoother and?? Why does periodic sound sounds smoother and richer (in Auditory Fovea)??richer (in Auditory Fovea)?? –– ?? Is it relevant to test highly nonlinear speech?? Is it relevant to test highly nonlinear speech perception using elementary sounds ??perception using elementary sounds ??
  • 7. 9-10 August, 2002 Computational Audition 7 Why high quality?Why high quality? nn Ecological approach for investigatingEcological approach for investigating highly nonlinear system, Humanhighly nonlinear system, Human Not necessarily be predictable from elementary test signals Necessary to use ecologically valid stimuli Naturalness
  • 8. 9-10 August, 2002 Computational Audition 8 Hans Moravec: Robot, 2000, Oxford xbox iMac PlayStation-3 Key issue: compatibilityKey issue: compatibility Background figure is removed. Please visit Hans Moravec’s page for the original figure. Faster than exponential growth in computing power (Chapter 3: Power and Presence, Page 60) http://www.frc.ri.cmu.edu/~hpm/book98/
  • 9. 9-10 August, 2002 Computational Audition 9 ““Auditory BrainAuditory Brain”” ProjectProject nn Computational theories of speech/auditoryComputational theories of speech/auditory perceptionperception –– ecological constraints on evolutionecological constraints on evolution –– It cannot be ad hoc.It cannot be ad hoc. »» When there is an elegant and reasonable algorithmWhen there is an elegant and reasonable algorithm and it does not violate ecological (biological andand it does not violate ecological (biological and environmental) constraints, there is no reason toenvironmental) constraints, there is no reason to deny that the algorithm shares the commondeny that the algorithm shares the common underlying principles with our auditory system.underlying principles with our auditory system.
  • 10. 9-10 August, 2002 Computational Audition 10 ““Auditory BrainAuditory Brain”” ProjectProject nn Computational theories of speech/auditoryComputational theories of speech/auditory perceptionperception –– Periodicity: time-frequency sampling gridPeriodicity: time-frequency sampling grid –– Periodicity: stable reference point for wavelet-Periodicity: stable reference point for wavelet- MellinMellin transformtransform –– Log-linear frequency axisLog-linear frequency axis »» Wavelet-Wavelet-MellinMellin transform: shape and sizetransform: shape and size –– Why two ears? ICAWhy two ears? ICA –– Long term correlation (structure)Long term correlation (structure) »» ASR, musicASR, music
  • 11. 9-10 August, 2002 Computational Audition 11 STRAIGHT a core technologySTRAIGHT a core technology nn Conceptually simple architectureConceptually simple architecture –– Channel VOCODERChannel VOCODER –– Source filter modelSource filter model nn Graded parameters (Graded parameters (vsvs binary decision)binary decision) –– Sensitivity analysisSensitivity analysis –– MorphingMorphing nn Reliability / TransparencyReliability / Transparency –– No post-processingNo post-processing –– Weakly constrained modelWeakly constrained model
  • 12. 9-10 August, 2002 Computational Audition 12 Structure of STRAIGHT STRAIGHT: architectureSTRAIGHT: architecture
  • 13. 9-10 August, 2002 Computational Audition 13 STRAIGHT a core technologySTRAIGHT a core technology nn Conceptually simple architectureConceptually simple architecture –– Channel VOCODERChannel VOCODER –– Source filter modelSource filter model nn Graded parameters (Graded parameters (vsvs binary decision)binary decision) –– Sensitivity analysisSensitivity analysis –– MorphingMorphing nn Reliability / TransparencyReliability / Transparency –– No post-processingNo post-processing –– Weakly constrained modelWeakly constrained model
  • 14. 9-10 August, 2002 Computational Audition 14 Structure of STRAIGHT Spectral envelope estimation STRAIGHT: structureSTRAIGHT: structure
  • 15. 9-10 August, 2002 Computational Audition 15 Weakly constrained spectralWeakly constrained spectral envelope estimationenvelope estimation waveform Time window Interferences in the time domain Interferences in the frequency domain
  • 16. 9-10 August, 2002 Computational Audition 16 Weakly constrained spectralWeakly constrained spectral envelope estimationenvelope estimation Reduction of edge discontinuity Reduction of periodicity interference smoothing by spline basisComposite window
  • 17. 9-10 August, 2002 Computational Audition 17 Time-frequency smoothing ( Time-frequency smoothing (current implementation)current implementation) F0 synchronous Gaussian window complimentary time window reduced interference spectrum F0 synchronous Gaussian window complimentary time window
  • 18. 9-10 August, 2002 Computational Audition 18 Compensation of over-Compensation of over- smoothingsmoothing
  • 19. 9-10 August, 2002 Computational Audition 19 Compensation of over-smoothingCompensation of over-smoothing
  • 20. 9-10 August, 2002 Computational Audition 20 Weakly constrained spectralWeakly constrained spectral envelope estimationenvelope estimation
  • 21. 9-10 August, 2002 Computational Audition 21 Weakly constrained spectralWeakly constrained spectral envelope estimationenvelope estimation
  • 22. 9-10 August, 2002 Computational Audition 22 Fixed point based algorithmsFixed point based algorithms nn Fixed points in the frequency domain: → Fixed points in the frequency domain: → F0 extractionF0 extraction nn Fixed points in the time domain: → Fixed points in the time domain: → Excitation extractionExcitation extraction
  • 23. 9-10 August, 2002 Computational Audition 23 Fixed point of mappingFixed point of mapping fixed point y x y=f(x) * Instantaneous frequency of a filter output around a sinusoidal component * Energy centroid of a windowed signal around an event Examples
  • 24. 9-10 August, 2002 Computational Audition 24 Averaging and fixed pointAveraging and fixed point nn Prominent componentProminent component Window locations Average of windowed value Fixed point background
  • 25. 9-10 August, 2002 Computational Audition 25 Averaging and fixed pointAveraging and fixed point nn Prominent componentProminent component Window locations Average of windowed value Fixed point background Parameters (position,slope,[level])
  • 26. 9-10 August, 2002 Computational Audition 26 Structure of STRAIGHT F0 estimation STRAIGHT: structureSTRAIGHT: structure
  • 27. 9-10 August, 2002 Computational Audition 27 window selection for reliablewindow selection for reliable representation of mappingrepresentation of mapping Refinement of Fo synchronous windows
  • 28. 9-10 August, 2002 Computational Audition 28 Window with harmonicWindow with harmonic cancellationcancellation
  • 29. 9-10 August, 2002 Computational Audition 29 Fixed-point-based sinusoidalFixed-point-based sinusoidal components extractioncomponents extraction
  • 30. 9-10 August, 2002 Computational Audition 30 Fixed-point-based sinusoidalFixed-point-based sinusoidal frequency and C/N estimationfrequency and C/N estimation C/N information enablesC/N information enables optimum F0 estimation based onoptimum F0 estimation based on multiple harmonic componentsmultiple harmonic components
  • 31. 9-10 August, 2002 Computational Audition 31 Approximate estimation of C/NApproximate estimation of C/N
  • 32. 9-10 August, 2002 Computational Audition 32 Reliable built-in mechanismReliable built-in mechanism for fundamental component selectionfor fundamental component selection linearlinear filter arrangement log-linearlog-linear filter arrangement mapping filter output
  • 33. 9-10 August, 2002 Computational Audition 33 Fixed points on C/N mapFixed points on C/N map Fundamental component
  • 34. 9-10 August, 2002 Computational Audition 34 F0 evaluation based on EGGF0 evaluation based on EGG gross error W/O:0.72% with:0.32% female
  • 35. 9-10 August, 2002 Computational Audition 35 Graded sourceGraded source InformationInformation nn Fixed point basedFixed point based FoFo extractionextraction (with C/N map)(with C/N map) F0 trajectoriesF0 trajectories (resolution: 1/F0)(resolution: 1/F0) C/N for each fixed point Graded aperiodicity information Is also extracted
  • 36. 9-10 August, 2002 Computational Audition 36 Fixed points in the time domainFixed points in the time domain nn How to define auditory temporal eventsHow to define auditory temporal events –– Localized energyLocalized energy centroidcentroid Alternative representation
  • 37. 9-10 August, 2002 Computational Audition 37 Fixed points in the time domainFixed points in the time domain Squared whitened signal Energy centroid Gaussian window Amount of energy concentration Speech waveform
  • 38. 9-10 August, 2002 Computational Audition 38 waveform Energy centrold Window center Fixed points Fixed point based event detectionFixed point based event detection
  • 39. 9-10 August, 2002 Computational Audition 39 Mean time duration Definition of event in the time domainDefinition of event in the time domain Event location Windowed whitened signal
  • 40. 9-10 August, 2002 Computational Audition 40 Windowed event location andWindowed event location and the original event locationthe original event location Gaussian window Approximation of envelope Windowed location Original location Window location
  • 41. 9-10 August, 2002 Computational Audition 41 Slope at fixed point duration Window parameter Duration can be estimated fromDuration can be estimated from the geometrical parameter atthe geometrical parameter at the fixed pointthe fixed point
  • 42. 9-10 August, 2002 Computational Audition 42 Equivalence between the time domainEquivalence between the time domain definition and the frequency domaindefinition and the frequency domain definitiondefinition waveform Time domain definition Frequency domain definition Group delay
  • 43. 9-10 August, 2002 Computational Audition 43 Inverse problem:Inverse problem: Where is the excitation?Where is the excitation? Minimum phase response Event as the energy centroid Excitation (impulse) compensation
  • 44. 9-10 August, 2002 Computational Audition 44 Equivalence in definitionsEquivalence in definitions nn Frequency domain definition of the event locationFrequency domain definition of the event location nn Assuming causalityAssuming causality Group delay
  • 45. 9-10 August, 2002 Computational Audition 45 Group delay of a minimum phaseGroup delay of a minimum phase responseresponse を介した計算Cepstrum
  • 46. 9-10 August, 2002 Computational Audition 46 Compensation based onCompensation based on minimum phase group delayminimum phase group delay Observed group delay Causal group delay Compensated event location Compensated event duration
  • 47. 9-10 August, 2002 Computational Audition 47 example Observed group delay Minimum phase group delay Compensated group delay
  • 48. 9-10 August, 2002 Computational Audition 48 Excitation estimation based on fixedExcitation estimation based on fixed point based event detectionpoint based event detection Event based concentration Excitation based concentration Energy centroid Compensated group delay excitation Vocal fold closure
  • 49. 9-10 August, 2002 Computational Audition 49 Excitation extraction accuracyExcitation extraction accuracy Standard deviation
  • 50. 9-10 August, 2002 Computational Audition 50 Estimated excitation Speech waveform Multiple resolution display ofMultiple resolution display of events (fixed points)events (fixed points)
  • 51. 9-10 August, 2002 Computational Audition 51 Multiple resolution display ofMultiple resolution display of events (fixed points)events (fixed points) demo Fixed points due to one excitation aligns on a straight line
  • 52. 9-10 August, 2002 Computational Audition 52 Phase map of wavelet transformPhase map of wavelet transform
  • 53. 9-10 August, 2002 Computational Audition 53 Instantaneous frequency basedInstantaneous frequency based fixed pointsfixed points
  • 54. 9-10 August, 2002 Computational Audition 54 Instantaneous frequency basedInstantaneous frequency based fixed pointsfixed points
  • 55. 9-10 August, 2002 Computational Audition 55 Group delay based fixed pointsGroup delay based fixed points
  • 56. 9-10 August, 2002 Computational Audition 56 Group delay based fixed pointsGroup delay based fixed points
  • 57. 9-10 August, 2002 Computational Audition 57 Structure of STRAIGHT Source attribute control STRAIGHT: structureSTRAIGHT: structure
  • 58. 9-10 August, 2002 Computational Audition 58 Group delay manipulated mixed-mode excitation source Group delay manipulated mixed-mode excitation source group delay asymmetry impulse response ..provides continuous coverage from pulse train to random noise
  • 59. 9-10 August, 2002 Computational Audition 59 SummarySummary nn Functional (computational after Marr)Functional (computational after Marr) approach is important and productive.approach is important and productive. nn Fixed points provide feature values asFixed points provide feature values as well as their reliability indices.well as their reliability indices. –– UsingUsing within channelwithin channel informationinformation nn Fixed point concept may provide clue toFixed point concept may provide clue to integrate Fourier based concept andintegrate Fourier based concept and wavelet-wavelet-MellinMellin transform based concept.transform based concept.
  • 60. 9-10 August, 2002 Computational Audition 60 ColleaguesColleagues nn Haruhiro KatayoseHaruhiro Katayose, Toshio, Toshio IrinoIrino,, TakanobuTakanobu NishiuraNishiura (Wakayama(Wakayama UnivUniv.).) nn MinoruMinoru TsuzakiTsuzaki, Hideki, Hideki IwasawaIwasawa (ATR)(ATR) nn ParhamParham ZolfaghariZolfaghari (NTT)(NTT) nn Kiyohiro ShikanoKiyohiro Shikano, Hiroshi, Hiroshi SaruwatariSaruwatari (NAIST)(NAIST) nn Fumitada ItakuraFumitada Itakura, Kazuya Takeda, Shoji, Kazuya Takeda, Shoji kajitakajita, Hideki, Hideki BannoBanno (CIAIR, Nagoya(CIAIR, Nagoya UnivUniv.).) nn MasatoMasato AkagiAkagi, Masashi, Masashi UnokiUnoki (JAIST)(JAIST) nn Seiichi Nakagawa (Seiichi Nakagawa (ToyohashiToyohashi Inst. Tech)Inst. Tech) nn ShigekiShigeki SagayamaSagayama, Nobuaki, Nobuaki MinematsuMinematsu ((UnivUniv. Tokyo). Tokyo) nn DianeDiane KewleyKewley-Port (Indiana-Port (Indiana UnivUniv. USA). USA) nn Osamu Fujimura (Ohio stateOsamu Fujimura (Ohio state UnivUniv. USA). USA) nn Alain deAlain de CheveignCheveignéé (IRCAM, France)(IRCAM, France) nn Roy D. Patterson (CNBH, UK)Roy D. Patterson (CNBH, UK)
  • 61. 9-10 August, 2002 Computational Audition 61 ReferencesReferences nn Hideki Kawahara,Hideki Kawahara, IkuyoIkuyo Masuda-Masuda-KatsuseKatsuse and Alain deand Alain de CheveigneCheveigne: Restructuring: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and anspeech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of ainstantaneous-frequency-based F0 extraction: Possible role of a reptitivereptitive structure in sounds, Speech Communication, 27, pp.187-207 (1999).structure in sounds, Speech Communication, 27, pp.187-207 (1999). nn Hideki Kawahara,Hideki Kawahara, Haruhiro KatayoseHaruhiro Katayose, Alain de, Alain de CheveigneCheveigne, Roy D. Patterson:, Roy D. Patterson: Fixed Point Analysis of Frequency to Instantaneous Frequency Mapping forFixed Point Analysis of Frequency to Instantaneous Frequency Mapping for Accurate Estimation of F0 and Periodicity , Proc. EUROSPEECH'99, Volume 6,Accurate Estimation of F0 and Periodicity , Proc. EUROSPEECH'99, Volume 6, Page 2781-2784 (1999).Page 2781-2784 (1999). nn Hideki Kawahara, YoshinoriHideki Kawahara, Yoshinori AtakeAtake and Parhamand Parham ZolfaghariZolfaghari: Accurate vocal event: Accurate vocal event detection method based on a fixed-point to weighted average group delay,detection method based on a fixed-point to weighted average group delay, ICSLP-2000, Beijing, pp.664-667 2000.ICSLP-2000, Beijing, pp.664-667 2000. nn H. Kawahara and PH. Kawahara and P ZolfaghariZolfaghari: Systematic F0 glitches around vowel nasal: Systematic F0 glitches around vowel nasal transitions, EUROSPEECH'2001, pp.2459-2462, 2001.transitions, EUROSPEECH'2001, pp.2459-2462, 2001. nn H. Kawahara, JoH. Kawahara, Jo EstillEstill and O. Fujimura:and O. Fujimura: AperiodicityAperiodicity extraction and control usingextraction and control using mixed mode excitation and group delay manipulation for a high quality speechmixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT, MAVEBA 2001,analysis, modification and synthesis system STRAIGHT, MAVEBA 2001, Sept.13-15,Sept.13-15, FirentzeFirentze Italy, 2001.Italy, 2001. nn H. Kawahara and H.H. Kawahara and H. KatayoseKatayose: Scat generation research program based on: Scat generation research program based on STRAIGHT, a high-quality speech analysis, modification and synthesis system,STRAIGHT, a high-quality speech analysis, modification and synthesis system, J. IPSJ, 43, 2, pp.208-218 2002. (in Japanese)J. IPSJ, 43, 2, pp.208-218 2002. (in Japanese)
  • 62. 9-10 August, 2002 Computational Audition 62 For computationalFor computational ““AuditionAudition”” seed#1 seed#2 F0 trajectory andF0 trajectory and frequency axis modificationfrequency axis modification Parts preparationParts preparation Mixing and level adjustmentMixing and level adjustment
  • 63. 9-10 August, 2002 Computational Audition 63 Nonlinear time warping basedNonlinear time warping based on phase of the F0 componenton phase of the F0 component (FM pulse train)(FM pulse train) without time warpingwithout time warping with time warpingwith time warping
  • 64. 9-10 August, 2002 Computational Audition 64 Nonlinear time warping basedNonlinear time warping based on phase of the F0 componenton phase of the F0 component (vowel sequence /(vowel sequence /aiueoaiueo/)/) without time warpingwithout time warping with time warpingwith time warping