Blind source separation based on independent low-rank matrix analysis and its extensions

Daichi Kitamura
Project Research Associate
The University of Tokyo, Japan
Visit to Ohio State University, December 15th, 2017

Self introduction
• Name: Daichi Kitamura
• Age: 27 (born in 1990)
– Born in Kagawa, Japan
• Background:
– NAIST, Japan
• Master's degree (received in 2014)
– SOKENDAI, Japan
• Ph.D. degree (received in 2017)
– The University of Tokyo, Japan
• Project Research Associate
• Research topics
– Acoustic signal processing, statistical signal processing, audio source separation, etc.
[Figure: map of Japan marking Kagawa (place of birth) and Tokyo (Univ. Tokyo)]

Contents
• Background
– Blind source separation (BSS) for audio signals
– Motivation
• Preliminaries
– Frequency-domain independent component analysis (FDICA)
– Independent vector analysis (IVA)
– Itakura–Saito nonnegative matrix factorization (ISNMF)
• Independent Low-Rank Matrix Analysis (ILRMA)
– Employ low-rank TF structures of each source in BSS
– Gaussian source model with TF-varying variance
– Relationship between ILRMA and multichannel NMF
– Theoretical extension of ILRMA for better optimization
• Conclusion


Background
• Blind source separation (BSS) for audio signals
– separates the original audio sources from the observed mixture
– does not require prior information about the recording conditions
• locations of mics and sources, room geometry, timbres, etc.
– can be used in many audio applications
• We consider only the “determined” situation (# of mics = # of sources)
[Figure: BSS separates a recording mixture into, e.g., a separated guitar; the sources pass through the mixing system to give the observed signals, and the demixing system produces the estimated signals]

History of BSS for audio signals
• Basic theories and their evolution (depicting only popular methods)
– 1994: Independent component analysis (ICA)
– 1998: Frequency-domain ICA (FDICA)
– 1999: Nonnegative matrix factorization (NMF)
– 2006: Independent vector analysis (IVA)
– 2009: Itakura–Saito NMF (ISNMF)
– 2011: Auxiliary-function-based IVA (AuxIVA)
– 2012: Time-varying Gaussian IVA
– 2013: Multichannel NMF
– 2016: Independent low-rank matrix analysis (ILRMA)
– Along the way: many permutation solvers for FDICA, application of NMF to many tasks, generative models in NMF, and many extensions of NMF

Motivation of ILRMA
• Conventional BSS techniques based on ICA
– (+) Minimum distortion (linear demixing)
– (+) Relatively fast and stable optimization
• FastICA [A. Hyvärinen, 1999], natural gradient [S. Amari, 1996], and auxiliary-function technique [N. Ono+, 2010], [N. Ono, 2011]
– (-) Cannot exploit “specific” assumptions about the sources
• Only a non-Gaussian p.d.f. is assumed for the sources
– (-) The permutation problem is crucial and still difficult to solve
• IVA often fails, causing a “block permutation problem” [Y. Liang+, 2012]
• It is better to use a “specific source model” in the TF domain
– Independent low-rank matrix analysis (ILRMA) employs a low-rank property of each source in the TF domain
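[Figure: frequency-wise mixing and demixing. Observed signal $\mathbf{x}_{ij} = \mathbf{A}_i \mathbf{s}_{ij}$ with source signals $\mathbf{s}_{ij}$ and frequency-wise mixing matrix $\mathbf{A}_i$; estimated signal $\mathbf{y}_{ij} = \mathbf{W}_i \mathbf{x}_{ij}$ with frequency-wise demixing matrix $\mathbf{W}_i$; $i$: frequency bins, $j$: time frames]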


Related methods: ICA
• Independent component analysis (ICA) [P. Comon, 1994]
– estimates the sources without knowing the mixing matrix
– Source model (scalar)
• each source is non-Gaussian, and the sources are mutually independent
– Spatial model
• The mixing system is a time-invariant matrix
• Mixing system in audio signals
– Convolutive mixture with room reverberation
[Figure: sources → mixing matrix → observed → demixing matrix → estimated; the source model describes the sources and the spatial model describes the mixing matrix]

Related methods: FDICA
• Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
– estimates a frequency-wise demixing matrix $\mathbf{W}_i$
– Source model (scalar)
• each source component is complex-valued, non-Gaussian, and mutually independent
– Spatial model
• The frequency-wise mixing matrix is time-invariant
– Instantaneous mixture in each frequency band
– A.k.a. rank-1 spatial model [N.Q.K. Duong, 2010]
• Permutation problem?
– The order of the estimated signals cannot be determined by ICA
– Alignment of the frequency-wise estimated signals is required
• Many permutation solvers were proposed
[Figure: spectrograms (frequency bin × time frame); ICA1, ICA2, …, ICA $I$ are applied independently to the $I$ frequency bins]
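A minimal sketch of FDICA, assuming a Laplace source model $G(r) = r$ in every frequency bin and the auxiliary-function (iterative projection) update cited earlier [N. Ono+, 2010]. The function name `aux_fdica`, the array layout, and the toy data are our own illustration, not the implementation used in this talk. Each bin is solved on its own, which is exactly why the output order can differ from bin to bin.

```python
import numpy as np

def aux_fdica(X, n_iter=30, eps=1e-10):
    """Frequency-wise ICA with auxiliary-function (IP) updates (illustrative sketch).

    X: observed STFT, shape (I, J, M) = (freq. bins, time frames, mics).
    Returns W (I, M, M), whose row n in bin i stores w_in^H, the outputs
    Y (I, J, M), and the cost after each iteration.
    """
    I, J, M = X.shape
    W = np.tile(np.eye(M, dtype=complex), (I, 1, 1))
    costs = []
    for _ in range(n_iter):
        for i in range(I):                 # every bin is an independent ICA problem
            for n in range(M):
                y = X[i] @ W[i, n]                        # y_ijn = w_in^H x_ij
                phi = 1.0 / np.maximum(np.abs(y), eps)    # G'(r)/r for G(r) = r
                U = (X[i].T * phi) @ X[i].conj() / J      # weighted covariance (M, M)
                w = np.linalg.solve(W[i] @ U, np.eye(M)[:, n])
                w /= np.sqrt(np.real(w.conj() @ U @ w))   # scale so that w^H U w = 1
                W[i, n] = w.conj()
        Y = np.einsum('imn,ijn->ijm', W, X)
        logdet = np.log(np.abs(np.linalg.det(W)))
        # (1/J) sum |y_ijn| - sum_i log|det W_i|, constants dropped
        costs.append(np.sum(np.abs(Y)) / J - np.sum(logdet))
    return W, Y, costs

# Toy usage: Laplace-like complex sources and random frequency-wise mixing
rng = np.random.default_rng(0)
I, J, M = 4, 200, 2
S = rng.laplace(size=(I, J, M)) + 1j * rng.laplace(size=(I, J, M))
A = rng.standard_normal((I, M, M)) + 1j * rng.standard_normal((I, M, M))
X = np.einsum('imn,ijn->ijm', A, S)      # x_ij = A_i s_ij in each bin
W, Y, costs = aux_fdica(X)
```

Because nothing ties bin $i$ to bin $i+1$, the rows of different $\mathbf{W}_i$ may correspond to different sources, so a permutation solver is still needed after this step.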

Permutation problem
• FDICA requires signal alignment across all frequency bins
– The order of the estimated signals cannot be determined by ICA*
[Figure: sources 1 and 2 are observed as observed 1 and 2; ICA is applied to all frequency components, and a permutation solver aligns the outputs into estimated signals 1 and 2 over time]
*The signal scale must also be restored by applying a back-projection technique
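One simple way to picture a permutation solver is to align bins by the correlation of their amplitude envelopes. The sketch below is a toy greedy aligner built on that assumption (`align_permutations` is a hypothetical name); it is not one of the specific solvers referenced in this talk, and it ignores the scale restoration mentioned in the footnote.

```python
import itertools
import numpy as np

def align_permutations(Y):
    """Toy greedy permutation solver based on amplitude-envelope correlation.

    Y: separated STFT, shape (I, J, N); the source order may differ per bin.
    Bin 0 is the reference; each later bin gets the permutation whose
    normalized envelopes best match the running sum of aligned envelopes.
    """
    I, J, N = Y.shape
    env = np.abs(Y)
    env = (env - env.mean(axis=1, keepdims=True)) / (env.std(axis=1, keepdims=True) + 1e-12)
    perms = [list(p) for p in itertools.permutations(range(N))]
    ref = env[0].copy()                          # (J, N) reference envelopes
    Y_aligned = Y.copy()
    for i in range(1, I):
        scores = [np.sum(ref * env[i][:, p]) for p in perms]
        best = perms[int(np.argmax(scores))]
        Y_aligned[i] = Y[i][:, best]
        ref += env[i][:, best]                   # grow the reference with aligned envelopes
    return Y_aligned

# Toy data: source 0 is loud in the first half, source 1 in the second half
I, J = 8, 100
act = np.full((J, 2), 0.1)
act[:50, 0] = 1.0
act[50:, 1] = 1.0
rng = np.random.default_rng(0)
Y_true = act[None, :, :] * np.exp(2j * np.pi * rng.random((I, J, 2)))
Y_shuffled = Y_true.copy()
Y_shuffled[1::2] = Y_true[1::2, :, ::-1]         # swap the outputs in odd bins
Y_fixed = align_permutations(Y_shuffled)
print(np.allclose(Y_fixed, Y_true))              # True
```

Real solvers face overlapping activity and noisy envelopes, so this clean toy case only illustrates the idea of exploiting similar activations across frequency.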

Related methods: IVA
• Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006]
– extends ICA to a multivariate probabilistic model that treats the source-wise frequency vector as a vector variable
– Source model (vector)
• each source vector is multivariate, spherical, complex-valued, and non-Gaussian, and the source vectors are mutually independent
– Spatial model
• The mixing system is a time-invariant matrix (rank-1 spatial model)
[Figure: source vectors follow a multivariate non-Gaussian distribution that has higher-order correlations; mixing matrix → observed vector → demixing matrix → estimated vector]
• Permutation-free estimation of the demixing matrices is achieved!

Higher-order correlation assumed in IVA
• Spherical multivariate distribution [T. Kim+, 2007]
– The probability depends only on the norm of the vector
• Why a spherical distribution?
– Frequency bands that have similar activations are merged together as one source, which avoids the permutation problem
[Figure: two mutually independent Laplace distributions (x1 and x2 are mutually independent) vs. a spherical Laplace distribution (x1 and x2 have higher-order correlation)]
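To see why a spherical model merges co-activated frequency bins, compare the two negative log-likelihoods up to constants and with unit scale (our simplification): the independent Laplace model sums $|y_{ij}|$ over all TF bins, while the spherical Laplace model sums the $\ell_2$ norm of each frequency vector. The two toy spectrograms below have the same energy in each bin; only the timing of the activity differs.

```python
import numpy as np

def independent_laplace_cost(Y):
    """Sum of |y_ij| over all TF bins: every bin is its own Laplace variable."""
    return np.sum(np.abs(Y))

def spherical_laplace_cost(Y):
    """Sum over frames of the l2 norm of the frequency vector (column of Y)."""
    return np.sum(np.linalg.norm(Y, axis=0))

J = 100
co_active = np.zeros((2, J))      # rows: 2 frequency bins, columns: time frames
co_active[:, :50] = 1.0           # both bins active in the same frames
staggered = np.zeros((2, J))
staggered[0, :50] = 1.0           # bin 0 active in the first half
staggered[1, 50:] = 1.0           # bin 1 active in the second half

print(independent_laplace_cost(co_active), independent_laplace_cost(staggered))  # 100.0 100.0
print(spherical_laplace_cost(co_active), spherical_laplace_cost(staggered))      # about 70.71 and 100.0
```

Only the spherical cost prefers the co-activated layout, so an IVA update is pulled toward outputs whose frequency bins rise and fall together, which keeps the bins of one source in one output.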

Comparison of source models
• Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
– Scalar r.v.s: update the separation filter so that the estimated signals obey the non-Gaussian distribution we assumed
• Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006]
– Vector (multivariate) r.v.s: update the separation filter so that the estimated signals obey the non-Gaussian spherical distribution we assumed
• In both cases, the mixture is close to a Gaussian signal because of the CLT, whereas each source obeys a non-Gaussian distribution, and the estimated signals are mutually independent
[Figure: for each method, the observed signal is transformed by STFT (frequency × time) and demixed; the current empirical distribution of the estimates is compared with the assumed non-Gaussian source distribution]

Related method: NMF
• Nonnegative matrix factorization (NMF) [D. D. Lee, 1999]
– Low-rank decomposition with a nonnegative constraint
• Limited number of nonnegative bases and their coefficients
– In acoustic signal processing, a spectrogram is decomposed
• into frequently appearing spectral patterns and their activations
[Figure: nonnegative matrix (power spectrogram, frequency × time, $I \times J$) ≈ basis matrix (spectral patterns, $I \times K$) × activation matrix (time-varying gains, $K \times J$); $I$: # of freq. bins, $J$: # of time frames, $K$: # of bases]
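As a concrete instance of this decomposition, here is a minimal NMF sketch using the multiplicative updates of Lee and Seung for the squared Euclidean distance. The choice of distance, the names `T` (basis matrix) and `V` (activation matrix), and the toy data are our assumptions, since this slide does not fix them.

```python
import numpy as np

def nmf_euc(X, K, n_iter=200, seed=0, eps=1e-12):
    """NMF in which X is approximated by T @ V, using Lee–Seung multiplicative
    updates for the squared Euclidean distance.

    X: nonnegative (I, J) matrix, e.g. a power spectrogram.
    Returns basis matrix T (I, K), activation matrix V (K, J), and the error history.
    """
    rng = np.random.default_rng(seed)
    I, J = X.shape
    T = rng.random((I, K)) + eps
    V = rng.random((K, J)) + eps
    errors = []
    for _ in range(n_iter):
        T *= (X @ V.T) / (T @ V @ V.T + eps)    # update spectral patterns
        V *= (T.T @ X) / (T.T @ T @ V + eps)    # update time-varying gains
        errors.append(np.sum((X - T @ V) ** 2))
    return T, V, errors

rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 30))   # exactly rank-3 nonnegative matrix
T, V, errors = nmf_euc(X, K=3)
print(errors[0], errors[-1])                      # the error shrinks over iterations
```

The updates are multiplicative, so nonnegative initial values stay nonnegative without any explicit projection.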

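Related method: ISNMF
• ISNMF [C. Févotte, 2009]
– The complex-valued observed signal $x_{ij}$ can be decomposed using the “stable property” of the circularly symmetric complex Gaussian distribution:
$x_{ij} = \sum_k c_{ijk},\ \ c_{ijk} \sim \mathcal{N}_c(0, t_{ik} v_{kj})\ \text{independently} \;\Rightarrow\; x_{ij} \sim \mathcal{N}_c\left(0, \sum_k t_{ik} v_{kj}\right)$
• If we define the nonnegative variance $r_{ij} = \sum_k t_{ik} v_{kj}$, the variance is also decomposed!
– $-\log p(x_{ij}) = |x_{ij}|^2 / r_{ij} + \log r_{ij} + \log \pi$, so maximum-likelihood estimation of $t_{ik}$ and $v_{kj}$ is equivalent to NMF of $|x_{ij}|^2$ with the Itakura–Saito divergence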
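Both claims can be checked numerically. First, the complex Gaussian negative log-likelihood $|x|^2/r + \log r + \log\pi$ and the Itakura–Saito divergence $p/r - \log(p/r) - 1$ with $p = |x|^2$ differ by $\log p + 1 + \log\pi$, which does not depend on $r$. Second, a sum of independent complex Gaussians has the summed variance. The coefficient and the per-basis variances below are arbitrary test values, not data from the talk.

```python
import numpy as np

def complex_gauss_nll(x, r):
    """-log N_c(x | 0, r) for a circularly symmetric complex Gaussian."""
    return np.abs(x) ** 2 / r + np.log(r) + np.log(np.pi)

def is_divergence(p, r):
    """Itakura–Saito divergence D_IS(p | r) = p/r - log(p/r) - 1."""
    return p / r - np.log(p / r) - 1.0

# 1) The NLL and the IS divergence differ by a term that does not depend on r.
x = 0.3 + 0.4j                      # one STFT coefficient, |x|^2 = 0.25
p = np.abs(x) ** 2
gaps = [complex_gauss_nll(x, r) - is_divergence(p, r) for r in (0.1, 0.25, 1.0, 4.0)]
print(gaps)                          # four equal values: log(p) + 1 + log(pi)

# 2) Stable property: a sum of independent complex Gaussians is complex Gaussian
#    with the summed variance (hypothetical t_ik * v_kj values for K = 3).
rng = np.random.default_rng(0)
variances = np.array([0.5, 1.5, 2.0])
n = 200_000
c = np.sqrt(variances / 2) * (rng.standard_normal((n, 3)) + 1j * rng.standard_normal((n, 3)))
print(np.mean(np.abs(c.sum(axis=1)) ** 2))   # close to variances.sum() = 4.0
```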

Related method: ISNMF
• The power spectrogram $|x_{ij}|^2$ corresponds to the variances in the TF plane
[Figure: power spectrogram over frequency bin × time frame; grayscale shows the value of the variance, from small to large power, i.e., a complex Gaussian distribution with TF-varying variance]
• If we marginalize in terms of time or frequency, the distribution becomes non-Gaussian even though each TF grid is defined by a Gaussian distribution
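The marginalization claim can be checked by sampling: draw a variance for each TF bin (here from an exponential distribution, purely as an assumption), draw a complex Gaussian coefficient with that variance, then pool all coefficients. For such a scale mixture the kurtosis of the real part is $3\,\mathbb{E}[r^2]/\mathbb{E}[r]^2$, which equals 6 for exponential variances, whereas a Gaussian gives 3.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
r = rng.exponential(scale=1.0, size=n)      # assumed TF-varying variances r_ij
# Given its own variance, every coefficient is circularly symmetric complex Gaussian
x = np.sqrt(r / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
# Pool all coefficients, i.e. marginalize over TF positions, and inspect the real part
re = x.real
kurtosis = np.mean(re ** 4) / np.mean(re ** 2) ** 2
print(kurtosis)   # about 6; a Gaussian would give 3
```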

Comparison of low-rankness
[Figure: spectrograms of drums, guitar, vocals, and speech]

Comparison of low-rankness
• Low-rankness (simplicity of a matrix)
– can be measured by the cumulative singular value (CSV)
– Drums and guitar are quite low-rank
• Vocals and speech are also low-rank to some extent
– A music spectrogram can be modeled by only a few patterns
[Figure: CSV curves with a 95% line; the numbers of bases needed for the CSV to reach 95% are 7, 29, and around 90 (spectrogram size is 1025 × 1883)]
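A small sketch of the CSV measure, assuming CSV means the cumulative sum of singular values divided by their total (the slide does not spell out the normalization). `bases_for_csv` is a hypothetical helper, and the matrices are synthetic, not the drum, guitar, vocal, or speech spectrograms.

```python
import numpy as np

def bases_for_csv(X, level=0.95):
    """Return how many singular values are needed for the CSV to reach `level`.
    CSV is taken as the cumulative sum of singular values over their total
    (our assumption about the exact definition used in the slides)."""
    s = np.linalg.svd(X, compute_uv=False)     # singular values, descending
    csv = np.cumsum(s) / np.sum(s)
    return int(np.argmax(csv >= level)) + 1

rng = np.random.default_rng(0)
low_rank = rng.random((100, 3)) @ rng.random((3, 200))   # rank-3 nonnegative matrix
full_rank = rng.random((100, 200))                         # unstructured matrix
n_low, n_full = bases_for_csv(low_rank), bases_for_csv(full_rank)
print(n_low, n_full)   # n_low is at most 3; n_full is much larger
```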


Extension of source model in IVA
• Source model in IVA
– has a frequency-uniform scale
• Spherical multivariate Laplace
• Higher-order correlation among frequency bins
– Equivalent to NMF with one flat basis
• Source model in ISNMF [C. Févotte+, 2009]
– NMF with an arbitrary number of bases
• can represent complicated TF structures
– can learn the “co-occurrence” structure in the TF domain for each source
• Low-rank co-occurrence is captured as the variance
– The source-wise structure can be estimated by ISNMF
[Figure: frequency × time structures of the two source models; the ISNMF model replaces the source model assumed in ICA or IVA]
assumed in ICA or IVA

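Extension of source model in IVA
• Source model in IVA: spherical Laplace distribution of the $I$-dimensional frequency vector $\bar{\mathbf{y}}_{jn} = (y_{1jn}, \ldots, y_{Ijn})^{\mathsf{T}}$ with a frequency-uniform scale (the figure shows the bivariate case)
$p(\bar{\mathbf{y}}_{jn}) \propto \exp\left(-\|\bar{\mathbf{y}}_{jn}\|_2\right)$
• Source model in ISNMF [C. Févotte+, 2009]: zero-mean complex Gaussian in each TF bin with a time-frequency-varying variance, low-rank decomposed with NMF; the time-frequency matrix $\mathbf{Y}_n$ is treated as an $IJ$-dimensional variable
$p(\mathbf{Y}_n) = \prod_{i,j} \mathcal{N}_c(y_{ijn} \mid 0, r_{ijn}),\quad r_{ijn} = \sum_k t_{ikn} v_{kjn}$
• Replace the source model assumed in ICA or IVA with the ISNMF model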

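Cost function in ILRMA and partitioning function
• Negative log-likelihood in ILRMA (the source model is replaced from the IVA model to the ISNMF model)
$\mathcal{L} = \sum_{i,j,n} \left( \frac{|y_{ijn}|^2}{r_{ijn}} + \log r_{ijn} \right) - 2J \sum_i \log |\det \mathbf{W}_i|$
where the estimated signal is $y_{ijn} = \mathbf{w}_{in}^{\mathsf{H}} \mathbf{x}_{ij}$ and $r_{ijn} = \sum_k t_{ikn} v_{kjn}$ (with a partitioning function $z_{nk}$: $r_{ijn} = \sum_k z_{nk} t_{ik} v_{kj}$)
– The terms that depend on $\mathbf{W}_i$ form the cost function in ICA (estimates the demixing matrix) and give the update rules in ICA
– The terms that depend on $r_{ijn}$ form the cost function in ISNMF (estimates the low-rank source model) and give the update rules in ISNMF
– All the variables can easily be optimized by an alternating update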
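The split into an ICA part and an ISNMF part can be verified numerically, assuming the ILRMA negative log-likelihood $\sum_{i,j,n}(|y_{ijn}|^2/r_{ijn} + \log r_{ijn}) - 2J\sum_i \log|\det\mathbf{W}_i|$ without a partitioning function. With $\mathbf{W}_i$ fixed it equals the IS divergence between $|y_{ijn}|^2$ and $r_{ijn}$ plus terms free of $r_{ijn}$; with $r_{ijn}$ fixed it is a weighted ICA cost in $\mathbf{W}_i$ plus $\sum \log r_{ijn}$. The function names and random arrays are our own test scaffolding.

```python
import numpy as np

def ilrma_cost(W, X, R):
    """ILRMA negative log-likelihood, constants dropped.
    W: (I, N, M), row n of W[i] is w_in^H; X: (I, J, M); R: r_ijn, (I, J, N)."""
    J = X.shape[1]
    Y = np.einsum('inm,ijm->ijn', W, X)               # y_ijn = w_in^H x_ij
    logdet = np.log(np.abs(np.linalg.det(W)))
    return np.sum(np.abs(Y) ** 2 / R + np.log(R)) - 2 * J * np.sum(logdet)

def is_divergence(P, R):
    """Itakura–Saito divergence summed over all entries."""
    return np.sum(P / R - np.log(P / R) - 1.0)

rng = np.random.default_rng(0)
I, J, N = 3, 50, 2
X = rng.standard_normal((I, J, N)) + 1j * rng.standard_normal((I, J, N))
W = rng.standard_normal((I, N, N)) + 1j * rng.standard_normal((I, N, N))
R = rng.random((I, J, N)) + 0.1
P = np.abs(np.einsum('inm,ijm->ijn', W, X)) ** 2
logdet_term = -2 * J * np.sum(np.log(np.abs(np.linalg.det(W))))

# ISNMF view (W fixed): D_IS(P | R) plus terms that do not involve R
isnmf_view = is_divergence(P, R) + np.sum(np.log(P) + 1.0) + logdet_term
# ICA view (R fixed): weighted ICA cost in W plus a term that does not involve W
ica_view = np.sum(P / R) + logdet_term + np.sum(np.log(R))

print(np.isclose(ilrma_cost(W, X, R), isnmf_view), np.isclose(ilrma_cost(W, X, R), ica_view))  # True True
```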

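Update rules of ILRMA
• ML-based iterative update rules
– The update rule for $\mathbf{W}_i$ is based on iterative projection [N. Ono, 2011]
– The update rules for the NMF variables are based on the MM algorithm
– Pseudo code is available at
• http://d-kitamura.net/pdf/misc/AlgorithmsForIndependentLowRankMatrixAnalysis.pdf
• Spatial model (demixing matrix)
$\mathbf{U}_{in} = \frac{1}{J} \sum_j \frac{1}{r_{ijn}} \mathbf{x}_{ij} \mathbf{x}_{ij}^{\mathsf{H}},\quad \mathbf{w}_{in} \leftarrow (\mathbf{W}_i \mathbf{U}_{in})^{-1} \mathbf{e}_n,\quad \mathbf{w}_{in} \leftarrow \mathbf{w}_{in} \left(\mathbf{w}_{in}^{\mathsf{H}} \mathbf{U}_{in} \mathbf{w}_{in}\right)^{-1/2}$
where $\mathbf{W}_i = (\mathbf{w}_{i1}, \ldots, \mathbf{w}_{iN})^{\mathsf{H}}$ and $\mathbf{e}_n$ is a one-hot vector that has 1 at the $n$th element
• Source model (NMF source model)
$t_{ikn} \leftarrow t_{ikn} \sqrt{\frac{\sum_j |y_{ijn}|^2 v_{kjn} r_{ijn}^{-2}}{\sum_j v_{kjn} r_{ijn}^{-1}}},\quad v_{kjn} \leftarrow v_{kjn} \sqrt{\frac{\sum_i |y_{ijn}|^2 t_{ikn} r_{ijn}^{-2}}{\sum_i t_{ikn} r_{ijn}^{-1}}}$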

Optimization process in ILRMA
• The demixing matrix and the source model are alternately updated
– The precise modeling of low-rank TF structures will improve the estimation accuracy of the demixing matrix
[Figure: the mixture is separated by estimating the demixing matrix; the separated signals are modeled by NMF (estimating the NMF variables), and the updated source model is fed back into the demixing-matrix update]
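A compact sketch of the whole alternating loop, assuming the standard ILRMA updates without a partitioning function (MM updates of ISNMF for each source, then iterative projection for each row of $\mathbf{W}_i$), identity initialization of $\mathbf{W}_i$, random nonnegative initialization of the NMF variables, and no normalization step. `ilrma` and its arguments are our own names; it is not the reference implementation, and the toy mixture is synthetic.

```python
import numpy as np

def ilrma(X, K=2, n_iter=30, seed=0, eps=1e-12):
    """Minimal ILRMA sketch: no partitioning function, no normalization step.

    X: observed STFT (I, J, M), determined case (number of sources N = M).
    Returns Y (I, J, N), W (I, N, M) with row n of W[i] storing w_in^H,
    and the negative log-likelihood after every iteration.
    """
    rng = np.random.default_rng(seed)
    I, J, M = X.shape
    N = M
    W = np.tile(np.eye(M, dtype=complex), (I, 1, 1))   # identity initialization
    T = rng.random((N, I, K)) + 0.1                     # bases t_ikn (random)
    V = rng.random((N, K, J)) + 0.1                     # activations v_kjn (random)
    E = np.eye(N)
    Y = np.einsum('inm,ijm->ijn', W, X)                # y_ijn = w_in^H x_ij
    costs = []
    for _ in range(n_iter):
        # Source model: MM updates of ISNMF on P_n = |y_ijn|^2, per source n
        P = np.maximum(np.abs(Y) ** 2, eps).transpose(2, 0, 1)   # (N, I, J)
        Vt = V.transpose(0, 2, 1)
        R = T @ V
        T *= np.sqrt(((P / R**2) @ Vt) / ((1.0 / R) @ Vt))
        R = T @ V
        Tt = T.transpose(0, 2, 1)
        V *= np.sqrt((Tt @ (P / R**2)) / (Tt @ (1.0 / R)))
        R = T @ V                                         # r_ijn, shape (N, I, J)
        # Spatial model: iterative projection, one row w_in^H at a time
        for i in range(I):
            for n in range(N):
                U = (X[i].T / R[n, i]) @ X[i].conj() / J   # (1/J) sum_j x x^H / r_ijn
                w = np.linalg.solve(W[i] @ U, E[:, n])
                w /= np.sqrt(np.real(w.conj() @ U @ w))
                W[i, n] = w.conj()
        Y = np.einsum('inm,ijm->ijn', W, X)
        Rt = R.transpose(1, 2, 0)                          # (I, J, N)
        logdet = np.log(np.abs(np.linalg.det(W)))
        costs.append(np.sum(np.abs(Y) ** 2 / Rt + np.log(Rt)) - 2 * J * np.sum(logdet))
    return Y, W, costs

# Toy usage: two complex Gaussian sources with low-rank variances, random mixing
rng = np.random.default_rng(1)
I, J = 16, 200
S = np.empty((I, J, 2), dtype=complex)
for n in range(2):
    var = rng.random((I, 2)) @ rng.random((2, J)) + 1e-3
    S[..., n] = np.sqrt(var / 2) * (rng.standard_normal((I, J)) + 1j * rng.standard_normal((I, J)))
A = rng.standard_normal((I, 2, 2)) + 1j * rng.standard_normal((I, 2, 2))
X = np.einsum('imn,ijn->ijm', A, S)                   # x_ij = A_i s_ij
Y, W, costs = ilrma(X, K=2, n_iter=30)
```

The cost list should be non-increasing, since each MM step and each IP row update does not increase the negative log-likelihood; separation quality on this toy mixture is not guaranteed and should be judged with a proper metric such as SDR.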

Comparison of source models
• FDICA source model: non-Gaussian scalar variable
• IVA source model: non-Gaussian vector variable with higher-order correlation
• ILRMA source model: non-Gaussian matrix variable with low-rank time-frequency structure
[Figure: rank of the TF matrix of the mixture vs. rank of the TF matrix of each source]

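Multichannel extension of NMF
• Multichannel NMF [A. Ozerov+, 2010], [H. Sawada+, 2013]
$\mathbf{X}_{ij} \approx \sum_k \left( \sum_n \mathbf{H}_{in} z_{nk} \right) t_{ik} v_{kj}$
– $\mathbf{X}_{ij} = \mathbf{x}_{ij} \mathbf{x}_{ij}^{\mathsf{H}}$: instantaneous spatial covariance of the observed multichannel vector $\mathbf{x}_{ij}$ in each time-frequency slot
– Spatial model: $\mathbf{H}_{in}$, the spatial covariance of each source (spatial property of each source)
– Partitioning function: $z_{nk}$
– Source model: basis matrix $t_{ik}$ (spectral patterns) and activation matrix $v_{kj}$ (gains), i.e., the timbre patterns of all sources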
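A sketch of how the multichannel NMF model covariance could be assembled, assuming the form $\hat{\mathbf{X}}_{ij} = \sum_k (\sum_n \mathbf{H}_{in} z_{nk}) t_{ik} v_{kj}$ with $\sum_n z_{nk} = 1$ as our reading of [H. Sawada+, 2013]. `mnmf_model` and all shapes are hypothetical. It only builds the model and the observed instantaneous covariances; fitting one to the other is not shown.

```python
import numpy as np

def mnmf_model(H, Z, T, V):
    """Model spatial covariances of multichannel NMF with a partitioning function.

    H: (I, N, M, M) Hermitian PSD spatial covariances of each source
    Z: (N, K) partitioning function (how each basis is shared among sources)
    T: (I, K) bases, V: (K, J) activations
    Returns Xhat: (I, J, M, M) with Xhat_ij = sum_k (sum_n H_in z_nk) t_ik v_kj.
    """
    Hk = np.einsum('inab,nk->ikab', H, Z)            # spatial covariance per basis
    return np.einsum('ikab,ik,kj->ijab', Hk, T, V)

rng = np.random.default_rng(0)
I, J, M, N, K = 4, 10, 2, 2, 3
B = rng.standard_normal((I, N, M, M)) + 1j * rng.standard_normal((I, N, M, M))
H = B @ B.conj().transpose(0, 1, 3, 2)               # Hermitian PSD spatial covariances
Z = rng.random((N, K))
Z /= Z.sum(axis=0)                                    # assumed: sum_n z_nk = 1
T = rng.random((I, K))
V = rng.random((K, J))
Xhat = mnmf_model(H, Z, T, V)

x = rng.standard_normal((I, J, M)) + 1j * rng.standard_normal((I, J, M))
Xobs = np.einsum('ija,ijb->ijab', x, x.conj())        # instantaneous spatial covariance x_ij x_ij^H
```

Because every term is a nonnegative weight times a Hermitian PSD matrix, each $\hat{\mathbf{X}}_{ij}$ stays Hermitian PSD, which is what makes it usable as a complex Gaussian covariance.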

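Relationship b/w ILRMA and multichannel NMF
• Difference b/w ILRMA and multichannel NMF?
– Source distribution: complex Gaussian distribution (same)
– ILRMA assumes a rank-1 spatial covariance for each source
– Multichannel NMF assumes a full-rank spatial covariance
• Assumption: rank-1 spatial model
– The spatial covariance of each source is a rank-1 matrix, $\mathbf{H}_{in} = \mathbf{a}_{in} \mathbf{a}_{in}^{\mathsf{H}}$, where $\mathbf{a}_{in}$ is the source-wise steering vector
– Equivalent to the instantaneous mixing assumption in each frequency bin, with mixing matrix $\mathbf{A}_i = (\mathbf{a}_{i1}, \ldots, \mathbf{a}_{iN})$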

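Relationship b/w ILRMA and multichannel NMF
• Multichannel NMF with rank-1 spatial model
– Substitute $\mathbf{H}_{in} = \mathbf{a}_{in} \mathbf{a}_{in}^{\mathsf{H}}$ into the cost function
– Transform the variables as $\mathbf{W}_i = \mathbf{A}_i^{-1}$ and $y_{ijn} = \mathbf{w}_{in}^{\mathsf{H}} \mathbf{x}_{ij}$
– The cost function then becomes that of ILRMA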
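The substitution can be checked numerically at a single TF slot. With $\mathbf{H}_{in} = \mathbf{a}_{in}\mathbf{a}_{in}^{\mathsf H}$ and $N = M$, the model covariance is $\mathbf{A}_i \mathbf{D}_{ij} \mathbf{A}_i^{\mathsf H}$ with $\mathbf{D}_{ij} = \mathrm{diag}(r_{ij1}, \ldots, r_{ijN})$, so the multichannel complex Gaussian negative log-likelihood $\mathbf{x}^{\mathsf H}\hat{\mathbf{X}}^{-1}\mathbf{x} + \log\det\hat{\mathbf{X}}$ (constants dropped) should equal $\sum_n (|y_n|^2/r_n + \log r_n) - 2\log|\det\mathbf{W}|$ with $\mathbf{W} = \mathbf{A}^{-1}$; summing over $j$ turns the last term into $-2J\log|\det\mathbf{W}_i|$. The random values are arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3                                                # determined case: M = N
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))   # columns: steering vectors a_in
r = rng.random(N) + 0.1                              # source variances r_ijn at one TF slot
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Multichannel NMF side: rank-1 spatial covariances H_in = a_in a_in^H
Xhat = sum(r[n] * np.outer(A[:, n], A[:, n].conj()) for n in range(N))
mnmf_term = np.real(x.conj() @ np.linalg.solve(Xhat, x)) + np.log(np.real(np.linalg.det(Xhat)))

# ILRMA side: W = A^{-1}, so row n of W is w_n^H and y_n = w_n^H x
W = np.linalg.inv(A)
y = W @ x
ilrma_term = np.sum(np.abs(y) ** 2 / r + np.log(r)) - 2 * np.log(np.abs(np.linalg.det(W)))

print(np.isclose(mnmf_term, ilrma_term))  # True
```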

Relationship b/w MNMF, IVA, and ILRMA
• From the multichannel NMF side,
– the rank-1 spatial model is introduced, which transforms the problem from the estimation of the mixing system into that of the demixing matrix
• From the IVA side,
– the number of spectral bases in the source model is increased
• Positioning by flexibility of the source model and the spatial model
– IVA: limited source model, limited spatial model
– Multichannel NMF: flexible source model, flexible spatial model
– ILRMA: flexible NMF source model, limited rank-1 spatial model

Experimental evaluation
• Conditions
– Source signals: music signals obtained from SiSEC
– Mixtures: convolved with impulse responses; two microphones and two sources
– Window length: 512 ms (Hamming window)
– Shift length: 128 ms (1/4 shift)
– Number of bases: 30 per source
– Evaluation score: improvement of signal-to-distortion ratio (SDR)
[Figure: recording layout using impulse response E2A (reverberation time: 300 ms); sources 1 and 2 are 2 m from the microphones, which are 5.66 cm apart; the source positions are marked with 50 and 50]

Result example
• Ultimate NZ tour (Guitar and Synthesizer, 14 s)
[Figure: SDR improvement [dB] (0 to 20; higher is better) for the guitar and the synth. separated by IVA, multichannel NMF, and ILRMA]

Results: bearlin-roads
• Ultimate NZ tour (Guitar and Synthesizer, 14 s)
[Figure: SDR improvement [dB] (-2 to 12; higher is better) vs. iteration steps (0 to 400) for IVA, MNMF, ILRMA without Z, and ILRMA with Z; computation times shown in the figure: 11.5 s, 15.1 s, 60.7 s, and 7647.3 s]

Subjective evaluation
• Thurstone’s pairwise comparison
– Speech separation and music separation tasks
– Subjects: 10 males and 4 females
[Figure: subjective scores (-1.2 to 1.6) of IVA, multichannel NMF, and ILRMA for speech signals and music signals]
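Thurstone's paired-comparison method turns preference counts into interval-scale scores. The sketch assumes Case V (equal, uncorrelated discriminal dispersions), which the slide does not state, and uses hypothetical counts for three unnamed systems; it does not reproduce the listening-test data. `thurstone_case_v` is our own helper name.

```python
import numpy as np
from statistics import NormalDist

def thurstone_case_v(wins, clip=0.01):
    """Thurstone Case V scale values from a paired-comparison count matrix.

    wins[a, b] = number of times item a was preferred over item b.
    Returns one score per item; the scores sum to zero.
    """
    wins = np.asarray(wins, dtype=float)
    total = wins + wins.T
    P = np.full_like(wins, 0.5)
    np.divide(wins, total, out=P, where=total > 0)        # preference proportions
    P = np.clip(P, clip, 1 - clip)                         # avoid infinite z-scores
    Z = np.vectorize(NormalDist().inv_cdf)(P)              # probit (inverse normal CDF)
    np.fill_diagonal(Z, 0.0)
    return Z.mean(axis=1)

# Hypothetical counts for three systems A, B, C (not the data in the slide)
wins = [[0, 4, 2],
        [10, 0, 5],
        [12, 9, 0]]
scores = thurstone_case_v(wins)
print(scores)   # C > B > A
```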

Demonstration: music source separation
• Music source separation
[Figure: a mixture of guitar, vocal, and keyboard is separated into guitar, vocal, and keyboard]
• Listen for the three parts in the mixture
• Another demo is available at http://d-kitamura.net/en/index_en.html

Best optimization balance?
• An “alternating update” of the spatial model (ICA) and the source model (NMF) is used in ILRMA
– Sometimes the optimization in ILRMA is trapped in a poor solution (local minimum)
• There may exist a best optimization balance b/w the ICA and NMF models that avoids local minima
[Figure: ICA (demixing matrix, initialized to the identity) and NMF (low-rank source model, randomly initialized) are updated alternately by the ICA update and the NMF update]

Controlling optimization speed
• How can the optimization speed be controlled while ensuring the convergence of the algorithm?
– Parametric majorization-equalization (ME) algorithm
– Apply parametric ME to the NMF optimization
• to find the best balance of optimization speeds between NMF and ICA
[Figure: the same alternating update (identity-initialized ICA, randomly initialized NMF); the speed of the NMF update becomes controllable by parametric ME]

Majorization-based optimization algorithm
• NMF optimization is based on a majorizer-based
algorithm (a.k.a. auxiliary function technique)
– Majorization-minimization (MM) algorithm [D. R. Hunter+, 2000]
– Majorization-equalization (ME) algorithm [C. Févotte+, 2011]
[Figure: the MM and ME updates compared in terms of optimization speed (fast vs. slow)]
– Parametric ME algorithm [Y. Mitsui+, 2017]
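For reference, the standard definitions behind these bullets can be written as follows, with $C$ the cost to minimize and $G(\cdot \mid \tilde{\theta})$ a majorizer built at the current estimate $\tilde{\theta}$ (notation ours). The parametric variant is not written out here, since its exact form is not recoverable from these slides.

```latex
\begin{aligned}
&\text{Majorizer:}\quad G(\theta \mid \tilde{\theta}) \ge C(\theta)\ \ \forall \theta,
  \qquad G(\tilde{\theta} \mid \tilde{\theta}) = C(\tilde{\theta}) \\
&\text{MM:}\quad \theta^{\mathrm{new}} = \operatorname*{arg\,min}_{\theta} G(\theta \mid \tilde{\theta})
  \;\Rightarrow\; C(\theta^{\mathrm{new}}) \le G(\theta^{\mathrm{new}} \mid \tilde{\theta})
  \le G(\tilde{\theta} \mid \tilde{\theta}) = C(\tilde{\theta}) \\
&\text{ME:}\quad \theta^{\mathrm{new}} \neq \tilde{\theta}\ \text{with}\
  G(\theta^{\mathrm{new}} \mid \tilde{\theta}) = G(\tilde{\theta} \mid \tilde{\theta})
  \;\Rightarrow\; C(\theta^{\mathrm{new}}) \le G(\theta^{\mathrm{new}} \mid \tilde{\theta}) = C(\tilde{\theta})
\end{aligned}
```

For a convex majorizer, the ME point lies beyond the minimizer of $G$, so ME typically takes a larger step than MM while still never increasing $C$.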

Parametric-ME-based NMF optimization
• Comparison of NMF update rules
– Update rules of the basis matrix under the MM, ME, and parametric ME algorithms
– Only the exponent is different
– The optimization speed of the NMF model can be controlled by the parameter in the exponent of parametric ME
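A sketch of an exponent-controlled basis update for ISNMF. Because the IS-NMF majorizer is separable with the form $A/t + Bt$ in each basis entry $t$, exponent 1/2 lands on its minimizer (the MM rule) and exponent 1 lands on the other point where the majorizer equals its current value (an ME step); smaller exponents take shorter steps toward the minimizer. We use this single exponent as a stand-in for the parametric-ME parameter; the exact parameterization in [Y. Mitsui+, 2017] may differ, and `update_bases` is a hypothetical name.

```python
import numpy as np

def update_bases(T, V, P, exponent):
    """One multiplicative ISNMF update of the basis matrix T (I, K):
    T <- T * ratio**exponent, ratio = ((P / R**2) @ V.T) / ((1 / R) @ V.T), R = T @ V.
    exponent = 0.5 is the MM rule; exponent = 1.0 is the ME step for this majorizer.
    Using one exponent as the speed knob is our stand-in for parametric ME."""
    R = T @ V
    ratio = ((P / R**2) @ V.T) / ((1.0 / R) @ V.T)
    return T * ratio ** exponent

def is_cost(P, R):
    """Itakura–Saito divergence summed over all TF bins."""
    return np.sum(P / R - np.log(P / R) - 1.0)

rng = np.random.default_rng(0)
I, J, K = 20, 40, 3
P = (rng.random((I, K)) @ rng.random((K, J))) * rng.exponential(size=(I, J))
T0 = rng.random((I, K)) + 0.1
V = rng.random((K, J)) + 0.1
cost0 = is_cost(P, T0 @ V)
results = {}
for exponent in (0.25, 0.5, 1.0):            # slow, MM, ME
    T1 = update_bases(T0, V, P, exponent)
    step = np.linalg.norm(np.log(T1 / T0))    # size of the step in log scale
    results[exponent] = (step, is_cost(P, T1 @ V))
print(cost0, results)   # every exponent lowers the cost; the step grows with the exponent
```

The step length in log scale is exactly proportional to the exponent, which is the sense in which a single parameter can make the NMF part of ILRMA "fast" or "slow".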

Parametric-ME-based ILRMA
• ILRMA run for 2000 trials with various random seeds
[Figure: results for ultimate_nz_tour and another_dreamer-the_ones_we_love, ordered from fast to slow NMF optimization]

• Slower NMF optimization (a small value of the parametric-ME parameter) tends to provide better results in ILRMA
– But, why? We don’t know!
• Conjecture
– At the beginning of ILRMA, the NMF model is “random”
• not reliable
– The demixing matrix can be updated without the source model to some extent (because even IVA works well)
• Statistical independence between sources is very powerful
[Figure: initialization → independence-based separation → precise modeling of source structure → improved separation; the demixing matrix is updated at every stage, whereas the NMF model is only slowly updated at first and then fully updated]


Conclusion
• Independent low-rank matrix analysis (ILRMA)
– Permutation-free ICA-based blind source separation
– Assumptions
• Statistical independence between sources
• Low-rank time-frequency structure of each source
– Equivalent to multichannel NMF
• when the rank-1 spatial model (instantaneous mixing assumption) is valid
• Ongoing works!
– Relaxation of the rank-1 spatial model
– Extension of the source generative model
– Semi/fully supervised ILRMA, user-guided ILRMA
– and collaboration with deep neural networks…
• Independent deeply learned matrix analysis (IDLMA)
• Maybe submitted to the next EUSIPCO…?

Conclusion
– A book chapter will be published by Springer in March 2018!
• Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, and Hiroshi Saruwatari, “Determined blind source separation with independent low-rank matrix analysis,” in Audio Source Separation (Signals and Communication Technology), 1st ed., Shoji Makino (Editor), 2018
• Search in Amazon.com!

Conclusion
– will be presented at ICASSP 2018 as a tutorial session!
• Title (tentative): Blind Audio Source Separation on Tensor Representation
– Presenters: Hiroshi Sawada, Nobutaka Ono, Hirokazu Kameoka, Daichi Kitamura
Thank you so much
for your attention!
