Sparse and Low Rank Representations in Music Signal Analysis

Constantine Kotropoulos, Yannis Panagakis

Artificial Intelligence & Information Analysis Laboratory
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki 54124, GREECE

2nd Greek Signal Processing Jam, Thessaloniki, May 17th, 2012
Outline

 1     Introduction

 2     Auditory spectro-temporal modulations

 3     Suitable data representations for classification

 4     Joint sparse low-rank representations in the ideal case

 5     Joint sparse low-rank representations in the presence of noise

 6     Joint sparse low-rank subspace-based classification

 7     Music signal analysis

 8     Conclusions
Introduction
 Music genre classification
         Genre: the most popular description of music content, despite the
         lack of a commonly agreed definition. The task is to classify music
         recordings into distinguishable genres using information extracted
         from the audio signal.

 Musical structure analysis
         The task is to derive the musical form, i.e., the structural
         description of a music piece at the time scale of segments, such as
         intro, verse, chorus, and bridge, from the audio signal.

 Music tagging
         Tags: text-based labels encoding semantic information related to
         music (e.g., instrumentation, genres, emotions). Manual tagging is
         expensive, time-consuming, and applicable mainly to popular music;
         automatic tagging is fast and applies to new and unpopular music.
Introduction

 Motivation
         The appealing properties of slow temporal and spectro-temporal
         modulations from the human perceptual point of view [a];
         The strong theoretical foundations of sparse representations [b, c]
         and low-rank representations [d].
     [a] K. Wang and S. A. Shamma, “Spectral shape analysis in the central auditory system,” IEEE Trans. Speech and
 Audio Processing, vol. 3, no. 5, pp. 382-396, 1995.
     [b] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly
 incomplete frequency information,” IEEE Trans. Information Theory, vol. 52, no. 2, pp. 489-509, February 2006.
     [c] D. L. Donoho, “Compressed sensing,” IEEE Trans. Information Theory, vol. 52, no. 4, pp. 1289-1306, April 2006.
     [d] G. Liu, Z. Lin, S. Yan, J. Sun, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,”
 IEEE Trans. Pattern Analysis and Machine Intelligence, 2011, arXiv:1010.2955v4 (preprint).
Notations

 Span
 Let span(X) denote the linear space spanned by the columns of X.
 Then, Y ∈ span(X) denotes that all column vectors of Y belong to
 span(X).
Notations

 Vector norms
 $\|\mathbf{x}\|_0$ is the $\ell_0$ quasi-norm counting the number of nonzero entries in $\mathbf{x}$.

 If $|\cdot|$ denotes the absolute value, $\|\mathbf{x}\|_1 = \sum_i |x_i|$ and
 $\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}$ are the $\ell_1$ and the $\ell_2$ norm of $\mathbf{x}$, respectively.
Notations

 Matrix norms
 mixed $\ell_{p,q}$ matrix norm: $\|\mathbf{X}\|_{p,q} = \Big( \sum_j \big( \sum_i |x_{ij}|^p \big)^{q/p} \Big)^{1/q}$.

 For p = q = 0, the matrix $\ell_0$ quasi-norm, $\|\mathbf{X}\|_0$, returns the number of
 nonzero entries in X. For p = q = 1, the matrix $\ell_1$ norm is obtained:
 $\|\mathbf{X}\|_1 = \sum_i \sum_j |x_{ij}|$.

 Frobenius norm: $\|\mathbf{X}\|_F = \sqrt{\sum_i \sum_j x_{ij}^2}$.

 $\ell_2/\ell_1$ norm of X: $\|\mathbf{X}\|_{2,1} = \sum_j \sqrt{\sum_i x_{ij}^2}$.

 nuclear norm of X, $\|\mathbf{X}\|_*$: the sum of the singular values of X.
 (A numerical check of these norms follows below.)
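For concreteness, a minimal numpy check of the definitions above (our own sketch; the helper name mixed_norm is ours):

```python
# Minimal numpy check of the matrix norms defined above (our own sketch).
import numpy as np

def mixed_norm(X, p, q):
    """||X||_{p,q} = ( sum_j ( sum_i |x_ij|^p )^{q/p} )^{1/q}, inner sum over rows i."""
    return np.sum(np.sum(np.abs(X) ** p, axis=0) ** (q / p)) ** (1.0 / q)

X = np.array([[1.0, 0.0],
              [-2.0, 3.0]])
print(np.count_nonzero(X))                       # ||X||_0: number of nonzero entries
print(np.abs(X).sum())                           # ||X||_1 = mixed_norm(X, 1, 1)
print(np.linalg.norm(X, "fro"))                  # ||X||_F
print(np.sqrt((X ** 2).sum(axis=0)).sum())       # ||X||_{2,1}: sum of column l2 norms
print(np.linalg.svd(X, compute_uv=False).sum())  # ||X||_*: sum of singular values
```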
Notations

 Support
 A vector x is said to be q-sparse if the size of the support of x (i.e., the
 set of indices associated with nonzero vector elements) is no larger than
 q.

 The support of a collection of vectors X = [x1 , x2 , . . . , xN ] is defined as
 the union of all the individual supports.

 A matrix X is called q-joint sparse if |supp(X)| ≤ q. That is, there are
 at most q rows in X that contain nonzero elements, because
 $\|\mathbf{X}\|_{0,q} = |\mathrm{supp}(\mathbf{X})|$ for any q [a], as sketched below.
     [a] M. Davies and Y. Eldar, “Rank awareness in joint sparse recovery,” arXiv:1004.4529v1, 2010.
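A short sketch of these support definitions (the helper names are ours):

```python
# Support of a matrix = union of the supports of its columns, i.e. the set of
# rows containing at least one nonzero element (a sketch of the definitions above).
import numpy as np

def support(X):
    return set(np.flatnonzero(np.any(X != 0, axis=1)))

def is_q_joint_sparse(X, q):
    # |supp(X)| = ||X||_{0,q} counts the nonzero rows of X
    return len(support(X)) <= q

X = np.array([[1, 0, 2],
              [0, 0, 0],
              [0, 3, 0]])
print(support(X))               # {0, 2}
print(is_q_joint_sparse(X, 2))  # True
```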
Auditory spectro-temporal modulations

 Computational auditory model
         It is inspired by psychoacoustical and neurophysiological
         investigations of the early and central stages of the human
         auditory system.

 [Figure: block diagram. The early auditory model maps the audio signal to the
 auditory spectrogram; the central auditory model then yields the auditory
 temporal modulations and the auditory spectro-temporal modulations
 (cortical representation).]
Auditory spectro-temporal modulations

 Early auditory system
         Auditory Spectrogram: time-frequency distribution of energy along
         a tonotopic (logarithmic frequency) axis.

 [Figure: the early auditory model mapping the audio signal to the auditory
 spectrogram.]
Auditory spectro-temporal modulations

 Central auditory system - Temporal modulations

 [Figure: each row of the auditory spectrogram is analyzed along time,
 yielding the auditory temporal modulations; rate axis ω (Hz).]
Auditory spectro-temporal modulations

 Auditory temporal modulations across 10 music genres

 [Figure: auditory temporal modulation patterns for Blues, Classical, Country,
 Disco, Hiphop, Jazz, Metal, Pop, Reggae, and Rock.]
Auditory spectro-temporal modulations

 Central auditory system - Spectro-temporal modulations

 [Figure: 2D analysis of the auditory spectrogram yields the auditory
 spectro-temporal modulations; scale axis Ω (cycles/octave), rate axis ω (Hz).]
Auditory spectro-temporal modulations

 Efficient implementation through the constant-Q transform (CQT)

 [Figure omitted.]
Auditory spectro-temporal modulations

 Parameters and implementation (1)
         The audio signal is analyzed by employing 128 constant-Q filters
         covering 8 octaves from 44.9 Hz to 11 kHz (i.e., 16 filters per
         octave). The magnitude of the CQT is compressed by raising
         each element of the CQT matrix to the power of 0.1 [a].
         The 2D multiresolution wavelet analysis is implemented via a bank
         of 2D Gaussian filters with scales ∈ {0.25, 0.5, 1, 2, 4, 8}
         (cycles/octave) and rates ∈ {±2, ±4, ±8, ±16, ±32} (Hz).
         For each music recording, the extracted 4D cortical representation
         is time-averaged, and the resulting rate-scale-frequency 3D
         cortical representation is thus obtained. (A sketch of this pipeline
         follows below.)
     [a] C. Schoerkhuber and A. Klapuri, “Constant-Q transform toolbox for music processing,” in 7th Sound and Music
 Computing Conf., Barcelona, Spain, 2010.
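A rough end-to-end sketch of this feature extraction, assuming librosa for the CQT and approximating the 2D multiresolution wavelet bank with Gaussian passbands in the 2D Fourier domain; the exact filter design, bandwidths, and quadrant handling of the authors' implementation may differ:

```python
# Hedged sketch of the cortical feature pipeline above. librosa is assumed for
# the CQT; the Gaussian passband shapes and 0.3-relative bandwidths are our
# approximation of the 2D wavelet analysis, not the authors' exact filters.
import numpy as np
import librosa

RATES = [2, 4, 8, 16, 32]            # Hz; used with both signs -> 10 rate channels
SCALES = [0.25, 0.5, 1, 2, 4, 8]     # cycles/octave -> 6 scale channels

def cortical_features(path, bins_per_octave=16, hop=512):
    """Return the rate-scale-frequency vector x (10 * 6 * 128 = 7680 dims)."""
    y, sr = librosa.load(path, sr=44100)
    frame_rate = sr / hop            # ~86 frames/s; modulation Nyquist ~43 Hz
    C = np.abs(librosa.cqt(y, sr=sr, hop_length=hop, fmin=44.9,
                           n_bins=128, bins_per_octave=bins_per_octave)) ** 0.1
    F, T = C.shape
    S = np.fft.fft2(C)                              # 2D spectrum of the spectrogram
    w = np.fft.fftfreq(T, d=1.0 / frame_rate)       # temporal modulation axis (Hz)
    W = np.fft.fftfreq(F, d=1.0 / bins_per_octave)  # spectral modulation (cyc/oct)
    feats = np.empty((2 * len(RATES), len(SCALES), F))
    for i, r in enumerate(RATES + [-r for r in RATES]):
        for j, s in enumerate(SCALES):
            # Crude Gaussian passband centred at rate r and scale s; a faithful
            # cortical model would use quadrant-separated complex filters.
            H = (np.exp(-((w[None, :] - r) ** 2) / (2 * (0.3 * abs(r)) ** 2))
                 * np.exp(-((np.abs(W)[:, None] - s) ** 2) / (2 * (0.3 * s) ** 2)))
            resp = np.abs(np.fft.ifft2(S * H))      # magnitude of filter response
            feats[i, j] = resp.mean(axis=1)         # time-average over frames
    return feats.reshape(-1)                        # x in R_+^7680
```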
Auditory spectro-temporal modulations

 Parameters and implementation (2)
         To sum up, each music recording is represented by a vector
         $\mathbf{x} \in \mathbb{R}_+^{7680}$ obtained by stacking the elements of the 3D cortical
         representation into a vector.
         An ensemble of music recordings is represented by the data
         matrix $\mathbf{X} \in \mathbb{R}_+^{7680 \times S}$, where S is the number of available
         recordings.
         Each row of X is normalized to the range [0, 1] by subtracting
         the row minimum from each entry and then dividing by the
         difference between the row maximum and the row minimum, as
         sketched below.
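A minimal numpy sketch of this row-wise normalization (the eps guard for constant rows is our own addition):

```python
# Row-wise min-max normalisation of the data matrix X (7680 x S); the eps
# term guarding against constant rows is our own addition.
import numpy as np

def normalize_rows(X, eps=1e-12):
    lo = X.min(axis=1, keepdims=True)   # row minima
    hi = X.max(axis=1, keepdims=True)   # row maxima
    return (X - lo) / (hi - lo + eps)   # each row mapped into [0, 1]
```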
Learning Problem

 Statement
         Let $\mathbf{X} \in \mathbb{R}^{d \times S}$ be the data matrix that contains S vector samples of
         size d in its columns. That is, $\mathbf{x}_s \in \mathbb{R}^d$, s = 1, 2, . . . , S.
         Without loss of generality, the data matrix can be partitioned as
         X = [A | Y], where
                 $\mathbf{A} = [\mathbf{A}_1 | \mathbf{A}_2 | \ldots | \mathbf{A}_K] \in \mathbb{R}^{d \times N}$ represents a set of N training samples
                 that belong to K classes;
                 $\mathbf{Y} = [\mathbf{Y}_1 | \mathbf{Y}_2 | \ldots | \mathbf{Y}_K] \in \mathbb{R}^{d \times M}$ contains M = S − N test vector
                 samples in its columns.
         If certain assumptions hold, learn a block-diagonal matrix
         $\mathbf{Z} = \mathrm{diag}[\mathbf{Z}_1, \mathbf{Z}_2, \ldots, \mathbf{Z}_K] \in \mathbb{R}^{N \times M}$ such that Y = AZ.
Learning Problem

 Assumptions
         If
              1   the data are exactly drawn from independent linear subspaces, i.e.,
                  span(Ak ) linearly spans the kth class data space, k = 1, 2, . . . , K ,
              2   Y ∈ span(A),
              3   the data contain neither outliers nor noise,
         then each test vector sample that belongs to the kth class can be
         represented as a linear combination of the training samples in Ak .
Solutions

 Sparsest Representation (SR)
 $\mathbf{Z} \in \mathbb{R}^{N \times M}$ is the sparsest representation of the test data $\mathbf{Y} \in \mathbb{R}^{d \times M}$
 with respect to the training data $\mathbf{A} \in \mathbb{R}^{d \times N}$, obtained by solving the
 optimization problem [a]:

     SR:  $\operatorname*{argmin}_{\mathbf{z}_i} \|\mathbf{z}_i\|_0$  subject to  $\mathbf{y}_i = \mathbf{A}\,\mathbf{z}_i$,        (1)

     [a] E. Elhamifar and R. Vidal, “Sparse subspace clustering,” in IEEE Int. Conf. Computer Vision and Pattern
 Recognition, Miami, FL, USA, 2009, pp. 2790-2797.
Solutions

 Lowest-rank representation (LRR)
 Alternatively, $\mathbf{Z} \in \mathbb{R}^{N \times M}$ is the lowest-rank representation of the test data
 $\mathbf{Y} \in \mathbb{R}^{d \times M}$ with respect to the training data $\mathbf{A} \in \mathbb{R}^{d \times N}$, obtained by
 solving the optimization problem [a]:

     LRR:  $\operatorname*{argmin}_{\mathbf{Z}} \operatorname{rank}(\mathbf{Z})$  subject to  $\mathbf{Y} = \mathbf{A}\,\mathbf{Z}$.        (2)

     [a] G. Liu, Z. Lin, S. Yan, J. Sun, and Y. Ma (2011)
Solutions

 Convex relaxations
 The convex envelope of the $\ell_0$ norm is the $\ell_1$ norm [a], while the convex
 envelope of the rank function is the nuclear norm [b].
 Convex relaxations can be obtained by replacing the $\ell_0$ norm and the
 rank function by their convex envelopes:

     SR:  $\operatorname*{argmin}_{\mathbf{z}_i} \|\mathbf{z}_i\|_1$  subject to  $\mathbf{y}_i = \mathbf{A}\,\mathbf{z}_i$,        (3)

     LRR:  $\operatorname*{argmin}_{\mathbf{Z}} \|\mathbf{Z}\|_*$  subject to  $\mathbf{Y} = \mathbf{A}\,\mathbf{Z}$.        (4)

 (A solver sketch follows below.)
     [a] D. Donoho, “For most large underdetermined systems of equations, the minimal l1-norm near-solution
 approximates the sparsest near-solution,” Communications on Pure and Applied Mathematics, vol. 59, no. 7, pp.
 907-934, 2006.
     [b] M. Fazel, Matrix Rank Minimization with Applications, Ph.D. thesis, Dept. Electrical Engineering, Stanford
 University, CA, USA, 2002.
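As an illustration, the relaxations (3) and (4) can be handed to a generic convex solver; a minimal sketch with CVXPY (our choice for small problems, not the authors' solver):

```python
# Hedged sketch: solving the convex relaxations (3) and (4) with CVXPY.
# CVXPY is used for illustration only; it does not scale like dedicated solvers.
import cvxpy as cp

def sr(A, y):
    """l1-minimisation (3) for a single test sample y."""
    z = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.norm1(z)), [A @ z == y]).solve()
    return z.value

def lrr(A, Y):
    """Nuclear-norm minimisation (4) for the whole test matrix Y."""
    Z = cp.Variable((A.shape[1], Y.shape[1]))
    cp.Problem(cp.Minimize(cp.normNuc(Z)), [A @ Z == Y]).solve()
    return Z.value
```

With noisy data the equality constraints are usually infeasible; the robust formulation of Section 5 handles that case.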
Solutions

 SR pros and cons
         The SR matrix $\mathbf{Z} \in \mathbb{R}^{N \times M}$ is sparse block-diagonal and has good
         discriminative properties, as has been demonstrated for the
         SR-based classifiers [a].
         However, the SR
             1   cannot model generic subspace structures. Indeed, the SR accurately
                 models subregions of subspaces, the so-called bouquets, rather
                 than generic subspaces [b].
             2   does not capture the global structure of the data, since it is
                 computed for each data sample individually. Indeed, although
                 sparsity offers an efficient representation, it damages the high
                 within-class homogeneity, which is desirable for classification,
                 especially in the presence of noise.
     [a] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE
 Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.
     [b] J. Wright and Y. Ma, “Dense error correction via l1-minimization,” IEEE Trans. Information Theory, vol. 56, no.
 7, pp. 3540-3560, 2010.
Solutions

 LRR pros and cons
         The LRR matrix $\mathbf{Z} \in \mathbb{R}^{N \times M}$
             1   models data stemming from generic subspace structures;
             2   preserves accurately the global data structure;
             3   for clean data, also exhibits dense within-class
                 homogeneity and zero between-class affinities, making it an
                 appealing representation for classification purposes, e.g., in music
                 mood classification [a];
             4   for data contaminated with noise and outliers, the low-rank
                 constraint seems to enforce noise correction [b].
         But the LRR loses sparsity within the classes.
     [a] Y. Panagakis and C. Kotropoulos, “Automatic music mood classification via low-rank representation,” in Proc.
 19th European Signal Processing Conf., Barcelona, Spain, 2011, pp. 689-693.
     [b] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?,” Journal of ACM, vol. 58, no.
 3, pp. 1-37, 2011.
Joint sparse low-rank representations (JSLRR)

 Motivation
         Intuitively, a representation matrix that is able to reveal the most
         characteristic subregions of the subspaces must be
         simultaneously row-sparse and low-rank.
         The row sparsity ensures that only a small fraction of the training
         samples is involved in the representation.
         The low-rank constraint ensures that the representation vectors
         (i.e., the columns of the representation matrix) are correlated, in
         the sense that the data lying on a single subspace are
         represented as a linear combination of the same few training
         samples.
JSLRR

 Problem statement and solution
         The JSLRR of $\mathbf{Y} \in \mathbb{R}^{d \times M}$ with respect to $\mathbf{A} \in \mathbb{R}^{d \times N}$ is the matrix
         $\mathbf{Z} \in \mathbb{R}^{N \times M}$ with rank $r \ll \min(q, M)$, where $q \ll N$ is the size of the
         support of Z.
         It can be found by minimizing the rank function regularized by the
         $\ell_{0,q}$ quasi-norm.
         The $\ell_{0,q}$ regularization term ensures that the low-rank matrix is
         also row-sparse, since $\|\mathbf{Z}\|_{0,q} = |\mathrm{supp}(\mathbf{Z})|$ for any q.
         A convex relaxation of the just-mentioned problem is solved:

     JSLRR:  $\operatorname*{argmin}_{\mathbf{Z}} \|\mathbf{Z}\|_* + \theta_1 \|\mathbf{Z}\|_1$  subject to  $\mathbf{Y} = \mathbf{A}\,\mathbf{Z}$,        (5)

         where the term $\|\mathbf{Z}\|_1$ promotes sparsity in the LRR matrix and
         $\theta_1 > 0$ balances the two norms in (5). (A solver sketch follows below.)
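Under the same illustration-only caveat as before, (5) can be prototyped with CVXPY (an assumed solver; the authors use their own optimizer):

```python
# Hedged CVXPY sketch of the convex JSLRR problem (5); illustration only.
import cvxpy as cp

def jslrr(A, Y, theta1):
    Z = cp.Variable((A.shape[1], Y.shape[1]))
    objective = cp.normNuc(Z) + theta1 * cp.norm1(Z)   # ||Z||_* + theta1 ||Z||_1
    cp.Problem(cp.Minimize(objective), [A @ Z == Y]).solve()
    return Z.value
```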
JSLRR

 Any theoretical guarantee?
 The JSLRR has a block-diagonal structure, a property that makes it
 appealing for classification. This fact is proved in Theorem 1, which is
 a consequence of Lemma 1.
JSLRR

 Lemma 1
 Let $\|\cdot\|_\theta = \|\cdot\|_* + \theta \|\cdot\|_1$, with $\theta > 0$. For any four matrices B, C, D, and F
 of compatible dimensions,

     $\left\| \begin{bmatrix} \mathbf{B} & \mathbf{C} \\ \mathbf{D} & \mathbf{F} \end{bmatrix} \right\|_\theta \ge \left\| \begin{bmatrix} \mathbf{B} & \mathbf{0} \\ \mathbf{0} & \mathbf{F} \end{bmatrix} \right\|_\theta = \|\mathbf{B}\|_\theta + \|\mathbf{F}\|_\theta.$        (6)
JSLRR

 Theorem 1
 Assume that the data are exactly drawn from independent linear
 subspaces. That is, span(Ak ) linearly spans the training vectors of the
 kth class, k = 1, 2, . . . , K , and Y ∈ span(A). Then, the minimizer of (5)
 is block-diagonal.
Example 1

 Ideal case
         4 pairwise-independent linear subspaces are constructed, whose
         bases $\{\mathbf{U}_i\}_{i=1}^4$ are computed by $\mathbf{U}_{i+1} = \mathbf{R}\,\mathbf{U}_i$, i = 1, 2, 3.
         $\mathbf{U}_1 \in \mathbb{R}^{600 \times 110}$ is a column-orthonormal random matrix and
         $\mathbf{R} \in \mathbb{R}^{600 \times 600}$ is a random rotation matrix.
         The data matrix $\mathbf{X} = [\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3, \mathbf{X}_4] \in \mathbb{R}^{600 \times 400}$ is obtained by
         picking 100 samples from each subspace. That is, $\mathbf{X}_i \in \mathbb{R}^{600 \times 100}$,
         i = 1, 2, 3, 4.
         Next, the data matrix is partitioned into the training matrix
         $\mathbf{A} \in \mathbb{R}^{600 \times 360}$ and the test matrix $\mathbf{Y} \in \mathbb{R}^{600 \times 40}$ by employing
         10-fold cross-validation. (A data-generation sketch follows below.)
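A plausible numpy reconstruction of this synthetic setup; the QR-based constructions of U1 and R and the random mixing coefficients are our assumptions:

```python
# Hedged reconstruction of Example 1's synthetic data; dimensions follow the
# slide, the QR constructions and coefficient sampling are our assumptions.
import numpy as np

rng = np.random.default_rng(0)
U1, _ = np.linalg.qr(rng.standard_normal((600, 110)))   # column-orthonormal basis
Q, _ = np.linalg.qr(rng.standard_normal((600, 600)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                                       # force det(R) = +1
R = Q                                                   # random rotation matrix

bases = [U1]
for _ in range(3):
    bases.append(R @ bases[-1])                         # U_{i+1} = R U_i

# 100 samples per subspace: X_i = U_i C_i with random coefficients C_i
X = np.hstack([U @ rng.standard_normal((110, 100)) for U in bases])  # 600 x 400
```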
Example 1

 JSLRR, LRR, SR matrices $\mathbf{Z} \in \mathbb{R}^{360 \times 40}$

 [Figure: the JSLRR, LRR, and SR representation matrices obtained in Example 1.]
JSLRR

 Revisiting
         The data are approximately drawn from a union of subspaces. The
         deviations from the ideal assumptions can be treated collectively
         as additive noise contaminating the ideal model, i.e., Y = AZ + E.
         The noise term E models both small (but densely supported)
         deviations and gross (but sparse) corruptions of the observations
         (i.e., outliers or missing data).
         In the presence of noise, both the rank and the density of the
         representation matrix Z increase, since the columns of Z contain
         nonzero elements associated with more than one class.
         If one requires the rank of Z to be reduced or the sparsity of Z
         to be increased, the noise in the test set can be smoothed while Z
         simultaneously admits a close-to-block-diagonal structure.
Robust JSLRR
 Optimization Problem
         A solution is sought for the convex optimization problem:

     Robust JSLRR:  $\operatorname*{argmin}_{\mathbf{Z}, \mathbf{E}} \|\mathbf{Z}\|_* + \theta_1 \|\mathbf{Z}\|_1 + \theta_2 \|\mathbf{E}\|_{2,1}$  subject to  $\mathbf{Y} = \mathbf{A}\,\mathbf{Z} + \mathbf{E}$,        (7)

         where $\theta_2 > 0$ is a regularization parameter and $\|\cdot\|_{2,1}$ denotes the
         $\ell_2/\ell_1$ norm. (A solver sketch follows below.)
         Problem (7) can be solved iteratively by employing the Linearized
         Alternating Direction Augmented Lagrange Multiplier (LADALM)
         method [a], a variant of the Alternating Direction Augmented
         Lagrange Multiplier method [b].
     [a] J. Yang and X. M. Yuan, “Linearized augmented Lagrangian and alternating direction methods for nuclear norm
 minimization,” Math. Comput., (to appear) 2011.
     [b] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Athena Scientific, Belmont, MA,
 2/e, 1996.
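For small problems, (7) too can be prototyped with a generic convex solver before resorting to LADALM; in this hedged CVXPY sketch the sum of column l2 norms implements the ℓ2/ℓ1 norm defined earlier:

```python
# Hedged CVXPY sketch of the robust JSLRR problem (7); the LADALM solver on
# the next slides is what is actually used for realistic problem sizes.
import cvxpy as cp

def robust_jslrr(A, Y, theta1, theta2):
    Z = cp.Variable((A.shape[1], Y.shape[1]))
    E = cp.Variable(Y.shape)
    l21 = cp.sum(cp.norm(E, 2, axis=0))        # ||E||_{2,1}: sum of column l2 norms
    obj = cp.normNuc(Z) + theta1 * cp.norm1(Z) + theta2 * l21
    cp.Problem(cp.Minimize(obj), [Y == A @ Z + E]).solve()
    return Z.value, E.value
```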
Robust JSLRR
 LADALM
        That is, one solves

                argmin_{J,Z,W,E}  ‖J‖_* + θ1 ‖W‖_1 + θ2 ‖E‖_2,1
                subject to        Y = A Z + E,  Z = J,  J = W,                   (8)

        by minimizing the augmented Lagrangian function:

            L(J, Z, W, E, Λ1, Λ2, Λ3) = ‖J‖_* + θ1 ‖W‖_1 + θ2 ‖E‖_2,1
              + tr[Λ1^T (Y − A Z − E)] + tr[Λ2^T (Z − J)] + tr[Λ3^T (J − W)]
              + (µ/2) (‖Y − A Z − E‖_F^2 + ‖Z − J‖_F^2 + ‖J − W‖_F^2),           (9)

        where Λ1, Λ2, and Λ3 are the Lagrange multipliers and µ > 0 is a
        penalty parameter.
Sparse and Low Rank Representations in Music Signal Analysis                                               31/54
Robust JSLRR

 Optimization with respect to J[t]

        J[t+1]  =  argmin_J L(J, Z[t], W[t], E[t], Λ1[t], Λ2[t], Λ3[t])
                ≈  argmin_J (1/µ) ‖J‖_*
                   + (1/2) ‖J − (Z[t] − J[t] − Λ3[t]/µ + W[t] + Λ2[t]/µ)‖_F^2

        J[t+1] ← D_{1/µ}[ Z[t] − J[t] − Λ3[t]/µ + W[t] + Λ2[t]/µ ].              (10)

 The solution is obtained via the singular value thresholding operator,
 defined for any matrix Q as D_τ[Q] = U S_τ[Σ] V^T, where Q = U Σ V^T is the
 singular value decomposition and S_τ[q] = sgn(q) max(|q| − τ, 0) is the
 element-wise shrinkage operator.
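
 To make these two operators concrete, here is a minimal NumPy sketch of the
 shrinkage and singular value thresholding operators; the function names
 shrink and svt are ours, and the code illustrates the definitions above
 rather than reproducing the authors' implementation.

```python
import numpy as np

def shrink(Q, tau):
    """Element-wise shrinkage operator S_tau[q] = sgn(q) * max(|q| - tau, 0)."""
    return np.sign(Q) * np.maximum(np.abs(Q) - tau, 0.0)

def svt(Q, tau):
    """Singular value thresholding operator D_tau[Q] = U S_tau[Sigma] V^T."""
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    # Scaling the columns of U by the shrunk singular values avoids
    # forming the diagonal matrix explicitly.
    return (U * shrink(s, tau)) @ Vt
```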

Sparse and Low Rank Representations in Music Signal Analysis                                       32/54
Robust JSLRR



 Optimization with respect to Z[t]

            Z[t+1] = argmin_Z L(J[t+1], Z, W[t], E[t], Λ1[t], Λ2[t], Λ3[t])

            Z[t+1] = (I + A^T A)^{-1} [ A^T (Y − E[t]) + J[t+1]
                       + (A^T Λ1[t] − Λ2[t])/µ ],                                (11)

 i.e., an unconstrained least-squares problem with a closed-form solution.




Sparse and Low Rank Representations in Music Signal Analysis                               32/54
Robust JSLRR



 Optimization with respect to W[t]

         W[t+1]  =  argmin_W L(J[t+1], Z[t+1], W, E[t], Λ1[t], Λ2[t], Λ3[t])
                 =  argmin_W (θ1/µ) ‖W‖_1 + (1/2) ‖W − (J[t+1] + Λ3[t]/µ)‖_F^2

         W[t+1] ← S_{θ1/µ}[ J[t+1] + Λ3[t]/µ ].                                  (12)




Sparse and Low Rank Representations in Music Signal Analysis                                             32/54
Robust JSLRR

 Optimization with respect to E[t]

         E[t+1] = argmin_E L(J[t+1], Z[t+1], W[t+1], E, Λ1[t], Λ2[t], Λ3[t])
                = argmin_E (θ2/µ) ‖E‖_2,1
                  + (1/2) ‖E − (Y − A Z[t+1] + Λ1[t]/µ)‖_F^2.                    (13)

 Let M[t] = Y − A Z[t+1] + Λ1[t]/µ. Update E[t+1] column-wise as follows:

                 e_j[t+1] ← S_{θ2/µ}[ ‖m_j[t]‖_2 ] · m_j[t] / ‖m_j[t]‖_2.        (14)


Sparse and Low Rank Representations in Music Signal Analysis                               32/54
Robust JSLRR




 Updating of Lagrange multiplier matrices

                        Λ1[t+1] = Λ1[t] + µ (Y − A Z[t+1] − E[t+1]),
                        Λ2[t+1] = Λ2[t] + µ (Z[t+1] − J[t+1]),
                        Λ3[t+1] = Λ3[t] + µ (J[t+1] − W[t+1]).                   (15)
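
 Collecting updates (10)-(15), a minimal NumPy sketch of the full LADALM loop
 follows, reusing the shrink and svt helpers sketched earlier. The
 initialization, the omitted stopping test, and the continuation rule
 µ ← min(ρµ, µ_max) are our assumptions; the deck reports values of ρ but
 does not spell out the schedule.

```python
import numpy as np

def prox_l21(M, tau):
    """Column-wise shrinkage implementing (14): the prox of the l2/l1 norm."""
    norms = np.linalg.norm(M, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return M * scale

def jslrr(Y, A, theta1, theta2, mu=1e-2, rho=1.1, mu_max=1e6, n_iter=200):
    """Robust JSLRR via LADALM, eqs. (10)-(15); a sketch, not the authors' code."""
    d, N = A.shape
    M = Y.shape[1]
    Z = np.zeros((N, M)); J = np.zeros((N, M)); W = np.zeros((N, M))
    E = np.zeros((d, M))
    L1 = np.zeros((d, M)); L2 = np.zeros((N, M)); L3 = np.zeros((N, M))
    AtA = A.T @ A
    for _ in range(n_iter):
        J = svt(Z - J - L3 / mu + W + L2 / mu, 1.0 / mu)               # (10)
        Z = np.linalg.solve(np.eye(N) + AtA,
                            A.T @ (Y - E) + J + (A.T @ L1 - L2) / mu)  # (11)
        W = shrink(J + L3 / mu, theta1 / mu)                           # (12)
        E = prox_l21(Y - A @ Z + L1 / mu, theta2 / mu)                 # (13)-(14)
        L1 = L1 + mu * (Y - A @ Z - E)                                 # (15)
        L2 = L2 + mu * (Z - J)
        L3 = L3 + mu * (J - W)
        mu = min(rho * mu, mu_max)                                     # assumed continuation
    return Z, E
```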




Sparse and Low Rank Representations in Music Signal Analysis              32/54
Special cases
 Robust joint SR (JSR)
        A solution is sought for the convex optimization problem:

          Robust JSR:  argmin_{Z,E} ‖Z‖_1 + θ2 ‖E‖_2,1
                       subject to  Y = A Z + E.                                  (16)

        (16) takes into account the correlations between the test samples,
        while seeking to jointly represent the test samples of a specific
        class by a few columns of the training matrix.

            L1(Z, J, E, Λ1, Λ2) = ‖J‖_1 + θ2 ‖E‖_2,1 + tr[Λ1^T (Y − A Z − E)]
              + tr[Λ2^T (Z − J)] + (µ/2) (‖Y − A Z − E‖_F^2 + ‖Z − J‖_F^2),      (17)

        where Λ1 and Λ2 are Lagrange multipliers and µ > 0 is a penalty
        parameter.
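
        A sketch of the corresponding two-block alternating-direction loop
        for (16)-(17), analogous to the JSLRR solver above but without the
        nuclear-norm term; the µ schedule and the reuse of the shrink and
        prox_l21 helpers from the earlier sketches are assumptions.

```python
import numpy as np

def jsr(Y, A, theta2, mu=1e-2, rho=1.1, mu_max=1e6, n_iter=200):
    """Robust JSR by minimizing (17); illustrative sketch only."""
    d, N = A.shape
    M = Y.shape[1]
    Z = np.zeros((N, M)); J = np.zeros((N, M)); E = np.zeros((d, M))
    L1 = np.zeros((d, M)); L2 = np.zeros((N, M))
    AtA = A.T @ A
    for _ in range(n_iter):
        J = shrink(Z + L2 / mu, 1.0 / mu)              # l1 prox for the auxiliary J
        Z = np.linalg.solve(np.eye(N) + AtA,
                            A.T @ (Y - E) + J + (A.T @ L1 - L2) / mu)
        E = prox_l21(Y - A @ Z + L1 / mu, theta2 / mu)
        L1 = L1 + mu * (Y - A @ Z - E)
        L2 = L2 + mu * (Z - J)
        mu = min(rho * mu, mu_max)
    return Z, E
```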
Sparse and Low Rank Representations in Music Signal Analysis                                            33/54
Special cases



 Robust LRR
        A solution is sought for the convex optimization problem:

         Robust LRR:  argmin_{Z,E} ‖Z‖_* + θ2 ‖E‖_2,1
                      subject to  Y = A Z + E,                                   (18)

        obtained by minimizing an augmented Lagrangian function similar to
        (17), with the first term ‖J‖_1 replaced by ‖J‖_*.




Sparse and Low Rank Representations in Music Signal Analysis                                        34/54
Example 2


 Noisy case
        4 linear pairwise independent subspaces are constructed as in
        Example 1, and the matrices A ∈ R^{600×360} and Y ∈ R^{600×40} are
        obtained.
        We randomly pick 50 columns of A and replace each of them with a
        linear combination, with random weights, of randomly chosen
        vectors from two subspaces. Thus, the training set is now
        contaminated by outliers.
        The 5th column of the test matrix Y is replaced by a linear
        combination of vectors not drawn from any of the 4 subspaces, and
        the 15th column of Y is replaced by a vector drawn from both the
        1st and the 4th subspaces, as previously described.
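
        A sketch of how such synthetic data can be generated; the subspace
        dimension (5) and the 90/10 train/test split per subspace are
        assumptions chosen only to match the stated sizes of A and Y.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 600, 4
sub_dim, n_train, n_test = 5, 90, 10      # assumed, to match A: 600x360, Y: 600x40

# Each subspace is spanned by a random orthonormal basis.
bases = [np.linalg.qr(rng.standard_normal((d, sub_dim)))[0] for _ in range(K)]
A = np.hstack([B @ rng.standard_normal((sub_dim, n_train)) for B in bases])  # 600 x 360
Y = np.hstack([B @ rng.standard_normal((sub_dim, n_test)) for B in bases])   # 600 x 40

# Outliers in the training set: 50 random columns of A become mixtures of
# vectors drawn from two different subspaces with random weights.
for j in rng.choice(A.shape[1], size=50, replace=False):
    k1, k2 = rng.choice(K, size=2, replace=False)
    A[:, j] = (rng.standard_normal() * (bases[k1] @ rng.standard_normal(sub_dim))
               + rng.standard_normal() * (bases[k2] @ rng.standard_normal(sub_dim)))

# Corrupted test samples: the 5th column lies outside all subspaces and the
# 15th column is drawn from both the 1st and the 4th subspace.
Y[:, 4] = rng.standard_normal(d)
Y[:, 14] = (bases[0] @ rng.standard_normal(sub_dim)
            + bases[3] @ rng.standard_normal(sub_dim))
```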



Sparse and Low Rank Representations in Music Signal Analysis               35/54
Example 2
 Representation matrices (zoom in on the 5th and 15th test samples)




Sparse and Low Rank Representations in Music Signal Analysis   36/54
1     Introduction

 2     Auditory spectro-temporal modulations

 3     Suitable data representations for classification

 4     Joint sparse low-rank representations in the ideal case

 5     Joint sparse low-rank representations in the presence of noise

 6     Joint sparse low-rank subspace-based classification

 7     Music signal analysis

 8     Conclusions


Sparse and Low Rank Representations in Music Signal Analysis            37/54
Joint sparse low-rank subspace-based classification


 Algorithm
 Input: Training matrix A ∈ R^{d×N} and test matrix Y ∈ R^{d×M}.
 Output: A class label for each column of Y.
    1   Solve (8) to obtain Z ∈ R^{N×M} and E ∈ R^{d×M}.
    2   for m = 1 to M
    3       ȳ_m = y_m − e_m.
    4       for k = 1 to K
    5           Compute the residuals r_k(ȳ_m) = ‖ȳ_m − A δ_k(z_m)‖_2.
    6       end for
    7       class(ȳ_m) = argmin_k r_k(ȳ_m).
    8   end for
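
 A direct NumPy transcription of the loop above may help; labels is assumed
 to be a length-N vector holding the class index of each training column of
 A, and δ_k(z_m) is realized by zeroing the coefficients of z_m outside
 class k.

```python
import numpy as np

def jslr_classify(Y, A, Z, E, labels, K):
    """Steps 2-8: residual-based labeling of each denoised test sample."""
    predictions = []
    for m in range(Y.shape[1]):
        y_bar = Y[:, m] - E[:, m]                         # step 3: remove the noise part
        residuals = []
        for k in range(K):
            delta_k = np.where(labels == k, Z[:, m], 0.0)           # keep class-k coefficients
            residuals.append(np.linalg.norm(y_bar - A @ delta_k))   # step 5
        predictions.append(int(np.argmin(residuals)))     # step 7
    return np.array(predictions)
```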


Sparse and Low Rank Representations in Music Signal Analysis                 38/54
Joint sparse low-rank subspace-based classification

 Linearity concentration index
        The LCI of a coefficient vector z_m ∈ R^N associated with the mth
        test sample is defined as

            LCI(z_m) = [ K · max_k ‖δ_k(z_m)‖_2 / ‖z_m‖_2 − 1 ] / (K − 1) ∈ [0, 1].   (19)

        If LCI(z_m) = 1, the test sample is drawn from a single subspace.
        If LCI(z_m) = 0, the test sample is drawn evenly from all subspaces.
        By choosing a threshold c ∈ (0, 1), the mth test sample is declared
        valid if LCI(z_m) > c. Otherwise, the test sample can either be
        rejected as totally invalid (for very small values of LCI(z_m)) or
        be classified into multiple classes by assigning to it the labels
        associated with the largest values of ‖δ_k(z_m)‖_2.
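
        A small sketch of (19) and the resulting rejection rule, using the
        same hypothetical labels vector as in the classification sketch
        above:

```python
import numpy as np

def lci(z, labels, K):
    """Linearity concentration index of a coefficient vector z, eq. (19)."""
    class_norms = [np.linalg.norm(z[labels == k]) for k in range(K)]
    return (K * max(class_norms) / np.linalg.norm(z) - 1.0) / (K - 1.0)

# Rejection rule: accept the m-th test sample only if lci(Z[:, m], labels, K) > c.
```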

Sparse and Low Rank Representations in Music Signal Analysis                             39/54
1     Introduction

 2     Auditory spectro-temporal modulations

 3     Suitable data representations for classification

 4     Joint sparse low-rank representations in the ideal case

 5     Joint sparse low-rank representations in the presence of noise

 6     Joint sparse low-rank subspace-based classification

 7     Music signal analysis

 8     Conclusions


Sparse and Low Rank Representations in Music Signal Analysis            40/54
Music genre classification: Datasets and evaluation
procedure

 GTZAN dataset
 1000 audio recordings, each 30 seconds long^a;

 10 genre classes: Blues, Classical, Country, Disco, HipHop, Jazz,
 Metal, Pop, Reggae, and Rock;

 Each genre class contains 100 audio recordings.

 The recordings are converted to monaural wave format at a 16 kHz
 sampling rate with 16-bit resolution and normalized so that they have
 zero mean amplitude and unit variance.

     a. G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Trans. Speech and Audio Processing, vol. 10, no. 5, pp. 293-302, July 2002.




Sparse and Low Rank Representations in Music Signal Analysis                                                     41/54
Music genre classification: Datasets and evaluation
procedure




 ISMIR 2004 Genre dataset
 1458 full audio recordings;

 6 genre classes: Classical (640), Electronic (229), JazzBlues (52),
 MetalPunk (90), RockPop (203), and World (244).




Sparse and Low Rank Representations in Music Signal Analysis           41/54
Music genre classification: Datasets and evaluation
procedure



 Protocols
 GTZAN dataset: stratified 10-fold cross-validation. Each training set
 consists of 900 audio recordings, yielding a training matrix A_GTZAN.

 ISMIR 2004 Genre dataset: the ISMIR 2004 Audio Description Contest
 protocol defines training and evaluation sets, which consist of 729
 audio files each.




Sparse and Low Rank Representations in Music Signal Analysis            41/54
Music genre classification: Datasets and evaluation
procedure

 Classifiers
 the JSLRSC, the JSSC, and the LRSC;

 the SRC^a with the coefficients estimated by the LASSO^b;

 the linear regression classifier (LRC)^c;

 the SVM with a linear kernel, and the NN classifier with the cosine
 similarity.

     a. J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.
     b. R. Tibshirani, "Regression shrinkage and selection via the LASSO," J. Royal Statist. Soc. B, vol. 58, no. 1, pp. 267-288, 1996.
     c. I. Naseem, R. Togneri, and M. Bennamoun, "Linear regression for face recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 2106-2112, 2010.




Sparse and Low Rank Representations in Music Signal Analysis                                                               41/54
Music genre classification


 Parameters θ1 > 0 and θ2 > 0




Sparse and Low Rank Representations in Music Signal Analysis   42/54
Music genre classification

 Classification accuracy
     Dataset:     GTZAN                                         ISMIR 2004 Genre
     Classifier   Parameters                    Accuracy (%)    Parameters                    Accuracy (%)
     JSLRSC       tuning                        90.40 (3.06)    tuning                        88.75
     JSSC         tuning                        88.80 (3.22)    tuning                        87.51
     LRSC         tuning                        88.70 (2.79)    tuning                        85.18
     JSLRSC       θ1 = 0.2, θ2 = 0.5, ρ = 1.1   88.30 (2.21)    θ1 = 0.5, θ2 = 0.2, ρ = 1.4   86.18
     JSSC         θ2 = 0.5, ρ = 1.2             88.00 (3.26)    θ2 = 0.5, ρ = 1.2             87.51
     LRSC         θ2 = 0.2, ρ = 1.4             87.70 (2.54)    θ2 = 0.2, ρ = 1.4             84.63
     SRC          -                             86.50 (2.46)    -                             83.95
     LRC          -                             87.30 (3.05)    -                             62.41
     SVM          -                             86.20 (2.52)    -                             83.25
     NN           -                             81.30 (2.79)    -                             79.42




Sparse and Low Rank Representations in Music Signal Analysis                             43/54
Music genre classification
 Comparison with the state-of-the-art
     Dataset:   GTZAN                                ISMIR 2004 Genre
     Rank       Reference            Accuracy (%)    Reference            Accuracy (%)
     1)         Chang et al.^a       92.70           Lee et al.^b         86.83
     2)         Lee et al.           90.60           Holzapfel et al.^c   83.50
     3)         Panagakis et al.^d   84.30           Panagakis et al.     83.15
     4)         Bergstra et al.^e    82.50           Pampalk et al.       82.30
     5)         Tsunoo et al.^f      77.20           -                    -

     a. K. Chang, J. S. R. Jang, and C. S. Iliopoulos, "Music genre classification via compressive sampling," in Proc. 11th Int. Symp. Music Information Retrieval, pp. 387-392, 2010.
     b. C. H. Lee, J. L. Shih, K. M. Yu, and H. S. Lin, "Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features," IEEE Trans. Multimedia, vol. 11, no. 4, pp. 670-682, 2009.
     c. A. Holzapfel and Y. Stylianou, "Musical genre classification using nonnegative matrix factorization-based features," IEEE Trans. Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 424-434, February 2008.
     d. Y. Panagakis, C. Kotropoulos, and G. R. Arce, "Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification," IEEE Trans. Audio, Speech, and Language Processing, vol. 18, no. 3, pp. 576-588, 2010.
     e. J. Bergstra, N. Casagrande, D. Erhan, D. Eck, and B. Kegl, "Aggregate features and AdaBoost for music classification," Machine Learning, vol. 65, no. 2-3, pp. 473-484, 2006.
     f. E. Tsunoo, G. Tzanetakis, N. Ono, and S. Sagayama, "Beyond timbral statistics: Improving music classification using percussive patterns and bass lines," IEEE Trans. Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 1003-1014, 2011.
Sparse and Low Rank Representations in Music Signal Analysis                                                             44/54
Music genre classification
 Confusion matrices




Sparse and Low Rank Representations in Music Signal Analysis   45/54
Music genre classification


 Dimensionality reduction via random projections
        Let the true low dimensionality of the data be denoted by r. A
        random projection matrix, drawn from a zero-mean normal
        distribution, provides with high probability a stable embedding^a,
        with the dimensionality of the projection d selected as the
        minimum value such that d > 2r log(7680/d).
        r is estimated by robust principal component analysis on a
        training set for each dataset.
        d = 1581 is found for the GTZAN dataset and d = 1398 for the
        ISMIR 2004 Genre dataset.

     a. R. G. Baraniuk, V. Cevher, and M. B. Wakin, "Low-dimensional models for dimensionality reduction and signal recovery: A geometric perspective," Proceedings of the IEEE, vol. 98, no. 6, pp. 959-971, 2010.
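
        A sketch of the dimension selection and of the projection itself.
        The linear scan over d (the inequality's left side grows while its
        right side shrinks, so the first feasible d is the minimum) and the
        1/√d scaling of the Gaussian matrix are our assumptions.

```python
import numpy as np

def min_embedding_dim(r, D=7680):
    """Smallest d satisfying d > 2 r log(D / d), found by a linear scan."""
    for d in range(1, D + 1):
        if d > 2.0 * r * np.log(D / d):
            return d
    return D

def random_project(X, d, seed=0):
    """Gaussian random projection of D-dimensional columns of X to d dimensions."""
    D = X.shape[0]
    P = np.random.default_rng(seed).standard_normal((d, D)) / np.sqrt(d)
    return P @ X
```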




Sparse and Low Rank Representations in Music Signal Analysis                                                       46/54
Music genre classification


 Accuracy after dimensionality reduction
     Dataset:     GTZAN                                         ISMIR 2004 Genre
     Classifier   Parameters                    Accuracy (%)    Parameters                    Accuracy (%)
     JSLRSC       θ1 = 1.8, θ2 = 0.7, ρ = 1.4   87.5 (2.41)     θ1 = 1.5, θ2 = 0.2, ρ = 1.4   85.87
     JSSC         θ2 = 1.5, ρ = 1.2             86.9 (3.28)     θ2 = 0.6, ρ = 1.1             87.30
     LRSC         θ2 = 0.7, ρ = 1.4             86.6 (2.75)     θ2 = 0.8, ρ = 2.4             84.08
     SRC          -                             86.90 (3.21)    -                             83.67
     LRC          -                             85.30 (3.16)    -                             54.18
     SVM          -                             86.00 (2.53)    -                             83.26
     NN           -                             80.80 (3.01)    -                             78.87




Sparse and Low Rank Representations in Music Signal Analysis                             47/54
Music genre classification

 Accuracy after rejecting 1 out of 5 test samples
 The JSLRSC achieves a classification accuracy of 95.51% on the GTZAN
 dataset. For the ISMIR 2004 Genre dataset, the accuracy of the JSSC is
 92.63%, while that of the JSLRSC is 91.55%.

 [Figure: classification accuracy (%) versus the rejection threshold c for
 the JSLRSC, JSSC, LRSC, LRC, SRC, SVM, and NN classifiers; left: GTZAN
 (c ∈ [0.29, 0.35]), right: ISMIR 2004 Genre (c ∈ [0.41, 0.48]).]
Sparse and Low Rank Representations in Music Signal Analysis                                                                                                                         48/54
Music structure analysis

 Optimization problem
        Given a music recording of K music segments, represented by a
        sequence of beat-synchronous feature vectors
        X = [x_1 | x_2 | . . . | x_N] ∈ R^{d×N}, learn Z ∈ R^{N×N} by
        minimizing

          argmin_{Z,E}  λ1 ‖Z‖_1 + (λ2/2) ‖Z‖_F^2 + λ3 ‖E‖_1
          subject to    X = X Z + E,  z_ii = 0.

        Let Z = Ũ Σ̃ Ṽ^T. Define U = Ũ Σ̃^{1/2}. Set M = U U^T. Build a
        nonnegative symmetric affinity matrix W ∈ R_+^{N×N} with elements
        w_ij = m_ij^2 and apply the normalized cuts^a.

     a. J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, 2000.
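
        A sketch of this segmentation pipeline, with scikit-learn's spectral
        clustering on the precomputed affinity standing in for the normalized
        cuts of Shi and Malik; that substitution, and the helper name
        segment, are ours.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def segment(Z, K):
    """Affinity construction from Z and clustering into K music segments."""
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    U_tilde = U * np.sqrt(s)            # U~ = U Sigma^(1/2)
    M = U_tilde @ U_tilde.T             # M = U~ U~^T
    W = M ** 2                          # w_ij = m_ij^2: nonnegative, symmetric
    return SpectralClustering(n_clusters=K,
                              affinity='precomputed').fit_predict(W)
```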




Sparse and Low Rank Representations in Music Signal Analysis                                                       49/54
Music tagging

 Optimization problem
 Assume that the tag-recording matrix Y and the matrix of the ATM
 representations X are jointly low-rank. Learn a low-rank weight matrix
 W such that:

              argmin_{W,E}  ‖W‖_* + λ ‖E‖_1   subject to  Y = W X + E.           (20)
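
 The deck does not spell out a solver for (20); the following is a hedged
 linearized-ALM sketch in the spirit of the JSLRR solver, reusing the svt
 and shrink helpers from earlier. The step size 1/η with η = ‖X‖_2^2 and the
 continuation rule on µ are assumptions.

```python
import numpy as np

def tag_weights(Y, X, lam, mu=1e-2, rho=1.1, mu_max=1e6, n_iter=200):
    """Linearized ALM sketch for argmin ||W||_* + lam ||E||_1 s.t. Y = W X + E."""
    T, n = Y.shape
    D = X.shape[0]
    W = np.zeros((T, D)); E = np.zeros((T, n)); Lam = np.zeros((T, n))
    eta = np.linalg.norm(X, 2) ** 2         # Lipschitz bound for the W step
    for _ in range(n_iter):
        R = Y - W @ X - E
        W = svt(W + ((R + Lam / mu) @ X.T) / eta, 1.0 / (mu * eta))
        E = shrink(Y - W @ X + Lam / mu, lam / mu)
        Lam = Lam + mu * (Y - W @ X - E)
        mu = min(rho * mu, mu_max)
    return W, E

# Tags for new recordings X_new are obtained by thresholding the scores W @ X_new.
```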




Sparse and Low Rank Representations in Music Signal Analysis                                50/54
Example 3




Sparse and Low Rank Representations in Music Signal Analysis   51/54
1     Introduction

 2     Auditory spectro-temporal modulations

 3     Suitable data representations for classification

 4     Joint sparse low-rank representations in the ideal case

 5     Joint sparse low-rank representations in the presence of noise

 6     Joint sparse low-rank subspace-based classification

 7     Music signal analysis

 8     Conclusions


Sparse and Low Rank Representations in Music Signal Analysis            52/54
Conclusions



 Summary-Future Work
        A robust framework for solving classification and clustering
        problems in music signal analysis has been developed.
        In all three problems addressed, the proposed techniques either
        achieve top performance or meet the state-of-the-art.
        Efficient implementations exploiting incremental update rules are
        desperately needed.
        Performance improvement for small sample sets deserves further
        elaboration.




Sparse and Low Rank Representations in Music Signal Analysis                53/54
Sparse and Low Rank Representations in Music Signal Analysis
Sparse and Low Rank Representations in Music Signal Analysis
Sparse and Low Rank Representations in Music Signal Analysis
Sparse and Low Rank Representations in Music Signal Analysis

Mais conteúdo relacionado

Destaque

Destaque (15)

A Classification Framework For Component Models
 A Classification Framework For Component Models A Classification Framework For Component Models
A Classification Framework For Component Models
 
From Programs to Systems – Building a Smarter World
From Programs to Systems – Building a Smarter WorldFrom Programs to Systems – Building a Smarter World
From Programs to Systems – Building a Smarter World
 
Co-evolution, Games, and Social Behaviors
Co-evolution, Games, and Social BehaviorsCo-evolution, Games, and Social Behaviors
Co-evolution, Games, and Social Behaviors
 
Web Usage Miningand Using Ontology for Capturing Web Usage Semantic
Web Usage Miningand Using Ontology for Capturing Web Usage SemanticWeb Usage Miningand Using Ontology for Capturing Web Usage Semantic
Web Usage Miningand Using Ontology for Capturing Web Usage Semantic
 
Data Quality: Not Your Typical Database Problem
Data Quality: Not Your Typical Database ProblemData Quality: Not Your Typical Database Problem
Data Quality: Not Your Typical Database Problem
 
State Space Exploration for NASA’s Safety Critical Systems
State Space Exploration for NASA’s Safety Critical SystemsState Space Exploration for NASA’s Safety Critical Systems
State Space Exploration for NASA’s Safety Critical Systems
 
Semantic 3DTV Content Analysis and Description
Semantic 3DTV Content Analysis and DescriptionSemantic 3DTV Content Analysis and Description
Semantic 3DTV Content Analysis and Description
 
Jamming in Wireless Sensor Networks
Jamming in Wireless Sensor NetworksJamming in Wireless Sensor Networks
Jamming in Wireless Sensor Networks
 
Mixture Models for Image Analysis
Mixture Models for Image AnalysisMixture Models for Image Analysis
Mixture Models for Image Analysis
 
Sparse and Redundant Representations: Theory and Applications
Sparse and Redundant Representations: Theory and ApplicationsSparse and Redundant Representations: Theory and Applications
Sparse and Redundant Representations: Theory and Applications
 
Networked 3-D Virtual Collaboration in Science and Education: Towards 'Web 3....
Networked 3-D Virtual Collaboration in Science and Education: Towards 'Web 3....Networked 3-D Virtual Collaboration in Science and Education: Towards 'Web 3....
Networked 3-D Virtual Collaboration in Science and Education: Towards 'Web 3....
 
Machine Learning Tools and Particle Swarm Optimization for Content-Based Sear...
Machine Learning Tools and Particle Swarm Optimization for Content-Based Sear...Machine Learning Tools and Particle Swarm Optimization for Content-Based Sear...
Machine Learning Tools and Particle Swarm Optimization for Content-Based Sear...
 
Compressed Sensing In Spectral Imaging
Compressed Sensing In Spectral Imaging  Compressed Sensing In Spectral Imaging
Compressed Sensing In Spectral Imaging
 
Artificial Intelligence and Human Thinking
Artificial Intelligence and Human ThinkingArtificial Intelligence and Human Thinking
Artificial Intelligence and Human Thinking
 
Defying Nyquist in Analog to Digital Conversion
Defying Nyquist in Analog to Digital ConversionDefying Nyquist in Analog to Digital Conversion
Defying Nyquist in Analog to Digital Conversion
 

Semelhante a Sparse and Low Rank Representations in Music Signal Analysis

Landmark Detection in Hindustani Music Melodies
Landmark Detection in Hindustani Music MelodiesLandmark Detection in Hindustani Music Melodies
Landmark Detection in Hindustani Music MelodiesSankalp Gulati
 
Tervo: Sensory Dissonance Models
Tervo: Sensory Dissonance ModelsTervo: Sensory Dissonance Models
Tervo: Sensory Dissonance ModelsTommi Himberg
 
Graphical visualization of musical emotions
Graphical visualization of musical emotionsGraphical visualization of musical emotions
Graphical visualization of musical emotionsPranay Prasoon
 
Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica...
Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica...Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica...
Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica...Lushanthan Sivaneasharajah
 
Mining Melodic Patterns in Large Audio Collections of Indian Art Music
	Mining Melodic Patterns in Large Audio Collections of Indian Art Music	Mining Melodic Patterns in Large Audio Collections of Indian Art Music
Mining Melodic Patterns in Large Audio Collections of Indian Art MusicSankalp Gulati
 
Computational Approaches to Melodic Analysis of Indian Art Music
Computational Approaches to Melodic Analysis of Indian Art MusicComputational Approaches to Melodic Analysis of Indian Art Music
Computational Approaches to Melodic Analysis of Indian Art MusicSankalp Gulati
 
SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING ROBUST PRINCIPAL COMP...
SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING ROBUST PRINCIPAL COMP...SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING ROBUST PRINCIPAL COMP...
SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING ROBUST PRINCIPAL COMP...chakravarthy Gopi
 
Performance Comparison of Musical Instrument Family Classification Using Soft...
Performance Comparison of Musical Instrument Family Classification Using Soft...Performance Comparison of Musical Instrument Family Classification Using Soft...
Performance Comparison of Musical Instrument Family Classification Using Soft...Waqas Tariq
 
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based RankingMusical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based RankingSease
 
Interval Hashing Based Ranking
Interval Hashing Based RankingInterval Hashing Based Ranking
Interval Hashing Based RankingAndrea Gazzarini
 
Journal Club - "Intermediate acoustic-to-semantic representations link behavi...
Journal Club - "Intermediate acoustic-to-semantic representations link behavi...Journal Club - "Intermediate acoustic-to-semantic representations link behavi...
Journal Club - "Intermediate acoustic-to-semantic representations link behavi...Ana Luísa Pinho
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachEnrico Daga
 
A Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive InstrumentsA Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive InstrumentsIJMTST Journal
 
Human Perception and Recognition of Musical Instruments: A Review
Human Perception and Recognition of Musical Instruments: A ReviewHuman Perception and Recognition of Musical Instruments: A Review
Human Perception and Recognition of Musical Instruments: A ReviewEditor IJCATR
 
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art MusicSankalp Gulati
 
AN INTERESTING APPLICATION OF SIMPLE EXPONENTIAL SMOOTHING IN MUSIC ANALYSIS
AN INTERESTING APPLICATION OF SIMPLE EXPONENTIAL SMOOTHING IN MUSIC ANALYSISAN INTERESTING APPLICATION OF SIMPLE EXPONENTIAL SMOOTHING IN MUSIC ANALYSIS
AN INTERESTING APPLICATION OF SIMPLE EXPONENTIAL SMOOTHING IN MUSIC ANALYSISijscai
 

Semelhante a Sparse and Low Rank Representations in Music Signal Analysis (20)

Landmark Detection in Hindustani Music Melodies
Landmark Detection in Hindustani Music MelodiesLandmark Detection in Hindustani Music Melodies
Landmark Detection in Hindustani Music Melodies
 
Tervo: Sensory Dissonance Models
Tervo: Sensory Dissonance ModelsTervo: Sensory Dissonance Models
Tervo: Sensory Dissonance Models
 
Graphical visualization of musical emotions
Graphical visualization of musical emotionsGraphical visualization of musical emotions
Graphical visualization of musical emotions
 
Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica...
Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica...Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica...
Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica...
 
Mining Melodic Patterns in Large Audio Collections of Indian Art Music
	Mining Melodic Patterns in Large Audio Collections of Indian Art Music	Mining Melodic Patterns in Large Audio Collections of Indian Art Music
Mining Melodic Patterns in Large Audio Collections of Indian Art Music
 
Computational Approaches to Melodic Analysis of Indian Art Music
Computational Approaches to Melodic Analysis of Indian Art MusicComputational Approaches to Melodic Analysis of Indian Art Music
Computational Approaches to Melodic Analysis of Indian Art Music
 
SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING ROBUST PRINCIPAL COMP...
SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING ROBUST PRINCIPAL COMP...SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING ROBUST PRINCIPAL COMP...
SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING ROBUST PRINCIPAL COMP...
 
Performance Comparison of Musical Instrument Family Classification Using Soft...
Performance Comparison of Musical Instrument Family Classification Using Soft...Performance Comparison of Musical Instrument Family Classification Using Soft...
Performance Comparison of Musical Instrument Family Classification Using Soft...
 
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based RankingMusical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
 
Interval Hashing Based Ranking
Interval Hashing Based RankingInterval Hashing Based Ranking
Interval Hashing Based Ranking
 
Journal Club - "Intermediate acoustic-to-semantic representations link behavi...
Journal Club - "Intermediate acoustic-to-semantic representations link behavi...Journal Club - "Intermediate acoustic-to-semantic representations link behavi...
Journal Club - "Intermediate acoustic-to-semantic representations link behavi...
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid Approach
 
MIR
MIRMIR
MIR
 
A Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive InstrumentsA Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
 
Human Perception and Recognition of Musical Instruments: A Review
Human Perception and Recognition of Musical Instruments: A ReviewHuman Perception and Recognition of Musical Instruments: A Review
Human Perception and Recognition of Musical Instruments: A Review
 
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
 
ppt
pptppt
ppt
 
ppt
pptppt
ppt
 
T26123129
T26123129T26123129
T26123129
 
AN INTERESTING APPLICATION OF SIMPLE EXPONENTIAL SMOOTHING IN MUSIC ANALYSIS
AN INTERESTING APPLICATION OF SIMPLE EXPONENTIAL SMOOTHING IN MUSIC ANALYSISAN INTERESTING APPLICATION OF SIMPLE EXPONENTIAL SMOOTHING IN MUSIC ANALYSIS
AN INTERESTING APPLICATION OF SIMPLE EXPONENTIAL SMOOTHING IN MUSIC ANALYSIS
 

Último

How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
Sparse and Low Rank Representations in Music Signal Analysis

  • 7. Introduction
Motivation:
The appealing properties of slow temporal and spectro-temporal modulations from the human perceptual point of view [a];
The strong theoretical foundations of sparse representations [b][c] and low-rank representations [d].
[a] K. Wang and S. A. Shamma, “Spectral shape analysis in the central auditory system,” IEEE Trans. Speech and Audio Processing, vol. 3, no. 5, pp. 382–396, 1995.
[b] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Information Theory, vol. 52, no. 2, pp. 489–509, February 2006.
[c] D. L. Donoho, “Compressed sensing,” IEEE Trans. Information Theory, vol. 52, no. 4, pp. 1289–1306, April 2006.
[d] G. Liu, Z. Lin, S. Yan, J. Sun, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Trans. Pattern Analysis and Machine Intelligence, 2011, arXiv:1010.2955v4 (preprint).
  • 9. Notations
Span: Let span(X) denote the linear space spanned by the columns of X. Then, Y ∈ span(X) denotes that all column vectors of Y belong to span(X).
  • 10. Notations
Vector norms: ‖x‖_0 is the ℓ_0 quasi-norm, counting the number of nonzero entries in x. If |·| denotes the absolute value, ‖x‖_1 = Σ_i |x_i| and ‖x‖_2 = (Σ_i x_i^2)^{1/2} are the ℓ_1 and the ℓ_2 norms of x, respectively.
  • 11. Notations
Matrix norms:
Mixed ℓ_{p,q} matrix norm: ‖X‖_{p,q} = (Σ_j (Σ_i |x_ij|^p)^{q/p})^{1/q}. For p = q = 0, the matrix ℓ_0 quasi-norm ‖X‖_0 returns the number of nonzero entries in X. For p = q = 1, the matrix ℓ_1 norm is obtained: ‖X‖_1 = Σ_i Σ_j |x_ij|.
Frobenius norm: ‖X‖_F = (Σ_i Σ_j x_ij^2)^{1/2}.
ℓ_{2,1} norm of X: ‖X‖_{2,1} = Σ_j (Σ_i x_ij^2)^{1/2}.
Nuclear norm of X: ‖X‖_* is the sum of the singular values of X.
  • 12. Notations
Support: A vector x is said to be q-sparse if the size of the support of x (i.e., the set of indices associated with nonzero vector elements) is no larger than q. The support of a collection of vectors X = [x_1, x_2, ..., x_N] is defined as the union of the individual supports. A matrix X is called q-joint-sparse if |supp(X)| ≤ q; that is, at most q rows of X contain nonzero elements, because ‖X‖_{0,q} = |supp(X)| for any q [a].
[a] M. Davies and Y. Eldar, “Rank awareness in joint sparse recovery,” arXiv:1004.4529v1, 2010.
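The quantities above translate directly into a few lines of NumPy. The following sketch is our packaging, not part of the talk; it computes each norm and the joint-sparse row support for a real matrix X.

```python
import numpy as np

def norms(X):
    """Quantities from the notation slides (a NumPy sketch)."""
    l0 = np.count_nonzero(X)                           # ||X||_0: number of nonzero entries
    l1 = np.abs(X).sum()                               # ||X||_1 = sum_i sum_j |x_ij|
    fro = np.sqrt((X ** 2).sum())                      # ||X||_F
    l21 = np.linalg.norm(X, axis=0).sum()              # ||X||_{2,1}: sum of column l2 norms
    nuc = np.linalg.svd(X, compute_uv=False).sum()     # ||X||_*: sum of singular values
    row_support = np.flatnonzero(np.abs(X).sum(axis=1))  # union of column supports;
    return l0, l1, fro, l21, nuc, row_support            # |row_support| <= q for q-joint-sparse X
```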
  • 13. Outline: 1 Introduction · 2 Auditory spectro-temporal modulations · 3 Suitable data representations for classification · 4 Joint sparse low-rank representations in the ideal case · 5 Joint sparse low-rank representations in the presence of noise · 6 Joint sparse low-rank subspace-based classification · 7 Music signal analysis · 8 Conclusions
  • 14. Auditory spectro-temporal modulations
Computational auditory model: inspired by psychoacoustical and neurophysiological investigations of the early and central stages of the human auditory system.
[Block diagram: audio signal → early auditory model → auditory spectrogram → central auditory model → auditory temporal modulations and auditory spectro-temporal modulations (cortical representation).]
  • 16. Auditory spectro-temporal modulations
Early auditory system: the auditory spectrogram is a time-frequency distribution of energy along a tonotopic (logarithmic frequency) axis.
[Figure: early auditory model producing the auditory spectrogram.]
  • 17. Auditory spectro-temporal modulations
Central auditory system: temporal modulations.
[Figure: auditory spectrogram mapped to auditory temporal modulations; rate axis ω (Hz).]
  • 18. Auditory spectro-temporal modulations
[Figure: auditory temporal modulations across 10 music genres: Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Pop, Reggae, Rock.]
  • 19. Auditory spectro-temporal modulations
Central auditory system: spectro-temporal modulations.
[Figure: auditory spectrogram mapped to auditory spectro-temporal modulations; scale axis Ω (cycles/octave), rate axis ω (Hz).]
  • 20. Auditory spectro-temporal modulations
Efficient implementation through the constant-Q transform (CQT).
  • 21. Auditory spectro-temporal modulations
Parameters and implementation (1):
The audio signal is analyzed by 128 constant-Q filters covering 8 octaves from 44.9 Hz to 11 kHz (i.e., 16 filters per octave). The magnitude of the CQT is compressed by raising each element of the CQT matrix to the power of 0.1 [a]. The 2D multiresolution wavelet analysis is implemented via a bank of 2D Gaussian filters with scales ∈ {0.25, 0.5, 1, 2, 4, 8} (cycles/octave) and rates ∈ {±2, ±4, ±8, ±16, ±32} (Hz). For each music recording, the extracted 4D cortical representation is time-averaged, yielding a rate-scale-frequency 3D cortical representation.
[a] C. Schoerkhuber and A. Klapuri, “Constant-Q transform toolbox for music processing,” in 7th Sound and Music Computing Conf., Barcelona, Spain, 2010.
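As a rough illustration of the front-end parameters, the first stage can be reproduced with an off-the-shelf CQT such as librosa's. This is a sketch with our choice of library (the talk cites Schoerkhuber and Klapuri's toolbox), and the subsequent rate-scale analysis with the 2D Gaussian filter bank is omitted here.

```python
import numpy as np
import librosa  # our stand-in for the CQT toolbox cited on the slide

def compressed_cqt(y, sr):
    """CQT with the slide's parameters: 128 bins, 16 per octave, starting at
    44.9 Hz (8 octaves, up to ~11 kHz, so sr must be at least ~22050 Hz),
    with the magnitude compressed by raising it to the power 0.1."""
    C = librosa.cqt(y, sr=sr, fmin=44.9, n_bins=128, bins_per_octave=16)
    return np.abs(C) ** 0.1
```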
  • 24. Auditory spectro-temporal modulations
Parameters and implementation (2):
To sum up, each music recording is represented by a vector x ∈ R_+^{7680} obtained by stacking the elements of the 3D cortical representation. An ensemble of music recordings is represented by the data matrix X ∈ R_+^{7680×S}, where S is the number of available recordings. Each row of X is normalized to the range [0, 1] by subtracting the row minimum from each entry and then dividing by the difference between the row maximum and the row minimum.
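The row-wise normalization is a one-liner in NumPy; a minimal sketch follows (the eps guard against constant rows is our addition, not from the talk).

```python
import numpy as np

def row_minmax(X, eps=1e-12):
    """Scale each row of X to [0, 1]: subtract the row minimum and divide
    by the row range; eps avoids division by zero for constant rows."""
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    return (X - lo) / np.maximum(hi - lo, eps)
```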
  • 27. Outline: 1 Introduction · 2 Auditory spectro-temporal modulations · 3 Suitable data representations for classification · 4 Joint sparse low-rank representations in the ideal case · 5 Joint sparse low-rank representations in the presence of noise · 6 Joint sparse low-rank subspace-based classification · 7 Music signal analysis · 8 Conclusions
  • 28. Learning Problem
Statement: Let X ∈ R^{d×S} be the data matrix containing S vector samples of size d in its columns, i.e., x_s ∈ R^d, s = 1, 2, ..., S. Without loss of generality, the data matrix can be partitioned as X = [A | Y], where:
A = [A_1 | A_2 | ... | A_K] ∈ R^{d×N} represents a set of N training samples belonging to K classes;
Y = [Y_1 | Y_2 | ... | Y_K] ∈ R^{d×M} contains M = S − N test vector samples in its columns.
If certain assumptions hold, learn a block-diagonal matrix Z = diag[Z_1, Z_2, ..., Z_K] ∈ R^{N×M} such that Y = AZ.
  • 33. Learning Problem
Assumptions: If
1. the data are exactly drawn from independent linear subspaces, i.e., span(A_k) linearly spans the k-th class data space, k = 1, 2, ..., K,
2. Y ∈ span(A),
3. the data contain neither outliers nor noise,
then each test vector sample belonging to the k-th class can be represented as a linear combination of the training samples in A_k.
  • 38. Solutions
Sparsest representation (SR): Z ∈ R^{N×M} is the sparsest representation of the test data Y ∈ R^{d×M} with respect to the training data A ∈ R^{d×N}, obtained by solving, for each test sample, the optimization problem [a]:
SR: argmin_{z_i} ‖z_i‖_0 subject to y_i = A z_i. (1)
[a] E. Elhamifar and R. Vidal, “Sparse subspace clustering,” in IEEE Int. Conf. Computer Vision and Pattern Recognition, Miami, FL, USA, 2009, pp. 2790–2797.
  • 39. Solutions
Lowest-rank representation (LRR): Z ∈ R^{N×M} is the lowest-rank representation of the test data Y ∈ R^{d×M} with respect to the training data A ∈ R^{d×N}, obtained by solving the optimization problem [a]:
LRR: argmin_Z rank(Z) subject to Y = A Z. (2)
[a] G. Liu, Z. Lin, S. Yan, J. Sun, and Y. Ma (2011).
  • 40. Solutions
Convex relaxations: The convex envelope of the ℓ_0 norm is the ℓ_1 norm [a], while the convex envelope of the rank function is the nuclear norm [b]. Convex relaxations are obtained by replacing the ℓ_0 norm and the rank function by their convex envelopes:
SR: argmin_{z_i} ‖z_i‖_1 subject to y_i = A z_i, (3)
LRR: argmin_Z ‖Z‖_* subject to Y = A Z. (4)
[a] D. Donoho, “For most large underdetermined systems of equations, the minimal l1-norm near-solution approximates the sparsest near-solution,” Communications on Pure and Applied Mathematics, vol. 59, no. 7, pp. 907–934, 2006.
[b] M. Fazel, Matrix Rank Minimization with Applications, Ph.D. thesis, Dept. Electrical Engineering, Stanford University, CA, USA, 2002.
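Both relaxations are small convex programs. Assuming a generic modeling tool such as CVXPY (our choice; the talk does not prescribe a solver), they can be written down almost verbatim:

```python
import cvxpy as cp

def sr(A, y):
    """SR relaxation (3): min ||z||_1  s.t.  y = A z, for one test sample."""
    z = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.norm(z, 1)), [A @ z == y]).solve()
    return z.value

def lrr(A, Y):
    """LRR relaxation (4): min ||Z||_*  s.t.  Y = A Z."""
    Z = cp.Variable((A.shape[1], Y.shape[1]))
    cp.Problem(cp.Minimize(cp.normNuc(Z)), [A @ Z == Y]).solve()
    return Z.value
```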
  • 41. Solutions
SR pros and cons: The SR matrix Z ∈ R^{N×M} is sparse block-diagonal and has good discriminative properties, as demonstrated by SR-based classifiers [a]. However, the SR
1. cannot model generic subspace structures: the SR accurately models subregions of subspaces, the so-called bouquets, rather than generic subspaces [b];
2. does not capture the global structure of the data, since it is computed for each data sample individually. Although sparsity offers an efficient representation, it damages the high within-class homogeneity that is desirable for classification, especially in the presence of noise.
[a] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
[b] J. Wright and Y. Ma, “Dense error correction via l1-minimization,” IEEE Trans. Information Theory, vol. 56, no. 7, pp. 3540–3560, 2010.
  • 45. Solutions
LRR pros and cons: The LRR matrix Z ∈ R^{N×M}
1. models data stemming from generic subspace structures;
2. accurately preserves the global data structure;
3. for clean data, also exhibits dense within-class homogeneity and zero between-class affinities, making it an appealing representation for classification purposes, e.g., in music mood classification [a];
4. for data contaminated with noise and outliers, enforces a low-rank constraint that seems to act as noise correction [b]. But the LRR loses sparsity within the classes.
[a] Y. Panagakis and C. Kotropoulos, “Automatic music mood classification via low-rank representation,” in Proc. 19th European Signal Processing Conf., Barcelona, Spain, 2011, pp. 689–693.
[b] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?,” Journal of ACM, vol. 58, no. 3, pp. 1–37, 2011.
  • 51. Outline: 1 Introduction · 2 Auditory spectro-temporal modulations · 3 Suitable data representations for classification · 4 Joint sparse low-rank representations in the ideal case · 5 Joint sparse low-rank representations in the presence of noise · 6 Joint sparse low-rank subspace-based classification · 7 Music signal analysis · 8 Conclusions
  • 52. Joint sparse low-rank representations (JSLRR)
Motivation: Intuitively, a representation matrix able to reveal the most characteristic subregions of the subspaces must be simultaneously row-sparse and low-rank. Row sparsity ensures that only a small fraction of the training samples is involved in the representation. The low-rank constraint ensures that the representation vectors (i.e., the columns of the representation matrix) are correlated, in the sense that data lying in a single subspace are represented as linear combinations of the same few training samples.
  • 55. JSLRR
Problem statement and solution: The JSLRR of Y ∈ R^{d×M} with respect to A ∈ R^{d×N} is the matrix Z ∈ R^{N×M} with rank r ≪ min(q, M), where q ≪ N is the size of the support of Z. It can be found by minimizing the rank function regularized by the ℓ_{0,q} quasi-norm; the ℓ_{0,q} regularization term ensures that the low-rank matrix is also row-sparse, since ‖Z‖_{0,q} = |supp(Z)| for any q. A convex relaxation of this problem is solved:
JSLRR: argmin_Z ‖Z‖_* + θ_1 ‖Z‖_1 subject to Y = A Z, (5)
where the term ‖Z‖_1 promotes sparsity in the LRR matrix and θ_1 > 0 balances the two norms in (5).
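Under the same CVXPY assumption as the earlier sketch (our choice of tool, and θ_1 = 0.1 is an arbitrary placeholder, not a value from the talk), problem (5) simply adds the elementwise ℓ_1 term to the nuclear-norm objective:

```python
import cvxpy as cp

def jslrr(A, Y, theta1=0.1):
    """JSLRR (5): min ||Z||_* + theta1 * ||Z||_1  s.t.  Y = A Z."""
    Z = cp.Variable((A.shape[1], Y.shape[1]))
    objective = cp.Minimize(cp.normNuc(Z) + theta1 * cp.sum(cp.abs(Z)))
    cp.Problem(objective, [A @ Z == Y]).solve()
    return Z.value
```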
  • 59. JSLRR
Any theoretical guarantee? The JSLRR has a block-diagonal structure, a property that makes it appealing for classification. This fact is proved in Theorem 1, which is a consequence of Lemma 1.
  • 60. JSLRR
Lemma 1: Let ‖·‖_θ = ‖·‖_* + θ ‖·‖_1, with θ > 0. For any four matrices B, C, D, and F of compatible dimensions,
‖[B C; D F]‖_θ ≥ ‖[B 0; 0 F]‖_θ = ‖B‖_θ + ‖F‖_θ. (6)
  • 61. JSLRR
Theorem 1: Assume that the data are exactly drawn from independent linear subspaces; that is, span(A_k) linearly spans the training vectors of the k-th class, k = 1, 2, ..., K, and Y ∈ span(A). Then, the minimizer of (5) is block-diagonal.
  • 62. Example 1
Ideal case: 4 pairwise independent linear subspaces are constructed, whose bases {U_i}_{i=1}^4 are computed by U_{i+1} = R U_i, i = 1, 2, 3. U_1 ∈ R^{600×110} is a column-orthonormal random matrix and R ∈ R^{600×600} is a random rotation matrix. The data matrix X = [X_1, X_2, X_3, X_4] ∈ R^{600×400} is obtained by picking 100 samples from each subspace, i.e., X_i ∈ R^{600×100}, i = 1, 2, 3, 4. Next, the data matrix is partitioned into the training matrix A ∈ R^{600×360} and the test matrix Y ∈ R^{600×40} by employing 10-fold cross-validation.
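The synthetic construction can be reproduced with QR factorizations of Gaussian matrices; the choice of QR for the random orthonormal basis and the (orthogonal, rotation up to sign) matrix R is ours, not stated on the slide. A 10-fold split of the 100 samples per subspace then yields A ∈ R^{600×360} and Y ∈ R^{600×40}.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_per = 600, 110, 100

U1, _ = np.linalg.qr(rng.standard_normal((d, r)))   # column-orthonormal U1 in R^{600x110}
R, _ = np.linalg.qr(rng.standard_normal((d, d)))    # random orthogonal matrix

bases = [U1]
for _ in range(3):                                  # U_{i+1} = R U_i, i = 1, 2, 3
    bases.append(R @ bases[-1])

# 100 samples per subspace: random coefficient mixtures of each basis's columns.
X = np.hstack([U @ rng.standard_normal((r, n_per)) for U in bases])   # X in R^{600x400}
```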
  • 65. Example 1
[Figure: the JSLRR, LRR, and SR representation matrices Z ∈ R^{360×40}.]
  • 66. Outline: 1 Introduction · 2 Auditory spectro-temporal modulations · 3 Suitable data representations for classification · 4 Joint sparse low-rank representations in the ideal case · 5 Joint sparse low-rank representations in the presence of noise · 6 Joint sparse low-rank subspace-based classification · 7 Music signal analysis · 8 Conclusions
  • 67. JSLRR
Revisiting: In practice, the data are only approximately drawn from a union of subspaces. Deviations from the ideal assumptions can be treated collectively as additive noise contaminating the ideal model, i.e., Y = AZ + E. The noise term E models both small (but densely supported) deviations and gross (but sparse) corruptions of the observations (i.e., outliers or missing data). In the presence of noise, both the rank and the density of the representation matrix Z increase, since the columns of Z contain nonzero elements associated with more than one class. If one requires the rank of Z to be reduced or the sparsity of Z to be increased, the noise in the test set can be smoothed while Z simultaneously admits a close-to-block-diagonal structure.
  • 71. Robust JSLRR
Optimization problem: A solution is sought for the convex optimization problem
Robust JSLRR: argmin_{Z,E} ‖Z‖_* + θ_1 ‖Z‖_1 + θ_2 ‖E‖_{2,1} subject to Y = A Z + E, (7)
where θ_2 > 0 is a regularization parameter and ‖·‖_{2,1} denotes the ℓ_2/ℓ_1 norm. Problem (7) can be solved iteratively by employing the Linearized Alternating Direction Augmented Lagrange Multiplier (LADALM) method [a], a variant of the Alternating Direction Augmented Lagrange Multiplier method [b].
[a] J. Yang and X. M. Yuan, “Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization,” Math. Comput., (to appear) 2011.
[b] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Athena Scientific, Belmont, MA, 2/e, 1996.
  • 73. Robust JSLRR
LADALM: That is, one solves
argmin_{J,Z,W,E} ‖J‖_* + θ_1 ‖W‖_1 + θ_2 ‖E‖_{2,1} subject to Y = A Z + E, Z = J, J = W, (8)
by minimizing the augmented Lagrangian function
L(J, Z, W, E, Λ_1, Λ_2, Λ_3) = ‖J‖_* + θ_1 ‖W‖_1 + θ_2 ‖E‖_{2,1} + tr[Λ_1^T (Y − AZ − E)] + tr[Λ_2^T (Z − J)] + tr[Λ_3^T (J − W)] + (μ/2) (‖Y − AZ − E‖_F^2 + ‖Z − J‖_F^2 + ‖J − W‖_F^2), (9)
where Λ_1, Λ_2, and Λ_3 are the Lagrange multipliers and μ > 0 is a penalty parameter.
  • 75. Robust JSLRR
Optimization with respect to J_[t]:
J_[t+1] = argmin_J L(J, Z_[t], W_[t], E_[t], Λ_1[t], Λ_2[t], Λ_3[t])
≈ argmin_J (1/μ) ‖J‖_* + (1/2) ‖J − (Z_[t] − J_[t] − Λ_3[t]/μ + W_[t] + Λ_2[t]/μ)‖_F^2,
J_[t+1] ← D_{μ^{-1}}[Z_[t] − J_[t] − Λ_3[t]/μ + W_[t] + Λ_2[t]/μ]. (10)
The solution is obtained via the singular value thresholding operator, defined for any matrix Q as D_τ[Q] = U S_τ[Σ] V^T, with Q = U Σ V^T being the singular value decomposition and S_τ[q] = sgn(q) max(|q| − τ, 0) being the shrinkage operator.
  • 76. Robust JSLRR
Optimization with respect to Z_[t]:
Z_[t+1] = argmin_Z L(J_[t+1], Z, W_[t], E_[t], Λ_1[t], Λ_2[t], Λ_3[t]),
Z_[t+1] = (I + A^T A)^{-1} [A^T (Y − E_[t]) + J_[t+1] + (A^T Λ_1[t] − Λ_2[t])/μ], (11)
i.e., an unconstrained least-squares problem.
  • 77. Robust JSLRR
Optimization with respect to W_[t]:
W_[t+1] = argmin_W L(J_[t+1], Z_[t+1], W, E_[t], Λ_1[t], Λ_2[t], Λ_3[t])
= argmin_W (θ_1/μ) ‖W‖_1 + (1/2) ‖W − (J_[t+1] + Λ_3[t]/μ)‖_F^2,
W_[t+1] ← S_{θ_1 μ^{-1}}[J_[t+1] + Λ_3[t]/μ]. (12)
  • 78. Robust JSLRR
Optimization with respect to E_[t]:
E_[t+1] = argmin_E L(J_[t+1], Z_[t+1], W_[t+1], E, Λ_1[t], Λ_2[t], Λ_3[t])
= argmin_E (θ_2/μ) ‖E‖_{2,1} + (1/2) ‖E − (Y − A Z_[t+1] + Λ_1[t]/μ)‖_F^2. (13)
Let M_[t] = Y − A Z_[t+1] + Λ_1[t]/μ. Update E_[t+1] column-wise as
e_j[t+1] ← S_{θ_2 μ^{-1}}[‖m_j[t]‖_2] · m_j[t] / ‖m_j[t]‖_2. (14)
  • 79. Robust JSLRR
Updating of the Lagrange multiplier matrices:
Λ_1[t+1] = Λ_1[t] + μ (Y − A Z_[t+1] − E_[t+1]),
Λ_2[t+1] = Λ_2[t] + μ (Z_[t+1] − J_[t+1]),
Λ_3[t+1] = Λ_3[t] + μ (J_[t+1] − W_[t+1]). (15)
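Putting updates (10)-(15) together gives a compact iterative solver. The sketch below transcribes the slide updates literally into NumPy; the parameter values, fixed iteration count, and zero initialization are our placeholder choices, which the slides do not specify.

```python
import numpy as np

def svt(Q, tau):
    """Singular value thresholding D_tau[Q] = U S_tau[Sigma] V^T."""
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def shrink(Q, tau):
    """Elementwise shrinkage S_tau[q] = sgn(q) max(|q| - tau, 0)."""
    return np.sign(Q) * np.maximum(np.abs(Q) - tau, 0.0)

def robust_jslrr(A, Y, theta1=0.1, theta2=0.1, mu=1.0, n_iter=200):
    """LADALM sketch for problem (7)/(8); theta1, theta2, mu, and the
    iteration count are placeholders, not values from the talk."""
    d, N = A.shape
    M = Y.shape[1]
    J, Z, W = (np.zeros((N, M)) for _ in range(3))
    E = np.zeros((d, M))
    L1, L2, L3 = np.zeros((d, M)), np.zeros((N, M)), np.zeros((N, M))
    inv = np.linalg.inv(np.eye(N) + A.T @ A)         # reused by the Z update (11)
    for _ in range(n_iter):
        J = svt(Z - J - L3 / mu + W + L2 / mu, 1.0 / mu)             # (10)
        Z = inv @ (A.T @ (Y - E) + J + (A.T @ L1 - L2) / mu)         # (11)
        W = shrink(J + L3 / mu, theta1 / mu)                         # (12)
        Mt = Y - A @ Z + L1 / mu                                     # (13)
        col = np.linalg.norm(Mt, axis=0)                             # column l2 norms
        E = Mt * (np.maximum(col - theta2 / mu, 0.0) / np.maximum(col, 1e-12))  # (14)
        L1 = L1 + mu * (Y - A @ Z - E)                               # (15)
        L2 = L2 + mu * (Z - J)
        L3 = L3 + mu * (J - W)
    return Z, E
```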
  • 80. Special cases
Robust joint SR (JSR): The solution of the following convex optimization problem is sought:
Robust JSR: argmin_{Z,E} ‖Z‖_1 + θ_2 ‖E‖_{2,1} subject to Y = A Z + E. (16)
Problem (16) takes into account the correlations between the test samples, while seeking to jointly represent the test samples of a specific class by a few columns of the training matrix. It is solved by minimizing the augmented Lagrangian
L_1(Z, J, E, Λ_1, Λ_2) = ‖J‖_1 + θ_2 ‖E‖_{2,1} + tr[Λ_1^T (Y − AZ − E)] + tr[Λ_2^T (Z − J)] + (μ/2) (‖Y − AZ − E‖_F^2 + ‖Z − J‖_F^2), (17)
where Λ_1, Λ_2 are Lagrange multipliers and μ > 0 is a penalty parameter.
  • 83. Special cases
Robust LRR: The solution of the following convex optimization problem is sought:
Robust LRR: argmin_{Z,E} ‖Z‖_* + θ_2 ‖E‖_{2,1} subject to Y = A Z + E, (18)
by minimizing an augmented Lagrangian function similar to (17), where the first term ‖J‖_1 is replaced by ‖J‖_*.
  • 85. Example 2
Noisy case: 4 pairwise independent linear subspaces are constructed as in Example 1, and the matrices A ∈ R^{600×360} and Y ∈ R^{600×40} are obtained. 50 randomly picked columns of A are replaced by linear combinations, with random weights, of randomly chosen vectors from two subspaces; thus, the training set is contaminated by outliers. The 5th column of the test matrix Y is replaced by a linear combination of vectors not drawn from any of the 4 subspaces, and the 15th column of Y is replaced by a vector drawn from the 1st and the 4th subspaces, mixed as just described.
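Continuing the Example 1 sketch, the outlier injection into the training matrix can be scripted as follows; the mixing weights and index bookkeeping are our choices, only the construction itself is from the slide.

```python
import numpy as np

def inject_outliers(A, bases, n_bad=50, seed=1):
    """Replace n_bad random columns of A by random-weight combinations of
    vectors drawn from two randomly chosen subspaces (given by `bases`)."""
    rng = np.random.default_rng(seed)
    A = A.copy()
    r = bases[0].shape[1]
    for j in rng.choice(A.shape[1], size=n_bad, replace=False):
        i1, i2 = rng.choice(len(bases), size=2, replace=False)
        A[:, j] = (rng.standard_normal() * (bases[i1] @ rng.standard_normal(r))
                   + rng.standard_normal() * (bases[i2] @ rng.standard_normal(r)))
    return A
```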
  • 88. Example 2
[Figure: representation matrices, zoomed in on the 5th and 15th test samples.]
  • 89. Outline: 1 Introduction · 2 Auditory spectro-temporal modulations · 3 Suitable data representations for classification · 4 Joint sparse low-rank representations in the ideal case · 5 Joint sparse low-rank representations in the presence of noise · 6 Joint sparse low-rank subspace-based classification · 7 Music signal analysis · 8 Conclusions
  • 90. Joint sparse low-rank subspace-based classification Algorithm Input: Training matrix A ∈ Rd×N and test matrix Y ∈ Rd×M . Output: A class label for each column of Y. 1 Solve (8) to obtain Z ∈ RN×M and E ∈ Rd×M . 2 for m = 1 to M 3 ¯ ym = ym − em . 4 for k = 1 to K 5 ¯ ¯ Compute the residuals rk (ym ) = ym − A δk (zm ) 2 . 6 end for 7 ¯ ¯ class(ym ) = argmink rk (ym ). 8 end for Sparse and Low Rank Representations in Music Signal Analysis 38/54
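A minimal NumPy sketch of the classification loop above. The pair (Z, E) is assumed to have already been obtained by solving (8), and labels, which maps each training column of A to its class, is an illustrative helper not defined in the slides:

import numpy as np

def jslrsc_classify(A, Y, Z, E, labels, K):
    """Assign each test column of Y to the class with the smallest
    reconstruction residual, as in steps 2-8 of the algorithm."""
    M = Y.shape[1]
    predicted = np.empty(M, dtype=int)
    for m in range(M):
        y_bar = Y[:, m] - E[:, m]                        # step 3: remove the error term
        residuals = np.empty(K)
        for k in range(K):
            z_k = np.where(labels == k, Z[:, m], 0.0)    # delta_k(z_m): keep class-k coefficients
            residuals[k] = np.linalg.norm(y_bar - A @ z_k)  # step 5: class residual
        predicted[m] = residuals.argmin()                # step 7: smallest residual wins
    return predicted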
• 98. Joint sparse low-rank subspace-based classification Linearity concentration index The LCI of a coefficient vector zm ∈ RN associated to the mth test sample is defined as LCI(zm) = (K · maxk ‖δk(zm)‖2 / ‖zm‖2 − 1) / (K − 1) ∈ [0, 1]. (19) If LCI(zm) = 1, the test sample is drawn from a single subspace; if LCI(zm) = 0, it is drawn evenly from all subspaces. By choosing a threshold c ∈ (0, 1), the mth test sample is declared valid if LCI(zm) > c. Otherwise, the test sample can either be rejected as totally invalid (for very small values of LCI(zm)) or be classified into multiple classes by assigning to it the labels associated with the largest values ‖δk(zm)‖2. Sparse and Low Rank Representations in Music Signal Analysis 39/54
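A corresponding sketch of the LCI in (19), under the same assumed labels convention; the eps guard against a zero-norm coefficient vector is an implementation detail, not part of the definition:

import numpy as np

def lci(z, labels, K, eps=1e-12):
    """Linearity concentration index of a coefficient vector z, Eq. (19)."""
    class_norms = np.array([np.linalg.norm(z[labels == k]) for k in range(K)])
    return (K * class_norms.max() / (np.linalg.norm(z) + eps) - 1.0) / (K - 1)

# Validity test for the m-th test sample with threshold c in (0, 1):
# valid = lci(Z[:, m], labels, K) > c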
  • 100. 1 Introduction 2 Auditory spectro-temporal modulations 3 Suitable data representations for classification 4 Joint sparse low-rank representations in the ideal case 5 Joint sparse low-rank representations in the presence of noise 6 Joint sparse low-rank subspace-based classification 7 Music signal analysis 8 Conclusions Sparse and Low Rank Representations in Music Signal Analysis 40/54
• 101. Music genre classification: Datasets and evaluation procedure GTZAN dataset 1000 audio recordings, each 30 seconds long; 10 genre classes: Blues, Classical, Country, Disco, HipHop, Jazz, Metal, Pop, Reggae, and Rock; each genre class contains 100 audio recordings. The recordings are converted to monaural wave format at a 16 kHz sampling rate with 16 bits and normalized to zero mean amplitude and unit variance. [G. Tzanetakis and P. Cook, “Musical genre classification of audio signals,” IEEE Trans. Speech and Audio Processing, vol. 10, no. 5, pp. 293-302, July 2002.] Sparse and Low Rank Representations in Music Signal Analysis 41/54
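A minimal sketch of this preprocessing step, assuming librosa for decoding and resampling (the slides do not name the toolchain):

import librosa

def preprocess(path, sr=16000):
    """Load a recording as mono at 16 kHz, then normalize it to
    zero mean amplitude and unit variance."""
    x, _ = librosa.load(path, sr=sr, mono=True)  # decode, downmix, resample
    x = x - x.mean()
    return x / (x.std() + 1e-12)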
• 102. Music genre classification: Datasets and evaluation procedure ISMIR 2004 Genre dataset 1458 full audio recordings; 6 genre classes: Classical (640), Electronic (229), JazzBlues (52), MetalPunk (90), RockPop (203), World (244). Sparse and Low Rank Representations in Music Signal Analysis 41/54
  • 103. Music genre classification: Datasets and evaluation procedure Protocols GTZAN dataset: stratified 10-fold cross-validation: Each training set consists of 900 audio recordings yielding a training matrix AGTZAN . ISMIR 2004 Genre dataset: The ISMIR2004 Audio Description Contest protocol defines training and evaluation sets, which consist of 729 audio files each. Sparse and Low Rank Representations in Music Signal Analysis 41/54
• 104. Music genre classification: Datasets and evaluation procedure Classifiers JSLRSC, JSSC, and LRSC; SRC with the coefficients estimated by the LASSO; the linear regression classifier (LRC); the SVM with a linear kernel; and the NN classifier with the cosine similarity. [J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009. R. Tibshirani, “Regression shrinkage and selection via the LASSO,” J. Royal Statist. Soc. B, vol. 58, no. 1, pp. 267-288, 1996. I. Naseem, R. Togneri, and M. Bennamoun, “Linear regression for face recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 2106-2112, 2010.] Sparse and Low Rank Representations in Music Signal Analysis 41/54
  • 105. Music genre classification Parameters θ1 > 0 and θ2 > 0 Sparse and Low Rank Representations in Music Signal Analysis 42/54
• 106. Music genre classification Classification accuracy (values in parentheses are standard deviations over the 10 GTZAN folds)

Dataset:               GTZAN                            ISMIR 2004 Genre
Classifier  Parameters                    Accuracy (%)  Parameters                    Accuracy (%)
JSLRSC      tuning                        90.40 (3.06)  tuning                        88.75
JSSC        tuning                        88.80 (3.22)  tuning                        87.51
LRSC        tuning                        88.70 (2.79)  tuning                        85.18
JSLRSC      θ1 = 0.2, θ2 = 0.5, ρ = 1.1   88.30 (2.21)  θ1 = 0.5, θ2 = 0.2, ρ = 1.4   86.18
JSSC        θ2 = 0.5, ρ = 1.2             88.00 (3.26)  θ2 = 0.5, ρ = 1.2             87.51
LRSC        θ2 = 0.2, ρ = 1.4             87.70 (2.54)  θ2 = 0.2, ρ = 1.4             84.63
SRC         -                             86.50 (2.46)  -                             83.95
LRC         -                             87.30 (3.05)  -                             62.41
SVM         -                             86.20 (2.52)  -                             83.25
NN          -                             81.30 (2.79)  -                             79.42

Sparse and Low Rank Representations in Music Signal Analysis 43/54
• 107. Music genre classification Comparison with the state-of-the-art

Dataset:       GTZAN                          ISMIR 2004 Genre
Rank  Reference          Accuracy (%)  Reference          Accuracy (%)
1)    Chang et al.       92.70         Lee et al.         86.83
2)    Lee et al.         90.60         Holzapfel et al.   83.50
3)    Panagakis et al.   84.30         Panagakis et al.   83.15
4)    Bergstra et al.    82.50         Pampalk et al.     82.30
5)    Tsunoo et al.      77.20

References: K. Chang, J. S. R. Jang, and C. S. Iliopoulos, “Music genre classification via compressive sampling,” in Proc. 11th Int. Symp. Music Information Retrieval, pp. 387-392, 2010. C. H. Lee, J. L. Shih, K. M. Yu, and H. S. Lin, “Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features,” IEEE Trans. Multimedia, vol. 11, no. 4, pp. 670-682, 2009. A. Holzapfel and Y. Stylianou, “Musical genre classification using nonnegative matrix factorization-based features,” IEEE Trans. Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 424-434, February 2008. Y. Panagakis, C. Kotropoulos, and G. R. Arce, “Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification,” IEEE Trans. Audio, Speech, and Language Processing, vol. 18, no. 3, pp. 576-588, 2010. J. Bergstra, N. Casagrande, D. Erhan, D. Eck, and B. Kegl, “Aggregate features and AdaBoost for music classification,” Machine Learning, vol. 65, no. 2-3, pp. 473-484, 2006. E. Tsunoo, G. Tzanetakis, N. Ono, and S. Sagayama, “Beyond timbral statistics: Improving music classification using percussive patterns and bass lines,” IEEE Trans. Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 1003-1014, 2011.

Sparse and Low Rank Representations in Music Signal Analysis 44/54
  • 108. Music genre classification Confusion matrices Sparse and Low Rank Representations in Music Signal Analysis 45/54
• 109. Music genre classification Dimensionality reduction via random projections Let the true low dimensionality of the data be denoted by r. A random projection matrix, drawn from a zero-mean normal distribution, provides with high probability a stable embedding [R. G. Baraniuk, V. Cevher, and M. B. Wakin, “Low-dimensional models for dimensionality reduction and signal recovery: A geometric perspective,” Proceedings of the IEEE, vol. 98, no. 6, pp. 959-971, 2010], with the projection dimensionality d selected as the minimum value such that d > 2r log(7680/d). r is estimated by robust principal component analysis on a training set for each dataset, yielding d = 1581 for the GTZAN dataset and d = 1398 for the ISMIR 2004 Genre dataset. Sparse and Low Rank Representations in Music Signal Analysis 46/54
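A sketch of this projection setup: the smallest d satisfying the bound is found by direct search, and the 1/√d scaling of the Gaussian entries (an implementation choice not stated in the slides) keeps squared norms approximately preserved in expectation:

import numpy as np

def min_embedding_dim(r, D=7680):
    """Smallest d with d > 2 r log(D / d); the left side grows and the
    right side shrinks in d, so the first hit is the minimum."""
    for d in range(1, D):
        if d > 2 * r * np.log(D / d):
            return d
    return D

def random_projection(D, d, seed=0):
    """Zero-mean Gaussian projection matrix mapping R^D to R^d."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, D))

# Example: project the 7680-dimensional feature vectors to d dimensions
# P = random_projection(7680, min_embedding_dim(r)); A_proj = P @ A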
• 112. Music genre classification Accuracy after dimensionality reduction

Dataset:               GTZAN                            ISMIR 2004 Genre
Classifier  Parameters                    Accuracy (%)  Parameters                    Accuracy (%)
JSLRSC      θ1 = 1.8, θ2 = 0.7, ρ = 1.4   87.50 (2.41)  θ1 = 1.5, θ2 = 0.2, ρ = 1.4   85.87
JSSC        θ2 = 1.5, ρ = 1.2             86.90 (3.28)  θ2 = 0.6, ρ = 1.1             87.30
LRSC        θ2 = 0.7, ρ = 1.4             86.60 (2.75)  θ2 = 0.8, ρ = 2.4             84.08
SRC         -                             86.90 (3.21)  -                             83.67
LRC         -                             85.30 (3.16)  -                             54.18
SVM         -                             86.00 (2.53)  -                             83.26
NN          -                             80.80 (3.01)  -                             78.87

Sparse and Low Rank Representations in Music Signal Analysis 47/54
• 113. Music genre classification Accuracy after rejecting 1 out of 5 test samples The JSLRSC achieves a classification accuracy of 95.51% on the GTZAN dataset. For the ISMIR 2004 Genre dataset, the accuracy of the JSSC is 92.63%, while that of the JSLRSC is 91.55%. (Figure: classification accuracy (%) versus the rejection threshold c for the JSLRSC, JSSC, LRSC, LRC, SRC, SVM, and NN classifiers; left panel: GTZAN, c ∈ [0.29, 0.35]; right panel: ISMIR 2004 Genre, c ∈ [0.41, 0.48].) Sparse and Low Rank Representations in Music Signal Analysis 48/54
• 116. Music structure analysis Optimization problem Given a music recording of K music segments, represented by a sequence of beat-synchronous feature vectors X = [x1|x2|...|xN] ∈ Rd×N, learn Z ∈ RN×N by minimizing

argmin_{Z,E} λ1 ‖Z‖1 + (λ2/2) ‖Z‖F² + λ3 ‖E‖1 subject to X = X Z + E, zii = 0.

Let Z = Ũ Σ̃ Ṽᵀ. Define U = Ũ Σ̃^{1/2}. Set M = U Uᵀ. Build a nonnegative symmetric affinity matrix W ∈ R+^{N×N} with elements wij = mij² and apply normalized cuts [J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, 2000]. Sparse and Low Rank Representations in Music Signal Analysis 49/54
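A sketch of the post-processing above; scikit-learn's spectral clustering with a precomputed affinity is used here as a stand-in for the normalized cuts of Shi and Malik:

import numpy as np
from scipy.linalg import svd
from sklearn.cluster import SpectralClustering

def segment_clusters(Z, K):
    """Turn the learned self-representation Z into K segment clusters."""
    U_tilde, s, _ = svd(Z, full_matrices=False)   # skinny SVD of Z
    U = U_tilde * np.sqrt(s)                      # U = U~ Sigma~^{1/2} (scale columns)
    M = U @ U.T
    W = M ** 2                                    # w_ij = m_ij^2: symmetric, nonnegative
    return SpectralClustering(n_clusters=K, affinity="precomputed",
                              random_state=0).fit_predict(W)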
• 117. Music tagging Optimization problem Assume that the tag-recording matrix Y and the matrix of the ATM representations X are jointly low-rank. Learn a low-rank weight matrix W such that

argmin_{W,E} ‖W‖∗ + λ ‖E‖1 subject to Y = W X + E. (20)

Sparse and Low Rank Representations in Music Signal Analysis 50/54
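A minimal sketch of (20) with a generic convex solver (cvxpy); the slides do not specify the solver, and a scalable implementation would more likely use augmented Lagrangian iterations:

import cvxpy as cp

def learn_tag_weights(Y, X, lam=0.1):
    """Solve (20): low-rank (nuclear norm) regression of the tag matrix Y
    on the ATM features X, with a sparse term E absorbing gross errors."""
    W = cp.Variable((Y.shape[0], X.shape[0]))
    E = cp.Variable(Y.shape)
    prob = cp.Problem(cp.Minimize(cp.normNuc(W) + lam * cp.norm1(E)),
                      [Y == W @ X + E])
    prob.solve()
    return W.value, E.value

# Tags for a new feature vector x could then be scored as W.value @ x.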
  • 118. Example 3 Sparse and Low Rank Representations in Music Signal Analysis 51/54
  • 119. 1 Introduction 2 Auditory spectro-temporal modulations 3 Suitable data representations for classification 4 Joint sparse low-rank representations in the ideal case 5 Joint sparse low-rank representations in the presence of noise 6 Joint sparse low-rank subspace-based classification 7 Music signal analysis 8 Conclusions Sparse and Low Rank Representations in Music Signal Analysis 52/54
• 120. Conclusions Summary and future work A robust framework for solving classification and clustering problems in music signal analysis has been developed. In all three problems addressed, the proposed techniques either achieve top performance or match the state-of-the-art. Efficient implementations exploiting incremental update rules are desperately needed. Performance improvement for small sample sets deserves further elaboration. Sparse and Low Rank Representations in Music Signal Analysis 53/54