Multiple Non-Redundant Spectral Clustering Views

Donglin Niu, Jennifer G. Dy
Department of Electrical and Computer Engineering, Northeastern University, Boston, MA

Michael I. Jordan
EECS and Statistics Departments, University of California, Berkeley
   Given medical data:

     From a doctor's view:
            cluster according to type of disease
     From an insurance company's view:
            cluster based on the patient's cost/risk
Two kinds of approaches: Iterative & Simultaneous

Iterative
  Given an existing clustering, find another clustering
    • Conditional Information Bottleneck. Gondek and Hofmann (2004)
    • COALA. Bae and Bailey (2006)
    • Minimizing KL-divergence. Qi and Davidson (2009)
  Multiple alternative clusterings
    • Orthogonal Projection. Cui et al. (2007)

Simultaneous
  Discovery of all the possible partitionings
    • Meta Clustering. Caruana et al. (2006)
    • De-correlated k-means. Jain et al. (2008)
   Ensemble Clustering

   Hierarchical Clustering
[Figure: View 1 and View 2]

There are O(K^N) possible clustering solutions (N data points, K clusters).
We'd like to find solutions that:
   1. have high cluster quality, and
   2. are non-redundant,
and we'd like to simultaneously
   3. learn the subspace in each view.
   Normalized Cut
    (On Spectral Clustering, Ng et al.)
    Maximize within-cluster similarity and minimize
    between-cluster similarity.

    Let U be the cluster assignment, K the kernel (similarity) matrix, and D the degree matrix:

    $$\max_{U} \; \operatorname{tr}\!\left(U^{T} D^{-1/2} K D^{-1/2} U\right) \quad \text{s.t.} \quad U^{T} U = I$$

Advantage: Can discover arbitrarily shaped clusters.
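
The relaxed normalized-cut problem above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' code: the Gaussian kernel, its bandwidth `sigma`, the toy two-blob data, and the helper names `rbf_kernel` and `spectral_embedding` are assumptions introduced here.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian kernel K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def spectral_embedding(K, c):
    """Maximize tr(U^T D^{-1/2} K D^{-1/2} U) s.t. U^T U = I.

    The maximizer is given by the c eigenvectors of D^{-1/2} K D^{-1/2}
    with the largest eigenvalues.
    """
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))            # diagonal of D^{-1/2}
    L = K * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]    # D^{-1/2} K D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(L)                 # eigenvalues in ascending order
    return eigvecs[:, -c:], eigvals[-c:]

# Toy data: two well-separated blobs (hypothetical example data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
K = rbf_kernel(X, sigma=1.0)
U, top_eigvals = spectral_embedding(K, c=2)
```

The columns of `U` are orthonormal by construction, so the constraint holds exactly; the objective value is the sum of the returned eigenvalues.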
   There are several possible criteria:
       Correlation, Mutual information.

    Correlation: can capture only linear dependencies.

    Mutual information: can capture non-linear
    dependencies, but requires estimating the joint probability
    distribution.
   In this approach, we choose the
    Hilbert-Schmidt Independence Criterion (HSIC):

    $$\mathrm{HSIC}(x, y) = \lVert C_{xy} \rVert_{HS}^{2}$$

    Advantage: Can detect non-linear dependence and does not need
    to estimate joint probability distributions.
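   HSIC is the (squared) norm of a cross-covariance matrix
     in kernel space.

     $$\mathrm{HSIC}(x, y) = \lVert C_{xy} \rVert_{HS}^{2}$$

     $$C_{xy} = E_{xy}\!\left[(\phi(x) - \mu_x) \otimes (\psi(y) - \mu_y)\right]$$

    Empirical estimate of HSIC:

     $$\mathrm{HSIC}(X, Y) := \frac{1}{n^{2}} \operatorname{tr}(K H L H)$$

     $$\text{s.t.} \quad H, K, L \in \mathbb{R}^{n \times n}, \quad K_{ij} := k(x_i, x_j), \quad L_{ij} := l(y_i, y_j), \quad H = I - \frac{1}{n} 1_n 1_n^{T}$$

     where n is the number of observations and k, l are kernel functions.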
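
As a rough illustration of the empirical estimate above, the sketch below computes tr(KHLH)/n^2 for a nonlinearly dependent pair and for an independent pair. The Gaussian kernel, the bandwidth `sigma`, the sample size, and the toy variables are assumptions made for this example only.

```python
import numpy as np

def rbf_kernel(x, sigma=0.5):
    """Gaussian kernel matrix for a 1-D sample (assumed kernel choice)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(K, L):
    """Empirical HSIC: tr(K H L H) / n^2 with H = I - (1/n) 1 1^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=300)
y_dep = x ** 2                             # nonlinear dependence, near-zero linear correlation
y_ind = rng.uniform(-1.0, 1.0, size=300)   # drawn independently of x

hsic_dep = hsic(rbf_kernel(x), rbf_kernel(y_dep))
hsic_ind = hsic(rbf_kernel(x), rbf_kernel(y_ind))
print(f"dependent: {hsic_dep:.5f}  independent: {hsic_ind:.5f}")
```

The dependent pair should score noticeably higher even though the relationship is not linear, which is the property the slide highlights.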
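Objective: cluster quality (Normalized Cut) minus redundancy (HSIC)

$$\max_{U_v \in \mathbb{R}^{n \times c},\; W_v} \;\; \sum_{v} \operatorname{tr}\!\left(U_v^{T} D_v^{-1/2} K_v D_v^{-1/2} U_v\right) \;-\; \lambda \sum_{v \neq q} \operatorname{tr}\!\left(K_v H K_q H\right)$$

$$\text{s.t.} \quad U_v^{T} U_v = I, \quad W_v^{T} W_v = I, \quad K_{v,ij} = K\!\left(W_v^{T} x_i, W_v^{T} x_j\right)$$

       where U_v is the embedding,
             K_v is the kernel matrix,
             D_v is the degree matrix for each view v,
             H is the matrix that centers the kernel matrix, and
             λ weights the redundancy penalty.
             All of these are defined in the subspace W_v.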
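
To make the two terms concrete, here is a minimal sketch that evaluates the objective for fixed linear projections. It is not the authors' implementation: the Gaussian kernel, the default values of `lam` and `sigma`, the toy data, and all function names are assumptions introduced for illustration.

```python
import numpy as np

def rbf_kernel(Z, sigma=1.0):
    """Gaussian kernel on the rows of Z (assumed kernel choice)."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def normalized_kernel(K):
    """D^{-1/2} K D^{-1/2}, with D the diagonal degree matrix of K."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    return K * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def top_eigenvectors(M, c):
    """Eigenvectors of a symmetric matrix with the c largest eigenvalues."""
    _, vecs = np.linalg.eigh(M)
    return vecs[:, -c:]

def msc_objective(X, Ws, Us, lam=1.0, sigma=1.0):
    """Return (objective, quality, redundancy) for projections Ws and embeddings Us."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Ks = [rbf_kernel(X @ W, sigma) for W in Ws]   # row i of X @ W is W^T x_i
    quality = sum(np.trace(U.T @ normalized_kernel(K) @ U) for K, U in zip(Ks, Us))
    redundancy = sum(np.trace(Ks[v] @ H @ Ks[q] @ H)
                     for v in range(len(Ks)) for q in range(len(Ks)) if v != q)
    return quality - lam * redundancy, quality, redundancy

# Toy data with two independent features; each view projects onto one axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
e1 = np.array([[1.0], [0.0]])
e2 = np.array([[0.0], [1.0]])

def embeddings(Ws, c=2):
    return [top_eigenvectors(normalized_kernel(rbf_kernel(X @ W)), c) for W in Ws]

_, quality_diff, red_diff = msc_objective(X, [e1, e2], embeddings([e1, e2]))
_, quality_same, red_same = msc_objective(X, [e1, e1], embeddings([e1, e1]))
```

Two views that use the same subspace incur a much larger HSIC penalty than two views on independent features. Note that the quality term is bounded by the number of eigenvectors per view while the unnormalized trace penalty grows with n, so in practice `lam` would need tuning.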
We use a coordinate ascent approach.
Step 1: Fix W_v, optimize for U_v
   The solution for U_v is given by the eigenvectors with the
    largest eigenvalues of the normalized kernel
    similarity matrix.

Step 2: Fix U_v, optimize for W_v
   We use gradient ascent on a Stiefel manifold.

Repeat Steps 1 & 2 until convergence.
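
The slides do not give the update rule used in Step 2, so the following is only a generic sketch of gradient ascent under the constraint W^T W = I. It projects the Euclidean gradient onto the tangent space of the Stiefel manifold and retracts with a QR decomposition. The step size, iteration count, QR retraction, and the toy objective tr(W^T A W) (standing in for the mSC objective) are all assumptions.

```python
import numpy as np

def stiefel_ascent(grad_f, W0, step=0.05, n_iter=500):
    """Gradient ascent that keeps W^T W = I (QR retraction; an assumed scheme)."""
    W = W0
    for _ in range(n_iter):
        G = grad_f(W)                          # Euclidean gradient at W
        sym = 0.5 * (W.T @ G + G.T @ W)
        R = G - W @ sym                        # projection onto the tangent space
        W, _ = np.linalg.qr(W + step * R)      # retract back onto the manifold
    return W

# Toy objective f(W) = tr(W^T A W) with known maximum 5 + 4 = 9 for two columns.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
A = Q @ np.diag([5.0, 4.0, 1.0, 0.5, 0.1]) @ Q.T
W0, _ = np.linalg.qr(rng.normal(size=(5, 2)))

W = stiefel_ascent(lambda M: 2.0 * A @ M, W0)  # gradient of tr(W^T A W) is 2 A W
value = np.trace(W.T @ A @ W)
```

For the toy problem the iterate converges to the span of the two leading eigenvectors of `A`, so `value` approaches the sum of the two largest eigenvalues while the orthogonality constraint is preserved at every step.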
K-means step:
   Normalize the rows of U_v. Apply k-means on U_v.
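
A minimal sketch of this final step, assuming that "normalize" means scaling each row of U_v to unit length (as in Ng et al.'s spectral clustering). The small Lloyd-style `kmeans` with farthest-first initialization and the example embedding are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def row_normalize(U):
    """Scale each row of U to unit Euclidean length."""
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)

def kmeans(Z, k, n_iter=100):
    """Plain Lloyd iterations with deterministic farthest-first initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical embedding: rows differ in length but point in two directions.
U = np.array([[2.0, 0.1], [0.5, 0.0], [1.0, 0.05],
              [0.0, 3.0], [0.1, 1.0], [0.02, 0.4]])
labels = kmeans(row_normalize(U), k=2)
```

Row normalization removes the differences in row length, so k-means separates the points by direction in the embedding space.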
   Cluster the features using spectral clustering.
   Data x = [f1 f2 f3 f4 f5 …fd]
   Feature similarity based on HSIC(fi,fj).

   [Figure: features grouped into clusters, and the corresponding transformation matrix W_v, a 0/1 matrix with a single 1 in each column]
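
The feature-grouping idea on this slide can be sketched as below: compute pairwise HSIC between features, then turn a grouping of features into a 0/1 transformation matrix. The Gaussian kernel, the toy data, and the helper names are assumptions. The spectral clustering of the similarity matrix `S` is not shown, so the feature labels are hard-coded as a hypothetical outcome.

```python
import numpy as np

def rbf_kernel_1d(f, sigma=1.0):
    """Gaussian kernel matrix for a single feature column (assumed kernel choice)."""
    d2 = (f[:, None] - f[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(K, L):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2

def feature_similarity(X, sigma=1.0):
    """S_ij = HSIC(f_i, f_j) for every pair of feature columns of X."""
    d = X.shape[1]
    Ks = [rbf_kernel_1d(X[:, j], sigma) for j in range(d)]
    S = np.zeros((d, d))
    for i in range(d):
        for j in range(i, d):
            S[i, j] = S[j, i] = hsic(Ks[i], Ks[j])
    return S

def selection_matrix(feature_labels, view):
    """0/1 matrix W (d x q) whose columns pick out the features assigned to `view`."""
    feature_labels = np.asarray(feature_labels)
    idx = np.flatnonzero(feature_labels == view)
    W = np.zeros((feature_labels.size, idx.size))
    W[idx, np.arange(idx.size)] = 1.0
    return W

# Toy data: features 0 and 1 depend on each other, as do features 2 and 3.
rng = np.random.default_rng(0)
n = 150
a, b = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([a, a + 0.1 * rng.normal(size=n),
                     b, b + 0.1 * rng.normal(size=n)])

S = feature_similarity(X)
feature_labels = [0, 0, 1, 1]   # hypothetical result of clustering the features on S
W0 = selection_matrix(feature_labels, view=0)
W1 = selection_matrix(feature_labels, view=1)
```

Each such matrix has orthonormal columns, so it satisfies the constraint W_v^T W_v = I and can serve as a starting point for the optimization over W_v.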
Synthetic Data 1 and Synthetic Data 2
[Figure: View 1 and View 2 of each synthetic data set]

Methods compared:
   mSC:    our algorithm
   OPC:    orthogonal projection (Cui et al., 2007)
   DK:     de-correlated k-means (Jain et al., 2008)
   SC:     spectral clustering
   Kmeans: k-means

Normalized Mutual Information (NMI) Results

                   DATA 1               DATA 2
             VIEW 1    VIEW 2     VIEW 1    VIEW 2
   mSC        0.94      0.95       0.90      0.93
   OPC        0.89      0.85       0.02      0.07
   DK         0.87      0.94       0.03      0.05
   SC         0.37      0.42       0.31      0.25
   Kmeans     0.36      0.34       0.03      0.05
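
For reference, NMI between a predicted clustering and a reference labeling can be computed as in the sketch below. The slides do not state which normalization was used; this version assumes the common geometric-mean form I(A;B) / sqrt(H(A) H(B)), and the function name `nmi` is introduced here.

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Normalized mutual information, I(A;B) / sqrt(H(A) * H(B)) (assumed normalization)."""
    a = np.asarray(labels_a).ravel()
    b = np.asarray(labels_b).ravel()
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    counts = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(counts, (ai, bi), 1)               # contingency table
    P = counts / a.size                          # joint distribution
    pa, pb = P.sum(axis=1), P.sum(axis=0)        # marginals
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pa, pb)[nz]))
    ha = -np.sum(pa * np.log(pa))
    hb = -np.sum(pb * np.log(pb))
    if ha == 0.0 or hb == 0.0:
        return 0.0                               # a single-cluster labeling carries no information
    return mi / np.sqrt(ha * hb)

same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

NMI is invariant to how the cluster labels are named, which is why it suits comparing a discovered clustering against ground-truth classes.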
Face Data: Identity (ID) View and Pose View
[Figure: mean face images for the Identity (ID) view and the Pose view]
    • Mean face
    • Number below each image is cluster purity

NMI Results (FACE)

               ID      POSE
   mSC        0.79     0.42
   OPC        0.67     0.37
   DK         0.70     0.40
   SC         0.67     0.22
   Kmeans     0.64     0.24
WebKB Data: High-Weight Words

     High-weight words in each subspace view
view 1    Cornell, Texas, Wisconsin, Madison, Washington
view 2    homework, student, professor, project, Ph.D.

NMI Results (WebKB)

               Univ.    Type
   mSC         0.81     0.54
   OPC         0.43     0.53
   DK          0.48     0.57
   SC          0.25     0.39
   Kmeans      0.10     0.50
NSF Award Data: High-Frequency Words

            Subjects                        Work Type
Physics     Information   Biology      experimental      theoretical
materials   control       cell         methods          Experiments
chemical    programming   gene         mathematical     Processes
metal       information   protein      develop          Techniques
optical     function      DNA          equation         Measurements
quantum     languages     Biological   theoretical      surface
Machine Sound Data

                   Motor       Fan      Pump
     mSC            0.82       0.75     0.83
     OPC            0.73       0.68     0.47
     DK             0.64       0.58     0.75
     SC             0.42       0.16     0.09
     Kmeans         0.57       0.16     0.09

     Normalized Mutual Information (NMI) Results
   • Most clustering algorithms find only a single clustering solution. However, data may be multi-faceted (i.e., it can be interpreted in many different ways).
   • We introduced a new method for discovering multiple non-redundant clusterings.
   • Our approach, mSC, optimizes both a spectral clustering objective (to measure quality) and an HSIC regularization term (to measure redundancy).
   • mSC can discover multiple clusterings with flexible cluster shapes while simultaneously finding the subspace in which each clustering view resides.
Thank you!

2010 ICML


Editor's Notes

  1. Good afternoon. My name is Donglin Niu and I’m going to talk about “Multiple Non-Redundant Spectral Clustering Views.” This is work I did with my advisor, Jennifer Dy, from Northeastern University and with Mike Jordan from UC Berkeley.
  2. Clustering is often the first step in exploring data. Most clustering algorithms find only one clustering solution. However, data may be multi-faceted by nature (i.e., a single data set can be interpreted in many different ways). For example, let’s say our data is a bunch of web pages, as shown here. One way to cluster this data is to group the faculty web pages into one cluster and the student web pages into another. Another way is to group them according to the university they belong to.
  3. Another example: given medical data, a doctor may be interested in grouping the data based on disease type. An insurance company may be interested in grouping the patients according to their cost/risk.
  4. Because the need for multiple alternative clustering interpretations has been recognized, there is recent interest in this new clustering research paradigm. There are two kinds of approaches to this problem: iterative and simultaneous. In iterative methods, one is given an existing clustering, and the goal is to find an alternative clustering. Gondek and Hofmann find an alternative clustering using a conditional information bottleneck approach, Bae and Bailey apply must-link and cannot-link constraints with agglomerative clustering, and Qi and Davidson minimize a KL-divergence criterion. In many cases, one may be interested in finding not just one but multiple alternative clusterings. Cui et al. introduced an iterative orthogonal projection approach for finding multiple alternative clustering solutions.
  5. Another type of solution is to discover all the possible partitionings simultaneously. Meta Clustering by Caruana et al. generates several alternative solutions by random projection and then applies hierarchical clustering to the clustering solutions. De-correlated Kmeans by Jain et al. minimizes both the k-means sum-squared error of each clustering solution and the correlation between the solutions to find multiple cluster partitionings. Our approach is a simultaneous approach. However, unlike meta clustering, which applies random projection, we find multiple alternative clusterings based on an objective function. Unlike de-correlated k-means, which is based on k-means and is therefore limited to finding spherical clusters, our approach can discover non-convex clusters. Moreover, de-correlated k-means uses all the features in all the views; our approach learns the subspace in each clustering view.
  6. The paradigm of finding multiple alternative clusterings is different from ensemble methods. Like this paradigm, ensemble clustering generates several alternative clusterings, but its ultimate goal is to find a SINGLE consensus clustering solution. Hierarchical clustering also generates several partitionings; however, it generates a hierarchy of coarse-to-fine clusters, such that samples that belong to the same cluster at the lower, finer levels of the hierarchy stay together at the higher, coarser levels. In our case, samples that belong to the same cluster in one view or solution can belong to different clusters in other views.
  7. Let’s say we have data in four dimensions. In features F1 and F2 it has a three-ring cluster structure, as shown in View 1, and in features F3 and F4 it has a two-half-moon cluster structure, as shown in View 2. A standard clustering algorithm faces the dilemma of selecting which of these two structures is more interesting to discover. Instead of finding one of them, our goal is to find all possible interesting cluster structures/views. There are O(K^n) possible ways to cluster n samples into K groups, modulo permutation of the clusters. We do not want to show all of these to the user, as that would overwhelm the data analyst. We’d like to find solutions that have high cluster quality and that are non-redundant. Moreover, we’ve noticed that the different alternative clusterings typically reside in different subspaces (i.e., different similarity metrics are needed to find these clusters). Thus, in our formulation, we also simultaneously learn the subspace in which the clustering resides in each view. I’ll discuss each component in the following slides.
  8. We’d like to capture arbitrarily-shaped clusters. We employ the normalized-cut criterion and spectral clustering to define cluster quality. Normalized cut maximizes the within-cluster similarity and minimizes the between-cluster similarity. Let U be the cluster assignment. In spectral clustering, we relax the cluster assignment U to take on any real value; the normalized-cut clustering objective then becomes maximizing the trace of U transposed times the normalized similarity matrix times U, subject to the constraint that U is orthonormal. The advantage of this criterion is that it can discover arbitrarily-shaped clusters.
  9. We’d like the clustering solutions we discover to be non-redundant with each other. There are several possible criteria for measuring non-redundancy, such as correlation and mutual information. (Read slide)
  10. HSIC is a norm of a cross-covariance matrix in kernel space. Empirically, we can estimate the HSIC between two random variables X and Y as the trace of a product of two kernel matrices, K and L. H here simply centers the kernel matrices.
  11. Our overall objective is then to maximize this function. The first term optimizes for cluster quality, the spectral clustering criterion. The second term minimizes the redundancies among the clustering views. Lambda is the regularization parameter that controls the trade-off between these two criteria. We incorporate discovering the subspace in which the clustering solution in each view resides by learning a transformation matrix W_v. Note that W_v is inside the kernel and operates on the original input x.
  12. We optimize our objective to solve for the cluster embedding Uv and the subspace Wv in each view as follows. (Read slide) We discretize by applying a K-means step: (read slide)
  13. Our approach is only guaranteed to find local optima. Thus, the solution depends on initialization. We initialize the subspaces Wv in each view as follows. We cluster the features (i.e., columns of x) using spectral clustering, with HSIC(f_i, f_j) between features as the measure of similarity. This groups features that are dependent on each other into the same cluster and features that are independent of each other into different clusters. Each feature group forms the transformation matrix Wv for one view as follows. (Click through the animation and explain.) Note that even though each view starts with disjoint features, after running our algorithm to convergence, each feature will have some weight in all views. Note too that the dimensionality of each view is set by the number of features assigned to that view in our initialization.
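
The initialization described in notes 10 and 13 can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian kernel with a fixed width `sigma` on each feature, the empirical estimate HSIC = tr(K H L H) / (n - 1)^2, a normalized spectral embedding in the style of Ng et al., and a small k-means with deterministic farthest-point seeding. All function names and parameters are hypothetical.

```python
import numpy as np


def gaussian_kernel(v, sigma=1.0):
    """Gaussian kernel matrix over the n samples of a single feature v."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(K, L):
    """Empirical HSIC: tr(K H L H) / (n - 1)^2, where H centers the kernels."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def feature_similarity(X, sigma=1.0):
    """Symmetric d x d matrix of HSIC values between the columns of X."""
    d = X.shape[1]
    kernels = [gaussian_kernel(X[:, j], sigma) for j in range(d)]
    S = np.zeros((d, d))
    for i in range(d):
        for j in range(i, d):
            S[i, j] = S[j, i] = hsic(kernels[i], kernels[j])
    return S


def farthest_point_kmeans(P, k, n_iter=20):
    """Plain k-means on the rows of P, seeded deterministically."""
    centers = [P[0]]
    for _ in range(1, k):
        d2 = np.min([((P - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(P[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = P[labels == j].mean(axis=0)
    return labels


def spectral_feature_groups(S, n_views):
    """Group features by spectral clustering on the similarity matrix S.

    Assumes every feature is non-constant, so all row sums of S are positive.
    """
    deg = S.sum(axis=1)
    M = S / np.sqrt(np.outer(deg, deg))          # D^{-1/2} S D^{-1/2}
    _, vecs = np.linalg.eigh(M)                  # eigenvalues in ascending order
    U = vecs[:, -n_views:]                       # top n_views eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return farthest_point_kmeans(U, n_views)


def init_transforms(labels, n_views):
    """One 0/1 selection matrix per view; X @ W keeps that view's features."""
    d = len(labels)
    return [np.eye(d)[:, labels == v] for v in range(n_views)]
```

Given a data matrix `X` with one sample per row, `init_transforms(spectral_feature_groups(feature_similarity(X), n_views), n_views)` returns one starting matrix per view. Each matrix selects a disjoint set of features, which matches the note that the views start from disjoint features and only mix after optimization.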