Summer School
“Achievements and Applications of Contemporary Informatics,
         Mathematics and Physics” (AACIMP 2011)
              August 8-20, 2011, Kiev, Ukraine




                    Cluster Analysis

                                 Erik Kropat

                     University of the Bundeswehr Munich
                      Institute for Theoretical Computer Science,
                        Mathematics and Operations Research
                                Neubiberg, Germany
The Knowledge Discovery Process

Raw Data → Pre-processing → Preprocessed Data → Data Mining → Patterns → Pattern Evaluation → Knowledge (strategic planning)

Pre-processing:  standardizing; handling missing values / outliers
Data Mining:     patterns, clusters, correlations; automated classification;
                 outlier / anomaly detection; association rule learning, …
Clustering

Clustering is a tool for data analysis that solves classification problems.


Problem
Given n observations, split them into K similar groups.




Question
How can we define “similarity”?
Similarity
A cluster is a set of entities which are alike,
and entities from different clusters are not alike.
Distance
A cluster is an aggregation of points such that the distance between any two points in the cluster is less than the distance between any point in the cluster and any point not in it.
Density
Clusters may be described as
connected regions of a multidimensional space
containing a relatively high density of points,
separated from other such regions by a region
containing a relatively low density of points.
Min-Max Problem
Homogeneity:  Objects within the same cluster should be similar to each other.
Separation:   Objects in different clusters should be dissimilar from each other.

(Figure: the distance between objects within a cluster vs. the distance between clusters.)

                           similarity ⇔ distance
Types of Clustering

Clustering
• Hierarchical Clustering: agglomerative or divisive
• Partitional Clustering
Similarity and Distance
Distance Measures
A metric on a set G is a function d: G × G → R+ that satisfies the following
conditions:

(D1)    d(x, y) = 0    ⇔     x = y                            (identity)

(D2)    d(x, y) = d(y, x) ≥ 0        for all x, y ∈ G         (symmetry & non-negativity)

(D3)    d(x, y) ≤ d(x, z) + d(z, y)  for all x, y, z ∈ G      (triangle inequality)

(Figure: points x, y, z illustrating the triangle inequality.)
Examples
Minkowski Distance

    d_r(x, y) = ( Σ_{i=1}^{n} | x_i − y_i |^r )^{1/r} ,   r ∈ [1, ∞) ,  x, y ∈ R^n

    o r = 1: Manhattan distance
    o r = 2: Euclidean distance
Euclidean Distance

    d_2(x, y) = ( Σ_{i=1}^{n} ( x_i − y_i )^2 )^{1/2} ,   x, y ∈ R^n

Example: x = (1, 1), y = (4, 3)

    d_2(x, y) = ( (1 − 4)^2 + (1 − 3)^2 )^{1/2} = √13
Manhattan Distance

    d_1(x, y) = Σ_{i=1}^{n} | x_i − y_i | ,   x, y ∈ R^n

Example: x = (1, 1), y = (4, 3)

    d_1(x, y) = | 1 − 4 | + | 1 − 3 | = 3 + 2 = 5
Maximum Distance

    d_∞(x, y) = max_{1 ≤ i ≤ n} | x_i − y_i | ,   x, y ∈ R^n

Example: x = (1, 1), y = (4, 3)

    d_∞(x, y) = max(3, 2) = 3
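The three worked examples above can be checked with a few lines of NumPy (a minimal sketch of our own, not part of the original slides):

import numpy as np

def minkowski(x, y, r):
    # Minkowski distance d_r(x, y) = (sum_i |x_i - y_i|^r)^(1/r), r in [1, inf).
    return np.sum(np.abs(x - y) ** r) ** (1.0 / r)

x = np.array([1.0, 1.0])
y = np.array([4.0, 3.0])

print(minkowski(x, y, 1))        # Manhattan distance: 3 + 2 = 5.0
print(minkowski(x, y, 2))        # Euclidean distance: sqrt(13) ≈ 3.606
print(np.max(np.abs(x - y)))     # maximum distance (r → ∞): max(3, 2) = 3.0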
Similarity Measures
A similarity function on a set G is a function S: G x G → R that satisfies the
following conditions:

(S1)    S (x, y) ≥ 0                     for all x, y ∈ G   (non-negativity)

(S2)    S (x, y) ≤ S (x, x)              for all x, y ∈ G   (auto-similarity)

(S3)    S (x, y) = S (x, x) ⇔   x=y      for all x, y ∈ G   (identity)



The value of the similarity function is greater when two points are closer.
Similarity Measures

•   There are many different definitions of similarity.

•   Often used


          (S4)   S (x, y) = S (y, x)   for all x, y ∈ G   (symmetry)
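For illustration (an example of our own, not from the slides): every metric d induces a similarity function satisfying (S1)-(S4), for instance S(x, y) = 1 / (1 + d(x, y)):

import numpy as np

def similarity(x, y):
    # Similarity induced by the Euclidean metric: non-negative (S1),
    # bounded by S(x, x) = 1 (S2), maximal exactly when x = y (S3),
    # and symmetric (S4).
    return 1.0 / (1.0 + np.linalg.norm(np.asarray(x) - np.asarray(y)))

print(similarity((1, 1), (4, 3)))   # ≈ 0.217: far apart, low similarity
print(similarity((1, 1), (1, 1)))   # 1.0: the maximum, attained only at x = y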
Hierarchical Clustering
Dendrogram

(Figure: cluster dendrogram, Euclidean distance with complete linkage.
Gross national product of EU countries – agriculture (1993);
www.isa.uni-stuttgart.de/lehre/SAHBD)
Hierarchical Clustering
Hierarchical clustering creates a hierarchy of clusters of the set G. It comes in two forms:

Agglomerative clustering:        Clusters are successively merged together.
Divisive clustering:             Clusters are recursively split.
Agglomerative Clustering
In each step, merge the two clusters with the smallest distance between them:

   Step 0:   {e1}, {e2}, {e3}, {e4}        4 clusters
   Step 1:   {e1, e2}, {e3}, {e4}          3 clusters
   Step 2:   {e1, e2, e3}, {e4}            2 clusters
   Step 3:   {e1, e2, e3, e4}              1 cluster
Divisive Clustering
In each step, choose a cluster and split it optimally into two clusters
according to a given criterion:

   Step 0:   {e1, e2, e3, e4}              1 cluster
   Step 1:   {e1, e2}, {e3, e4}            2 clusters
   Step 2:   {e1, e2}, {e3}, {e4}          3 clusters
   Step 3:   {e1}, {e2}, {e3}, {e4}        4 clusters
Agglomerative Clustering
INPUT

Given               n objects G = { e1, ..., en }
represented by      p-dimensional feature vectors x1, ..., xn ∈ R^p
(one row per object, one column per feature):

                   x1 = ( x11   x12   x13   ...   x1p )
                   x2 = ( x21   x22   x23   ...   x2p )
                   ⁞
                   xn = ( xn1   xn2   xn3   ...   xnp )
Example I

An online shop collects data from its customers.
For each of the n customers there exists a p-dimensional feature vector.
Example II

In a clinical trial, laboratory values of a large number of patients are gathered.
For each of the n patients there exists a p-dimensional feature vector.
Agglomerative Algorithms

• Begin       with the disjoint clustering
              C1 = { {e1}, {e2}, ... , {en} }

• Terminate   when all objects are in one cluster
              Cn = { {e1, e2, ... , en} }

• Iterate     find the most similar pair of clusters
              and merge them into a single cluster.

This yields a sequence of clusterings (Ci)i=1,...,n of G with
              Ci−1 ⊂ Ci   for i = 2, ..., n.
What is the distance between two clusters?

(Figure: two clusters A and B with distance d(A, B).)

⇒ Various hierarchical clustering algorithms
Agglomerative Hierarchical Clustering

There exist many metrics to measure the distance between clusters.

They lead to particular agglomerative clustering methods:

•   Single-Linkage Clustering
•   Complete-Linkage Clustering
•   Average Linkage Clustering
•   Centroid Method
• ...
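As a quick illustration (a sketch assuming SciPy, not part of the original slides), all four methods are available through scipy.cluster.hierarchy.linkage via its method parameter:

import numpy as np
from scipy.cluster.hierarchy import linkage

# Toy data: n objects as p-dimensional feature vectors (one row per object).
X = np.array([[1.0, 1.0], [1.5, 1.0], [4.0, 3.0], [4.5, 3.5]])

# method selects the cluster-to-cluster distance:
# 'single', 'complete', 'average', or 'centroid'.
Z = linkage(X, method='single', metric='euclidean')
print(Z)   # one row per merge: the two merged clusters, merge distance, new size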
Single-Linkage Clustering

Nearest-Neighbor Method
The distance between the clusters A and B is the
minimum distance between the elements of each cluster:

       d(A, B) = min { d(a, b) | a ∈ A, b ∈ B }
Single-Linkage Clustering

• Advantage:   Can detect very long and even curved clusters.
               Can be used to detect outliers.

• Drawback:    The chaining phenomenon.
               Clusters that are very distant from each other
               may be forced together
               due to single elements being close to each other.
Complete-Linkage Clustering

Furthest-Neighbor Method
The distance between the clusters A and B is the
maximum distance between the elements of each cluster:

       d(A, B) = max { d(a, b) | a ∈ A, b ∈ B }
Complete-Linkage Clustering

• … tends to find compact clusters of approximately equal diameters.

• … avoids the chaining phenomenon.

• … cannot be used for outlier detection.
Average-Linkage Clustering

The distance between the clusters A and B is the mean
distance between the elements of each cluster:

       d(A, B) = ( 1 / (|A| ⋅ |B|) ) ⋅ Σ_{a ∈ A, b ∈ B} d(a, b)
Centroid Method

The distance between the clusters A and B is the
(squared) Euclidean distance of the cluster centroids.
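The four definitions side by side, as a minimal NumPy/SciPy sketch (an illustration of our own):

import numpy as np
from scipy.spatial.distance import cdist

def cluster_distances(A, B):
    # A and B are clusters given as arrays of points (one row per point).
    D = cdist(A, B)   # all pairwise Euclidean distances d(a, b)
    return {
        'single':   D.min(),    # minimum over all pairs
        'complete': D.max(),    # maximum over all pairs
        'average':  D.mean(),   # mean over |A| * |B| pairs
        'centroid': np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)),
    }

A = np.array([[1.0, 1.0], [1.5, 1.0]])
B = np.array([[4.0, 3.0], [4.5, 3.5]])
print(cluster_distances(A, B))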
Agglomerative Hierarchical Clustering

(Figure: the four inter-cluster distances d(A, B) side by side:
single linkage, complete linkage, average linkage, and the centroid method.)
Bioinformatics

(Figure: Alizadeh et al., Nature 403 (2000): pp. 503–511.)
Exercise

(Figure: map showing the four cities Paris, Berlin, Kiev, and Odessa.)
Exercise

The following table shows the distances between 4 cities:

                    Kiev    Odessa   Berlin   Paris
        Kiev           –       440     1200    2000
        Odessa       440         –     1400    2100
        Berlin      1200      1400        –     900
        Paris       2000      2100      900       –

Determine a hierarchical clustering with
the single-linkage method.
Solution - Single Linkage

Step 0:   Clustering
               {Kiev}, {Odessa}, {Berlin}, {Paris}

          Distances between clusters:

                           Kiev    Odessa   Berlin   Paris
               Kiev           –       440     1200    2000
               Odessa       440         –     1400    2100
               Berlin      1200      1400        –     900
               Paris       2000      2100      900       –

          The minimal distance is 440.

          ⇒    Merge clusters { Kiev } and { Odessa }
               Distance value: 440
Solution - Single Linkage

Step 1:   Clustering
               {Kiev, Odessa}, {Berlin}, {Paris}

          Distances between clusters:

                               Kiev, Odessa   Berlin   Paris
               Kiev, Odessa               –     1200    2000
               Berlin                  1200        –     900
               Paris                   2000      900       –

          The minimal distance is 900.

          ⇒    Merge clusters { Berlin } and { Paris }
               Distance value: 900
Solution - Single Linkage

Step 2:   Clustering
               {Kiev, Odessa}, {Berlin, Paris}

          Distances between clusters:

                                Kiev, Odessa   Berlin, Paris
               Kiev, Odessa                –            1200
               Berlin, Paris            1200               –

          The minimal distance is 1200.

          ⇒    Merge clusters { Kiev, Odessa } and { Berlin, Paris }
               Distance value: 1200
Solution - Single Linkage

Step 3:   Clustering
               {Kiev, Odessa, Berlin, Paris}
Solution - Single Linkage

Hierarchy (dendrogram over Kiev, Odessa, Berlin, Paris):

     Distance value 1200:   1 cluster     {Kiev, Odessa, Berlin, Paris}
     Distance value  900:   2 clusters    {Kiev, Odessa}, {Berlin, Paris}
     Distance value  440:   3 clusters    {Kiev, Odessa}, {Berlin}, {Paris}
     Distance value    0:   4 clusters    {Kiev}, {Odessa}, {Berlin}, {Paris}
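The exercise can be cross-checked with SciPy (a sketch of our own; the condensed distance vector lists the pairs Kiev-Odessa, Kiev-Berlin, Kiev-Paris, Odessa-Berlin, Odessa-Paris, Berlin-Paris):

from scipy.cluster.hierarchy import linkage

# Condensed distance matrix for (Kiev, Odessa, Berlin, Paris).
d = [440, 1200, 2000, 1400, 2100, 900]

Z = linkage(d, method='single')
print(Z[:, 2])   # merge heights [440, 900, 1200], matching steps 0-2 above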
Divisive Clustering
Divisive Algorithms

• Begin       with one cluster
              C1 = { {e1, e2, ... , en} }

• Terminate   when all objects are in disjoint clusters
              Cn = { {e1}, {e2}, ... , {en} }

• Iterate     choose a cluster Cf that splits optimally
              into two clusters Ci and Cj
              according to a given criterion.

This yields a sequence of clusterings (Ci)i=1,...,n of G with
              Ci ⊃ Ci+1   for i = 1, ..., n-1.
Partitional Clustering
– Minimal Distance Methods –
Partitional Clustering

• Aims to partition n observations into K clusters.

• The number of clusters and an initial partition are given.

• The initial partition is considered “not optimal“ and is
  iteratively repartitioned.

        The number of clusters is given!

(Figure: an initial partition with K = 2 is repartitioned into the final partition.)
Partitional Clustering

Difference to hierarchical clustering
• The number of clusters is fixed.
• An object can change its cluster.

The initial partition is obtained
• at random, or
• by applying a hierarchical clustering algorithm in advance.

The number of clusters is estimated by
• specialized methods (e.g., the silhouette), or
• applying a hierarchical clustering algorithm in advance.
Partitional Clustering - Methods

In this course we introduce the minimal distance methods:

• K-Means and
• Fuzzy-c-Means
K-Means

Aims to partition n observations into K clusters
in which each observation belongs to the cluster with the nearest mean.

Find K cluster centroids µ1, ..., µK
that minimize the objective function

     J = Σ_{i=1}^{K} Σ_{x ∈ Ci} dist²( µi, x )

(Figure: a set G partitioned into clusters C1, C2, C3, each with its centroid marked x.)
K-Means - Minimal Distance Method

Given: n objects, K clusters
1. Determine an initial partition.
2. Calculate the cluster centroids.
3. For each object, calculate the distances
   to all cluster centroids.
4. If the distance
        to the centroid of another cluster
   is smaller than the distance
        to the centroid of its current cluster,
   then assign the object to the other cluster.
5. If clusters were repartitioned: GOTO 2.
   ELSE: STOP.
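A minimal sketch of steps 1-5 in NumPy (an illustration of our own; it assumes no cluster becomes empty during the iteration):

import numpy as np

def k_means(X, K, max_iter=100, seed=0):
    # X: (n, p) data matrix. Returns cluster labels and centroids.
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, K, size=len(X))       # step 1: initial partition
    for _ in range(max_iter):
        centroids = np.array([X[labels == i].mean(axis=0) for i in range(K)])  # step 2
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # step 3
        new_labels = dists.argmin(axis=1)          # step 4: move to nearest centroid
        if np.array_equal(new_labels, labels):     # step 5: stop when nothing moves
            break
        labels = new_labels
    return labels, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
labels, centroids = k_means(X, K=2)
print(centroids)   # two centroids, near (0, 0) and (3, 3)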
Example

(Figure: an initial partition with two centroids, and the final partition after repartitioning.)
Exercise

(Figure: determine the final partition from the given initial partition and centroids.)
K-Means
• K-Means does not determine the globally optimal partition.

• The final partition obtained by K-Means depends on the initial partition.
Hard Clustering / Soft Clustering

Hard Clustering:   Each object is a member of exactly one cluster
                   (e.g., K-Means).

Soft Clustering:   Each object has a fractional membership in all clusters
                   (e.g., Fuzzy-c-Means).
Fuzzy-c-Means
Fuzzy Clustering vs. Hard Clustering

• When clusters are well separated,
  hard clustering (K-Means) makes sense.
• In many cases, clusters are not well separated.

            In hard clustering, borderline objects are assigned to
            a cluster in an arbitrary manner.
Fuzzy Set Theory

• Fuzzy set theory was introduced by Lotfi Zadeh in 1965.

• An object can belong to a set with a degree of membership
  between 0 and 1.

• Classical set theory is a special case of fuzzy theory
  that restricts membership values to be either 0 or 1.
Fuzzy Clustering
• Is based on fuzzy logic and fuzzy set theory.
• Objects can belong to more than one cluster.
• Each object belongs to all clusters with some weight
  (degree of membership).

(Figure: membership degrees between 0 and 1 for clusters 1, 2, and 3.)
Hard Clustering

• K-Means
 − The number K of clusters is given.
 − Each object is assigned to exactly one cluster.

                         Object
    Cluster     e1    e2    e3    e4
      C1         0     1     0     0
      C2         1     0     0     0
      C3         0     0     1     1

(Figure: the resulting partition C1 = {e2}, C2 = {e1}, C3 = {e3, e4}.)
Fuzzy Clustering
•   Fuzzy-c-Means
    − The number c of clusters is given.
    − Each object has a fractional membership in all clusters.

                          Object
        Cluster     e1     e2     e3     e4
          C1       0.8    0.2    0.1    0.0
          C2       0.2    0.2    0.2    0.0
          C3       0.0    0.6    0.7    1.0
           Σ        1      1      1      1

    There is no strict sub-division into clusters.
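The column sums in the table can be verified directly (Python/NumPy, an illustration):

import numpy as np

# Membership matrix U: rows = clusters C1..C3, columns = objects e1..e4.
U = np.array([[0.8, 0.2, 0.1, 0.0],
              [0.2, 0.2, 0.2, 0.0],
              [0.0, 0.6, 0.7, 1.0]])

print(U.sum(axis=0))   # each object's memberships sum to 1: [1. 1. 1. 1.]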
Fuzzy-c-Means

 •   Membership Matrix

         U = ( u_ik ) ∈ [0, 1]^{c × n}

     The entry u_ik denotes the degree of membership of object k in cluster i.

                       Object 1    Object 2    …    Object n
          Cluster 1       u11         u12      …       u1n
          Cluster 2       u21         u22      …       u2n
             ⁞             ⁞           ⁞                 ⁞
          Cluster c       uc1         uc2      …       ucn
Restrictions (Membership Matrix)

1. All weights for a given object ek must add up to 1:

        Σ_{i=1}^{c} u_ik = 1          (k = 1, ..., n)

2. Each cluster contains – with non-zero weight – at least one object,
   but does not contain – with a weight of one – all the objects:

        0 < Σ_{k=1}^{n} u_ik < n      (i = 1, ..., c)
Fuzzy-c-Means

•   Vector of prototypes (cluster centroids)

        V = ( v1, ..., vc )^T ,   vi ∈ R^p

Remark
The cluster centroids and the membership matrix are initialized randomly.
Afterwards they are iteratively optimized.
Fuzzy-c-Means

ALGORITHM

1. Select an initial fuzzy partition U = (u i k )
     ⇒ assign values to all u i k

2. Repeat
3.      Compute the centroid of each cluster using the fuzzy partition
4.      Update the fuzzy partition U = (u i k )
5. Until the centroids do not change.


Other stopping criterion: “change in the u i k is below a given threshold”.
Fuzzy-c-Means

(Figure: a point xk with membership degrees u1k, u2k, u3k to the centroids v1, v2, v3.)

• K-Means and Fuzzy-c-Means attempt to minimize
  the sum of the squared errors (SSE).

• In K-Means:

        SSE = Σ_{i=1}^{K} Σ_{x ∈ Ci} dist²( vi, x )

• In Fuzzy-c-Means:

        SSE = Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik^m ⋅ dist²( vi, xk )

  m ∈ (1, ∞) is a parameter (fuzzifier) that determines the influence of the weights.
Computing Cluster Centroids

• For each cluster i = 1, ..., c the centroid is defined by

       vi = ( Σ_{k=1}^{n} u_ik^m ⋅ xk ) / ( Σ_{k=1}^{n} u_ik^m )     ( i = 1, ..., c )     (V)

• This is an extension of the definition of centroids in K-Means.

• All points are considered, and the contribution of each point
  to the centroid is weighted by its membership degree.
Update of the Fuzzy Partition (Membership Matrix)

• Minimization of the SSE subject to the constraints leads to
  the following update formula:

       u_ik = 1 / Σ_{s=1}^{c} ( dist²( vi, xk ) / dist²( vs, xk ) )^{1/(m−1)}     (U)
Fuzzy-c-Means

Initialization
Determine (randomly)
•   the matrix U of membership grades,
•   the matrix V of cluster centroids.

Iteration
Calculate updates of
•   the matrix U of membership grades with (U),
•   the matrix V of cluster centroids with (V),
until    the cluster centroids are stable
or       the maximum number of iterations is reached.
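A compact sketch of the whole loop with the update formulas (V) and (U) (Python/NumPy; an illustration of our own, assuming squared Euclidean distances and that no point coincides exactly with a centroid):

import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    # X: (n, p) data matrix. Returns U (c, n) and centroids V (c, p).
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                              # restriction 1: columns sum to 1
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)  # centroid update (V)
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # dist^2(vi, xk)
        inv = 1.0 / d2 ** (1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)               # membership update (U)
        if np.abs(U_new - U).max() < tol:           # stop: change below threshold
            return U_new, V
        U = U_new
    return U, V

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (25, 2)), rng.normal(3.0, 0.3, (25, 2))])
U, V = fuzzy_c_means(X, c=2)
print(V)   # two centroids, near (0, 0) and (3, 3)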
Fuzzy-c-Means

• Fuzzy-c-Means depends on the Euclidean metric
  ⇒ spherical clusters.

• Other metrics can be applied to obtain different cluster shapes.

• Fuzzy covariance matrix (Gustafson/Kessel 1979)
  ⇒ ellipsoidal clusters.
Cluster Validity Indexes
Cluster Validity Indexes

Fuzzy-c-Means requires the number of clusters as input.

Question:     How can we determine the “optimal” number of clusters?

Idea:        Determine the cluster partition for a given number of clusters.
             Then evaluate the cluster partition by a cluster validity index.

Method:      For every possible number of clusters, calculate the cluster validity index.
             Then determine the optimal number of clusters.

Note:        CVIs usually do not depend on the clustering algorithm.
Cluster Validity Indexes

 •   Partition Coefficient (Bezdek 1981)

          PC(c) = (1/n) Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik²  ,     2 ≤ c ≤ n−1

• Optimal number of clusters c∗:

          PC(c∗) =  max_{2 ≤ c ≤ n−1}  PC(c)
Cluster Validity Indexes

•   Partition Entropy (Bezdek 1974)

          PE(c) = −(1/n) Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik log2 u_ik  ,     2 ≤ c ≤ n−1

• Optimal number of clusters c∗:

          PE(c∗) =  min_{2 ≤ c ≤ n−1}  PE(c)

• Drawback of PC and PE:   Only the degrees of membership are considered.
                           The geometry of the data set is neglected.
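Both indexes depend only on the membership matrix, so they take a few lines each (Python/NumPy; an illustration reusing the U from the fuzzy_c_means sketch above):

import numpy as np

def partition_coefficient(U):
    # PC(c) = (1/n) * sum of squared memberships; larger means crisper.
    return (U ** 2).sum() / U.shape[1]

def partition_entropy(U):
    # PE(c) = -(1/n) * sum of u * log2(u); smaller means crisper.
    eps = 1e-12   # guard against log2(0) for zero memberships
    return -(U * np.log2(U + eps)).sum() / U.shape[1]

U = np.array([[0.8, 0.2, 0.1, 0.0],
              [0.2, 0.2, 0.2, 0.0],
              [0.0, 0.6, 0.7, 1.0]])
print(partition_coefficient(U))   # ≈ 0.665
print(partition_entropy(U))       # ≈ 0.812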
Cluster Validity Indexes

•   Fukuyama-Sugeno Index (Fukuyama/Sugeno 1989)

          FS(c) =  Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik^m dist²( vi, xk )     (compactness of clusters)

                −  Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik^m dist²( vi, v̄ )      (separation of clusters)

     with the mean centroid  v̄ = (1/c) Σ_{i=1}^{c} vi.

• Optimal number of clusters c∗:

          FS(c∗) =  min_{2 ≤ c ≤ n−1}  FS(c)
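Unlike PC and PE, the Fukuyama-Sugeno index also uses the data geometry. A sketch (Python/NumPy, an illustration; X, U, V as in the fuzzy_c_means sketch above):

import numpy as np

def fukuyama_sugeno(X, U, V, m=2.0):
    # FS(c) = compactness - separation; smaller values are better.
    W = U ** m                                                 # (c, n) weights
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)    # dist^2(vi, xk)
    compactness = (W * d2).sum()
    v_bar = V.mean(axis=0)                                     # mean centroid
    separation = (W.sum(axis=1) * ((V - v_bar) ** 2).sum(axis=1)).sum()
    return compactness - separation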
Application
Data Mining and Decision Support Systems – Landslide Events
(UniBw, Geoinformatics Group: W. Reinhardt, E. Nuhn)

→     Spatial Data Mining / Early Warning Systems for Landslide Events
→     Fuzzy clustering approaches (feature weighting)

•   Measurements    (pressure values, tension, deformation vectors)
•   Simulations     (finite-element model)
Hard Clustering

(Figure: data and the resulting partition.)

Problem: uncertain data from measurements and simulations
Fuzzy Clustering

(Figure: data, fuzzy clusters, and the fuzzy partition.)
Fuzzy Clustering
Feature Weighting

Nuhn/Kropat/Reinhardt/Pickl: Preparation of complex landslide simulation results with clustering approaches for decision support and early warning. Submitted to Hawaii International Conference on System Sciences (HICSS 45), Grand Wailea, Maui, 2012.
Thank you very much!

Mais conteúdo relacionado

Mais procurados

PCA (Principal component analysis)
PCA (Principal component analysis)PCA (Principal component analysis)
PCA (Principal component analysis)Learnbay Datascience
 
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceMaryamRehman6
 
Exploratory data analysis in R - Data Science Club
Exploratory data analysis in R - Data Science ClubExploratory data analysis in R - Data Science Club
Exploratory data analysis in R - Data Science ClubMartin Bago
 
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Edureka!
 
Principal component analysis and lda
Principal component analysis and ldaPrincipal component analysis and lda
Principal component analysis and ldaSuresh Pokharel
 
Lect5 principal component analysis
Lect5 principal component analysisLect5 principal component analysis
Lect5 principal component analysishktripathy
 
Linear regression with gradient descent
Linear regression with gradient descentLinear regression with gradient descent
Linear regression with gradient descentSuraj Parmar
 
4 Dimensionality reduction (PCA & t-SNE)
4 Dimensionality reduction (PCA & t-SNE)4 Dimensionality reduction (PCA & t-SNE)
4 Dimensionality reduction (PCA & t-SNE)Dmytro Fishman
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysissaba khan
 
Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximizationbutest
 
Random Features Strengthen Graph Neural Networks
Random Features Strengthen Graph Neural NetworksRandom Features Strengthen Graph Neural Networks
Random Features Strengthen Graph Neural Networksjoisino
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysisguru_prasadg
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression TreesHemant Chetwani
 

Mais procurados (20)

PCA (Principal component analysis)
PCA (Principal component analysis)PCA (Principal component analysis)
PCA (Principal component analysis)
 
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data science
 
Exploratory data analysis in R - Data Science Club
Exploratory data analysis in R - Data Science ClubExploratory data analysis in R - Data Science Club
Exploratory data analysis in R - Data Science Club
 
1 Supervised learning
1 Supervised learning1 Supervised learning
1 Supervised learning
 
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
 
Principal component analysis and lda
Principal component analysis and ldaPrincipal component analysis and lda
Principal component analysis and lda
 
Lect5 principal component analysis
Lect5 principal component analysisLect5 principal component analysis
Lect5 principal component analysis
 
Principal Component Analysis
Principal Component AnalysisPrincipal Component Analysis
Principal Component Analysis
 
Linear regression with gradient descent
Linear regression with gradient descentLinear regression with gradient descent
Linear regression with gradient descent
 
Robustness in deep learning
Robustness in deep learningRobustness in deep learning
Robustness in deep learning
 
Principal component analysis
Principal component analysisPrincipal component analysis
Principal component analysis
 
4 Dimensionality reduction (PCA & t-SNE)
4 Dimensionality reduction (PCA & t-SNE)4 Dimensionality reduction (PCA & t-SNE)
4 Dimensionality reduction (PCA & t-SNE)
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximization
 
Random Features Strengthen Graph Neural Networks
Random Features Strengthen Graph Neural NetworksRandom Features Strengthen Graph Neural Networks
Random Features Strengthen Graph Neural Networks
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysis
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression Trees
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
Statistics for data science
Statistics for data science Statistics for data science
Statistics for data science
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 

Semelhante a Cluster Analysis

Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031frdos
 
Digital Distance Geometry
Digital Distance GeometryDigital Distance Geometry
Digital Distance Geometryppd1961
 
Time series clustering presentation
Time series clustering presentationTime series clustering presentation
Time series clustering presentationEleni Stamatelou
 
Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...
Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...
Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...Ioannis Partalas
 
1 hofstad
1 hofstad1 hofstad
1 hofstadYandex
 
A Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cubeA Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cubeVjekoslavKovac1
 
Litv_Denmark_Weak_Supervised_Learning.pdf
Litv_Denmark_Weak_Supervised_Learning.pdfLitv_Denmark_Weak_Supervised_Learning.pdf
Litv_Denmark_Weak_Supervised_Learning.pdfAlexander Litvinenko
 
Aggregation of Opinions for System Selection Using Approximations of Fuzzy Nu...
Aggregation of Opinions for System Selection Using Approximations of Fuzzy Nu...Aggregation of Opinions for System Selection Using Approximations of Fuzzy Nu...
Aggregation of Opinions for System Selection Using Approximations of Fuzzy Nu...mathsjournal
 
AGGREGATION OF OPINIONS FOR SYSTEM SELECTION USING APPROXIMATIONS OF FUZZY NU...
AGGREGATION OF OPINIONS FOR SYSTEM SELECTION USING APPROXIMATIONS OF FUZZY NU...AGGREGATION OF OPINIONS FOR SYSTEM SELECTION USING APPROXIMATIONS OF FUZZY NU...
AGGREGATION OF OPINIONS FOR SYSTEM SELECTION USING APPROXIMATIONS OF FUZZY NU...mathsjournal
 
Principal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty DetectionPrincipal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty DetectionJordan McBain
 

Semelhante a Cluster Analysis (20)

Clustering
ClusteringClustering
Clustering
 
PR07.pdf
PR07.pdfPR07.pdf
PR07.pdf
 
Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031
 
Clustering
ClusteringClustering
Clustering
 
Symmetrical2
Symmetrical2Symmetrical2
Symmetrical2
 
Digital Distance Geometry
Digital Distance GeometryDigital Distance Geometry
Digital Distance Geometry
 
Lect4
Lect4Lect4
Lect4
 
TunUp final presentation
TunUp final presentationTunUp final presentation
TunUp final presentation
 
Dbm630 lecture09
Dbm630 lecture09Dbm630 lecture09
Dbm630 lecture09
 
QMC: Operator Splitting Workshop, Structured Decomposition of Multi-view Data...
QMC: Operator Splitting Workshop, Structured Decomposition of Multi-view Data...QMC: Operator Splitting Workshop, Structured Decomposition of Multi-view Data...
QMC: Operator Splitting Workshop, Structured Decomposition of Multi-view Data...
 
Cs345 cl
Cs345 clCs345 cl
Cs345 cl
 
Time series clustering presentation
Time series clustering presentationTime series clustering presentation
Time series clustering presentation
 
Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...
Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...
Aggressive Sampling for Multi-class to Binary Reduction with Applications to ...
 
1 hofstad
1 hofstad1 hofstad
1 hofstad
 
A Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cubeA Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cube
 
Litv_Denmark_Weak_Supervised_Learning.pdf
Litv_Denmark_Weak_Supervised_Learning.pdfLitv_Denmark_Weak_Supervised_Learning.pdf
Litv_Denmark_Weak_Supervised_Learning.pdf
 
Aggregation of Opinions for System Selection Using Approximations of Fuzzy Nu...
Aggregation of Opinions for System Selection Using Approximations of Fuzzy Nu...Aggregation of Opinions for System Selection Using Approximations of Fuzzy Nu...
Aggregation of Opinions for System Selection Using Approximations of Fuzzy Nu...
 
AGGREGATION OF OPINIONS FOR SYSTEM SELECTION USING APPROXIMATIONS OF FUZZY NU...
AGGREGATION OF OPINIONS FOR SYSTEM SELECTION USING APPROXIMATIONS OF FUZZY NU...AGGREGATION OF OPINIONS FOR SYSTEM SELECTION USING APPROXIMATIONS OF FUZZY NU...
AGGREGATION OF OPINIONS FOR SYSTEM SELECTION USING APPROXIMATIONS OF FUZZY NU...
 
Principal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty DetectionPrincipal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty Detection
 
Multitask learning for GGM
Multitask learning for GGMMultitask learning for GGM
Multitask learning for GGM
 

Mais de SSA KPI

Germany presentation
Germany presentationGermany presentation
Germany presentationSSA KPI
 
Grand challenges in energy
Grand challenges in energyGrand challenges in energy
Grand challenges in energySSA KPI
 
Engineering role in sustainability
Engineering role in sustainabilityEngineering role in sustainability
Engineering role in sustainabilitySSA KPI
 
Consensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable developmentConsensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable developmentSSA KPI
 
Competences in sustainability in engineering education
Competences in sustainability in engineering educationCompetences in sustainability in engineering education
Competences in sustainability in engineering educationSSA KPI
 
Introducatio SD for enginers
Introducatio SD for enginersIntroducatio SD for enginers
Introducatio SD for enginersSSA KPI
 
DAAD-10.11.2011
DAAD-10.11.2011DAAD-10.11.2011
DAAD-10.11.2011SSA KPI
 
Talking with money
Talking with moneyTalking with money
Talking with moneySSA KPI
 
'Green' startup investment
'Green' startup investment'Green' startup investment
'Green' startup investmentSSA KPI
 
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
From Huygens odd sympathy to the energy Huygens' extraction from the sea wavesFrom Huygens odd sympathy to the energy Huygens' extraction from the sea waves
From Huygens odd sympathy to the energy Huygens' extraction from the sea wavesSSA KPI
 
Dynamics of dice games
Dynamics of dice gamesDynamics of dice games
Dynamics of dice gamesSSA KPI
 
Energy Security Costs
Energy Security CostsEnergy Security Costs
Energy Security CostsSSA KPI
 
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
Naturally Occurring Radioactivity (NOR) in natural and anthropic environmentsNaturally Occurring Radioactivity (NOR) in natural and anthropic environments
Naturally Occurring Radioactivity (NOR) in natural and anthropic environmentsSSA KPI
 
Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 5Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 5SSA KPI
 
Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 4Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 4SSA KPI
 
Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 3Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 3SSA KPI
 
Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 2Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 2SSA KPI
 
Advanced energy technology for sustainable development. Part 1
Advanced energy technology for sustainable development. Part 1Advanced energy technology for sustainable development. Part 1
Advanced energy technology for sustainable development. Part 1SSA KPI
 
Fluorescent proteins in current biology
Fluorescent proteins in current biologyFluorescent proteins in current biology
Fluorescent proteins in current biologySSA KPI
 
Neurotransmitter systems of the brain and their functions
Neurotransmitter systems of the brain and their functionsNeurotransmitter systems of the brain and their functions
Neurotransmitter systems of the brain and their functionsSSA KPI
 

Mais de SSA KPI (20)

Germany presentation
Germany presentationGermany presentation
Germany presentation
 
Grand challenges in energy
Grand challenges in energyGrand challenges in energy
Grand challenges in energy
 
Engineering role in sustainability
Engineering role in sustainabilityEngineering role in sustainability
Engineering role in sustainability
 
Consensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable developmentConsensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable development
 
Competences in sustainability in engineering education
Competences in sustainability in engineering educationCompetences in sustainability in engineering education
Competences in sustainability in engineering education
 
Introducatio SD for enginers
Introducatio SD for enginersIntroducatio SD for enginers
Introducatio SD for enginers
 
DAAD-10.11.2011
DAAD-10.11.2011DAAD-10.11.2011
DAAD-10.11.2011
 
Talking with money
Talking with moneyTalking with money
Talking with money
 
'Green' startup investment
'Green' startup investment'Green' startup investment
'Green' startup investment
 
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
From Huygens odd sympathy to the energy Huygens' extraction from the sea wavesFrom Huygens odd sympathy to the energy Huygens' extraction from the sea waves
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
 
Dynamics of dice games
Dynamics of dice gamesDynamics of dice games
Dynamics of dice games
 
Energy Security Costs
Energy Security CostsEnergy Security Costs
Energy Security Costs
 
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
Naturally Occurring Radioactivity (NOR) in natural and anthropic environmentsNaturally Occurring Radioactivity (NOR) in natural and anthropic environments
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
 
Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 5Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 5
 
Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 4Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 4
 
Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 3Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 3
 
Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 2Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 2
 
Advanced energy technology for sustainable development. Part 1
Advanced energy technology for sustainable development. Part 1Advanced energy technology for sustainable development. Part 1
Advanced energy technology for sustainable development. Part 1
 
Fluorescent proteins in current biology
Fluorescent proteins in current biologyFluorescent proteins in current biology
Fluorescent proteins in current biology
 
Neurotransmitter systems of the brain and their functions
Neurotransmitter systems of the brain and their functionsNeurotransmitter systems of the brain and their functions
Neurotransmitter systems of the brain and their functions
 

Último

An Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERPAn Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERPCeline George
 
Shark introduction Morphology and its behaviour characteristics
Shark introduction Morphology and its behaviour characteristicsShark introduction Morphology and its behaviour characteristics
Shark introduction Morphology and its behaviour characteristicsArubSultan
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...
Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...
Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...DrVipulVKapoor
 
The role of Geography in climate education: science and active citizenship
The role of Geography in climate education: science and active citizenshipThe role of Geography in climate education: science and active citizenship
The role of Geography in climate education: science and active citizenshipKarl Donert
 
4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptxmary850239
 
DiskStorage_BasicFileStructuresandHashing.pdf
DiskStorage_BasicFileStructuresandHashing.pdfDiskStorage_BasicFileStructuresandHashing.pdf
DiskStorage_BasicFileStructuresandHashing.pdfChristalin Nelson
 
Objectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxObjectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxMadhavi Dharankar
 
DBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdfDBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdfChristalin Nelson
 
4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptxmary850239
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Osopher
 
The Emergence of Legislative Behavior in the Colombian Congress
The Emergence of Legislative Behavior in the Colombian CongressThe Emergence of Legislative Behavior in the Colombian Congress
The Emergence of Legislative Behavior in the Colombian CongressMaria Paula Aroca
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesVijayaLaxmi84
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...Nguyen Thanh Tu Collection
 
Employablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxEmployablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxryandux83rd
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 

Último (20)

An Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERPAn Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERP
 
Shark introduction Morphology and its behaviour characteristics
Shark introduction Morphology and its behaviour characteristicsShark introduction Morphology and its behaviour characteristics
Shark introduction Morphology and its behaviour characteristics
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...
Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...Geoffrey Chaucer Works II UGC NET JRF TGT PGT MA PHD Entrance Exam II History...

Cluster Analysis

  • 14. Euclidean Distance d_2(x, y) = ( Σ_{i=1}^{n} (x_i − y_i)^2 )^{1/2} , x, y ∈ R^n. Example: x = (1, 1), y = (4, 3) ⇒ d_2(x, y) = ( (1 − 4)^2 + (1 − 3)^2 )^{1/2} = √13.
  • 15. Manhattan Distance d_1(x, y) = Σ_{i=1}^{n} | x_i − y_i | , x, y ∈ R^n. Example: x = (1, 1), y = (4, 3) ⇒ d_1(x, y) = |1 − 4| + |1 − 3| = 3 + 2 = 5.
  • 16. Maximum Distance d_∞(x, y) = max_{1 ≤ i ≤ n} | x_i − y_i | , x, y ∈ R^n. Example: x = (1, 1), y = (4, 3) ⇒ d_∞(x, y) = max(3, 2) = 3.
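As an illustration (not part of the original slides), a minimal NumPy sketch of the Minkowski family that reproduces the values from slides 14-16 for x = (1, 1) and y = (4, 3); the helper name minkowski is a hypothetical choice:

```python
import numpy as np

def minkowski(x, y, r):
    # d_r(x, y) = ( sum_i |x_i - y_i|^r )^(1/r)
    return float(np.sum(np.abs(x - y) ** r) ** (1.0 / r))

x = np.array([1.0, 1.0])
y = np.array([4.0, 3.0])

print(minkowski(x, y, 1))             # Manhattan distance: 3 + 2 = 5
print(minkowski(x, y, 2))             # Euclidean distance: sqrt(13) ~ 3.61
print(float(np.max(np.abs(x - y))))   # maximum distance: max(3, 2) = 3
```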
  • 17. Similarity Measures A similarity function on a set G is a function S: G × G → R that satisfies the following conditions: (S1) S(x, y) ≥ 0 for all x, y ∈ G (non-negativity), (S2) S(x, y) ≤ S(x, x) for all x, y ∈ G (auto-similarity), (S3) S(x, y) = S(x, x) ⇔ x = y for all x, y ∈ G (identity). The value of the similarity function is greater when two points are closer.
  • 18. Similarity Measures There are many different definitions of similarity. Often the symmetry condition (S4) S(x, y) = S(y, x) for all x, y ∈ G is required in addition.
  • 20. Dendrogram [Figure: cluster dendrogram, Euclidean distance with complete linkage; gross national product of EU countries – agriculture (1993); source: www.isa.uni-stuttgart.de/lehre/SAHBD]
  • 21. Hierarchical Clustering Hierarchical clustering creates a hierarchy of clusters of the set G. It comes in two variants: agglomerative clustering, where clusters are successively merged together, and divisive clustering, where clusters are recursively split.
  • 22. Agglomerative Clustering Merge the two clusters with the smallest distance between them. Step 0: {e1}, {e2}, {e3}, {e4} (4 clusters); Step 1: {e1, e2}, {e3}, {e4} (3 clusters); Step 2: {e1, e2, e3}, {e4} (2 clusters); Step 3: {e1, e2, e3, e4} (1 cluster).
  • 23. Divisive Clustering Choose a cluster that optimally splits into two clusters according to a given criterion. Step 0: {e1, e2, e3, e4} (1 cluster); Step 1: {e1, e2}, {e3, e4} (2 clusters); Step 2: {e1, e2}, {e3}, {e4} (3 clusters); Step 3: {e1}, {e2}, {e3}, {e4} (4 clusters).
  • 25. INPUT Given n objects G = { e1, ..., en } represented by p-dimensional feature vectors x1, ..., xn ∈ R^p, arranged as a data matrix with one row per object and one column per feature: x1 = (x11, x12, x13, ..., x1p), x2 = (x21, x22, x23, ..., x2p), ..., xn = (xn1, xn2, xn3, ..., xnp).
  • 26. Example I An online shop collects data from its customers. For each of the n customers there exists a p-dimensional feature vector.
  • 27. Example II In a clinical trial, laboratory values of a large number of patients are gathered. For each of the n patients there exists a p-dimensional feature vector.
  • 28. Agglomerative Algorithms Begin with the disjoint clustering C1 = { {e1}, {e2}, ..., {en} }. Iterate: find the most similar pair of clusters and merge them into a single cluster. Terminate when all objects are in one cluster Cn = { {e1, e2, ..., en} }. This yields a sequence of clusterings (Ci)_{i=1,...,n} of G with C_{i−1} ⊂ C_i for i = 2, ..., n.
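A possible sketch of this loop (not code from the course): a generic agglomerative merger in Python. The helper single_link, the toy points, and the returned list of merges are illustrative assumptions; any of the cluster distances defined on the following slides could be plugged in as cluster_dist.

```python
from math import dist

def single_link(A, B):
    # d(A, B) = minimum distance between elements of the two clusters
    return min(dist(a, b) for a in A for b in B)

def agglomerative(points, cluster_dist):
    clusters = [[p] for p in points]          # C1: disjoint clustering
    merges = []
    while len(clusters) > 1:                  # stop at Cn: one cluster
        # find the most similar pair of clusters
        i, j = min(((i, j) for i in range(len(clusters))
                           for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j],
                       cluster_dist(clusters[i], clusters[j])))
        merged = clusters[i] + clusters.pop(j)  # j > i, so index i stays valid
        clusters[i] = merged
    return merges

print(agglomerative([(0, 0), (1, 0), (4, 0)], single_link))
```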
  • 29. What is the distance d(A, B) between two clusters A and B? Different answers lead to various hierarchical clustering algorithms.
  • 30. Agglomerative Hierarchical Clustering There exist many metrics to measure the distance between clusters. They lead to particular agglomerative clustering methods: • Single-Linkage Clustering • Complete-Linkage Clustering • Average Linkage Clustering • Centroid Method • ...
  • 31. Single-Linkage Clustering (Nearest-Neighbor Method) The distance between the clusters A and B is the minimum distance between the elements of each cluster: d(A, B) = min { d(a, b) | a ∈ A, b ∈ B }.
  • 32. Single-Linkage Clustering Advantage: can detect very long and even curved clusters; can be used to detect outliers. Drawback: the chaining phenomenon, where clusters that are very distant from each other may be forced together because single elements are close to each other.
  • 33. Complete-Linkage Clustering (Furthest-Neighbor Method) The distance between the clusters A and B is the maximum distance between the elements of each cluster: d(A, B) = max { d(a, b) | a ∈ A, b ∈ B }.
  • 34. Complete-Linkage Clustering … tends to find compact clusters of approximately equal diameters, avoids the chaining phenomenon, and cannot be used for outlier detection.
  • 35. Average-Linkage Clustering The distance between the clusters A and B is the mean distance between the elements of each cluster: d(A, B) = (1 / (|A| ⋅ |B|)) ⋅ Σ_{a ∈ A, b ∈ B} d(a, b).
  • 36. Centroid Method The distance between the clusters A and B is the (squared) Euclidean distance of the cluster centroids.
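A small sketch, not from the slides, of the four cluster distances defined above; the function names and the toy clusters A and B are illustrative only:

```python
import numpy as np
from itertools import product

def single_linkage(A, B, d):
    # nearest neighbors: d(A, B) = min { d(a, b) | a in A, b in B }
    return min(d(a, b) for a, b in product(A, B))

def complete_linkage(A, B, d):
    # furthest neighbors: d(A, B) = max { d(a, b) | a in A, b in B }
    return max(d(a, b) for a, b in product(A, B))

def average_linkage(A, B, d):
    # mean over all |A| * |B| pairs
    return sum(d(a, b) for a, b in product(A, B)) / (len(A) * len(B))

def centroid_method(A, B):
    # squared Euclidean distance between the cluster centroids
    cA, cB = np.mean(A, axis=0), np.mean(B, axis=0)
    return float(np.sum((cA - cB) ** 2))

euclid = lambda a, b: float(np.linalg.norm(np.subtract(a, b)))
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (5.0, 0.0)]
print(single_linkage(A, B, euclid))    # 2.0
print(complete_linkage(A, B, euclid))  # 5.0
print(average_linkage(A, B, euclid))   # (3 + 5 + 2 + 4) / 4 = 3.5
print(centroid_method(A, B))           # (0.5 - 4.0)^2 = 12.25
```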
  • 37. Agglomerative Hierarchical Clustering [Figure: overview of the cluster distances d(A, B) used by single linkage, complete linkage, average linkage, and the centroid method]
  • 38. Bioinformatics Alizadeh et al., Nature 403 (2000): pp.503–511
  • 39. Exercise [Figure: map showing Berlin, Kiev, Paris, Odessa]
  • 40. Exercise The following table shows the distances between 4 cities: Kiev–Odessa 440, Kiev–Berlin 1200, Kiev–Paris 2000, Odessa–Berlin 1400, Odessa–Paris 2100, Berlin–Paris 900. Determine a hierarchical clustering with the single-linkage method.
  • 41. Solution - Single Linkage Step 0: Clustering {Kiev}, {Odessa}, {Berlin}, {Paris}. Distances between clusters: Kiev–Odessa 440, Kiev–Berlin 1200, Kiev–Paris 2000, Odessa–Berlin 1400, Odessa–Paris 2100, Berlin–Paris 900.
  • 42. Solution - Single Linkage Step 0: The minimal distance is 440, between Kiev and Odessa. ⇒ Merge clusters {Kiev} and {Odessa}. Distance value: 440.
  • 43. Solution - Single Linkage Step 1: Clustering {Kiev, Odessa}, {Berlin}, {Paris}. Distances between clusters: {Kiev, Odessa}–Berlin 1200, {Kiev, Odessa}–Paris 2000, Berlin–Paris 900.
  • 44. Solution - Single Linkage Step 1: The minimal distance is 900, between Berlin and Paris. ⇒ Merge clusters {Berlin} and {Paris}. Distance value: 900.
  • 45. Solution - Single Linkage Step 2: Clustering {Kiev, Odessa}, {Berlin, Paris}. The remaining distance between the two clusters is 1200. ⇒ Merge clusters {Kiev, Odessa} and {Berlin, Paris}. Distance value: 1200.
  • 46. Solution - Single Linkage Step 3: Clustering {Kiev, Odessa, Berlin, Paris}
  • 47. Solution - Single Linkage [Figure: dendrogram over Kiev, Odessa, Berlin, Paris with merge heights 440 (3 clusters), 900 (2 clusters), and 1200 (1 cluster)]
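For checking the exercise, a possible sketch using SciPy's hierarchical clustering on the city distance table; assuming SciPy is available, linkage with method="single" should reproduce the merge heights 440, 900, and 1200:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

cities = ["Kiev", "Odessa", "Berlin", "Paris"]
D = np.array([
    [   0,  440, 1200, 2000],
    [ 440,    0, 1400, 2100],
    [1200, 1400,    0,  900],
    [2000, 2100,  900,    0],
], dtype=float)

# linkage expects a condensed distance vector, not the square matrix
Z = linkage(squareform(D), method="single")
print(Z[:, 2])   # merge heights: [ 440.  900. 1200.]
```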
  • 49. Divisive Algorithms Begin with one cluster C1 = { {e1, e2, ..., en} }. Iterate: choose a cluster Cf that optimally splits into two clusters Ci and Cj according to a given criterion. Terminate when all objects are in disjoint clusters Cn = { {e1}, {e2}, ..., {en} }. This yields a sequence of clusterings (Ci)_{i=1,...,n} of G with C_i ⊃ C_{i+1} for i = 1, ..., n−1.
  • 50. Partitional Clustering – Minimal Distance Methods –
  • 51. Partitional Clustering Aims to partition n observations into K clusters. The number of clusters and an initial partition are given. The initial partition is considered “not optimal” and is iteratively repartitioned until a final partition is reached. The number of clusters K is fixed in advance.
  • 52. Partitional Clustering Differences to hierarchical clustering: the number of clusters is fixed, and an object can change its cluster. The initial partition is obtained either randomly or by applying a hierarchical clustering algorithm in advance. The number of clusters can be estimated by specialized methods (e.g., the Silhouette index) or by applying a hierarchical clustering algorithm in advance.
  • 53. Partitional Clustering - Methods In this course we introduce the minimal distance methods K-Means and Fuzzy-c-Means.
  • 55. K-Means Aims to partition n observations into K clusters in which each observation belongs to the cluster with the nearest mean. Find K cluster centroids µ1, ..., µK that minimize the objective function J = Σ_{i=1}^{K} Σ_{x ∈ Ci} dist(µi, x)^2.
  • 56. K-Means [Figure: the same objective with the cluster centroids marked in the clusters C1, C2, C3]
  • 57. K-Means - Minimal Distance Method Given: n objects, K clusters. 1. Determine an initial partition. 2. Calculate the cluster centroids. 3. For each object, calculate the distances to all cluster centroids. 4. If the distance to the centroid of another cluster is smaller than the distance to the actual cluster centroid, then assign the object to the other cluster. 5. If clusters were repartitioned: GOTO 2. ELSE: STOP.
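A minimal sketch of the minimal distance method following steps 1-5 above; the function name k_means and the initialization strategy (K randomly chosen objects as initial centroids) are illustrative assumptions, not the only possible choice:

```python
import numpy as np

def k_means(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # step 1: initial partition, here induced by K randomly chosen objects
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # steps 3-4: assign every object to its nearest cluster centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                              # step 5: no repartition -> STOP
        labels = new_labels
        # step 2: recompute the centroid of every non-empty cluster
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids

# two well-separated blobs: the final partition should recover them
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
labels, centroids = k_means(X, K=2)
print(centroids)   # centroids near (0, 0) and (5, 5)
```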
  • 58. Example [Figure: initial partition and final partition with cluster centroids marked]
  • 59. Exercise [Figure: initial partition with cluster centroids marked; determine the final partition]
  • 60. K-Means K-Means does not determine the globally optimal partition. The final partition obtained by K-Means depends on the initial partition.
  • 61. Hard Clustering / Soft Clustering In hard clustering (e.g., K-Means), each object is a member of exactly one cluster. In soft clustering (e.g., Fuzzy-c-Means), each object has a fractional membership in all clusters.
  • 63. Fuzzy Clustering vs. Hard Clustering • When clusters are well separated, hard clustering (K-Means) makes sense. • In many cases, clusters are not well separated. In hard clustering, borderline objects are assigned to a cluster in an arbitrary manner.
  • 64. Fuzzy Set Theory Fuzzy set theory was introduced by Lotfi Zadeh in 1965. An object can belong to a set with a degree of membership between 0 and 1. Classical set theory is a special case of fuzzy set theory in which membership values are restricted to either 0 or 1.
  • 65. Fuzzy Clustering Fuzzy clustering is based on fuzzy logic and fuzzy set theory. Objects can belong to more than one cluster: each object belongs to all clusters with some weight (degree of membership).
  • 66. Hard Clustering K-Means: the number K of clusters is given, and each object is assigned to exactly one cluster. Example partition (memberships of objects e1, e2, e3, e4): C1: (0, 1, 0, 0); C2: (1, 0, 0, 0); C3: (0, 0, 1, 1).
  • 67. Fuzzy Clustering Fuzzy-c-Means: the number c of clusters is given, and each object has a fractional membership in all clusters; there is no strict sub-division into clusters. Example memberships of objects e1, e2, e3, e4: C1: (0.8, 0.2, 0.1, 0.0); C2: (0.2, 0.2, 0.2, 0.0); C3: (0.0, 0.6, 0.7, 1.0); each column sums to 1.
  • 68. Fuzzy-c-Means Membership matrix U = (u_ik) ∈ [0, 1]^{c × n}. The entry u_ik denotes the degree of membership of object k in cluster i; rows correspond to clusters 1, ..., c and columns to objects 1, ..., n.
  • 69. Restrictions (Membership Matrix) 1. All weights for a given object ek must add up to 1: Σ_{i=1}^{c} u_ik = 1 (k = 1, ..., n). 2. Each cluster contains, with non-zero weight, at least one object, but does not contain, with a weight of one, all the objects: 0 < Σ_{k=1}^{n} u_ik < n (i = 1, ..., c).
  • 70. Fuzzy-c-Means Vector of prototypes (cluster centroids) V = (v1, ..., vc)^T ∈ R^c. Remark: the cluster centroids and the membership matrix are initialized randomly and afterwards iteratively optimized.
  • 71. Fuzzy-c-Means ALGORITHM 1. Select an initial fuzzy partition U = (u_ik), i.e., assign values to all u_ik. 2. Repeat: 3. Compute the centroid of each cluster using the fuzzy partition. 4. Update the fuzzy partition U = (u_ik). 5. Until the centroids do not change. Another stopping criterion: the change in the u_ik is below a given threshold.
  • 72. Fuzzy-c-Means K-Means and Fuzzy-c-Means attempt to minimize the sum of the squared errors (SSE). In K-Means: SSE = Σ_{i=1}^{K} Σ_{x ∈ Ci} dist(vi, x)^2. In Fuzzy-c-Means: SSE = Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik^m ⋅ dist(vi, xk)^2, where m ∈ (1, ∞) is a parameter (fuzzifier) that determines the influence of the weights.
  • 73. Computing Cluster Centroids For each cluster i = 1, ..., c the centroid is defined by (V): v_i = ( Σ_{k=1}^{n} u_ik^m x_k ) / ( Σ_{k=1}^{n} u_ik^m ). This is an extension of the definition of centroids in K-Means: all points are considered, and the contribution of each point to the centroid is weighted by its membership degree.
  • 74. Update of the Fuzzy Partition (Membership Matrix) Minimization of the SSE subject to the constraints leads to the update formula (U): u_ik = 1 / Σ_{s=1}^{c} ( dist(v_i, x_k)^2 / dist(v_s, x_k)^2 )^{1/(m−1)}.
  • 75. Fuzzy-c-Means Initialization: determine (randomly) the matrix U of membership grades and the matrix V of cluster centroids. Iteration: calculate updates of the matrix V of cluster centroids with (V) and of the matrix U of membership grades with (U), until the cluster centroids are stable or the maximum number of iterations is reached.
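A compact sketch of the full Fuzzy-c-Means loop, alternating the centroid update (V) and the membership update (U) from the slides above; the small constant guarding against division by zero and the tolerance-based stopping rule are implementation assumptions:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # random initial membership matrix, columns normalized to sum to 1
    U = rng.random((c, n))
    U /= U.sum(axis=0)
    for _ in range(max_iter):
        # (V): v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        # (U): u_ik = 1 / sum_s (dist(v_i,x_k)/dist(v_s,x_k))^(2/(m-1))
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)          # avoid division by zero at a centroid
        U_new = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1)),
                             axis=1)
        if np.max(np.abs(U_new - U)) < tol:
            U = U_new
            break                       # change in the u_ik below threshold
        U = U_new
    return U, V

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
U, V = fuzzy_c_means(X, c=2)
print(np.round(U[:, :3], 2))   # memberships of the first three objects
```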
  • 76. Fuzzy-c-Means Fuzzy-c-Means depends on the Euclidean metric ⇒ spherical clusters. Other metrics can be applied to obtain different cluster shapes, e.g., a fuzzy covariance matrix (Gustafson/Kessel 1979) ⇒ ellipsoidal clusters.
  • 77. Cluster Validity Indexes
  • 78. Cluster Validity Indexes Fuzzy-c-Means requires the number of clusters as input. Question: How can we determine the “optimal” number of clusters? Idea: Determine the cluster partition for a given number of clusters; then evaluate the cluster partition by a cluster validity index. Method: For every candidate number of clusters, calculate the cluster validity index; then determine the optimal number of clusters. Note: Cluster validity indexes usually do not depend on the clustering algorithm.
  • 79. Cluster Validity Indexes Partition Coefficient (Bezdek 1981): PC(c) = (1/n) Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik^2, 2 ≤ c ≤ n−1. Optimal number of clusters c∗: PC(c∗) = max_{2 ≤ c ≤ n−1} PC(c).
  • 80. Cluster Validity Indexes Partition Entropy (Bezdek 1974): PE(c) = −(1/n) Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik log2 u_ik, 2 ≤ c ≤ n−1. Optimal number of clusters c∗: PE(c∗) = min_{2 ≤ c ≤ n−1} PE(c). Drawback of PC and PE: only the degrees of membership are considered; the geometry of the data set is neglected.
  • 81. Cluster Validity Indexes Fukuyama-Sugeno Index (Fukuyama/Sugeno 1989): FS(c) = Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik^m dist(v_i, x_k)^2 (compactness of clusters) − Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik^m dist(v_i, v̄)^2 (separation of clusters), where v̄ = (1/c) Σ_{i=1}^{c} v_i. Optimal number of clusters c∗: FS(c∗) = min_{2 ≤ c ≤ n−1} FS(c).
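A short sketch of how PC and PE could be computed from a membership matrix and scanned over candidate cluster numbers; it reuses the fuzzy_c_means sketch above, and the generated data set X is an assumption of this illustration (the Fukuyama-Sugeno index could be added analogously from the centroids):

```python
import numpy as np

def partition_coefficient(U):
    # PC(c) = (1/n) * sum_{i,k} u_ik^2 ; larger is better
    return float(np.sum(U ** 2) / U.shape[1])

def partition_entropy(U, eps=1e-12):
    # PE(c) = -(1/n) * sum_{i,k} u_ik * log2(u_ik) ; smaller is better
    return float(-np.sum(U * np.log2(U + eps)) / U.shape[1])

# scan candidate numbers of clusters with the fuzzy_c_means sketch above
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
for c in range(2, 6):
    U, V = fuzzy_c_means(X, c)
    print(c, round(partition_coefficient(U), 3), round(partition_entropy(U), 3))
```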
  • 83. Data Mining and Decision Support Systems – Landslide Events (UniBw, Geoinformatics Group: W. Reinhardt, E. Nuhn) → Spatial data mining / early warning systems for landslide events → Fuzzy clustering approaches (feature weighting), based on measurements (pressure values, tension, deformation vectors) and simulations (finite-element model).
  • 84. Hard Clustering [Figure: data and hard partition] Problem: uncertain data from measurements and simulations.
  • 85. Fuzzy Clustering [Figure: data, fuzzy clusters, and fuzzy partition]
  • 87. Feature Weighting Nuhn/Kropat/Reinhardt/Pickl: Preparation of complex landslide simulation results with clustering approaches for decision support and early warning. Submitted to the Hawaii International Conference on System Sciences (HICSS 45), Grand Wailea, Maui, 2012.
  • 88. Thank you very much!