2. INTRODUCTION
Clustering is an unsupervised learning method of data abstraction.
The method of identifying groups of similar data in a dataset is
called clustering.
It is essentially a grouping of objects on the basis of the similarity
and dissimilarity between them.
3. TYPES OF CLUSTERING
Hard Clustering
In hard clustering, each data point either belongs to a cluster
completely or does not belong to it at all.
Soft Clustering
Soft clustering groups the data items such that
an item can exist in multiple clusters.
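The difference between the two can be illustrated with a small sketch (the point and cluster names, and the membership degrees, are made up for illustration):

```python
# Hard clustering: each point belongs to exactly one cluster.
hard_assignment = {"point_a": "cluster_1", "point_b": "cluster_2"}

# Soft clustering: each point has a degree of membership in every
# cluster; the degrees for a point typically sum to 1.
soft_assignment = {
    "point_a": {"cluster_1": 0.9, "cluster_2": 0.1},
    "point_b": {"cluster_1": 0.3, "cluster_2": 0.7},
}
```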
4. CLUSTERING METHODS
Density-Based Methods:
These methods search the data space for areas of varying density of
data points.
Hierarchical Based Methods:
In these methods, the clusters form a tree-like structure based on the hierarchy.
New clusters are formed using the previously formed ones.
It is divided into two categories:
• Agglomerative
• Divisive
5. Partitioning Based Methods:
These methods partition the objects into k clusters, and each partition forms
one cluster.
Example: K-means
Grid-Based Methods:
In this method, the data space is formulated into a finite number of cells
that form a grid-like structure.
6. K Means Clustering
It is an algorithm that groups similar elements or data points into clusters.
The number of groups or clusters is represented by k.
It assumes that the object attributes form a vector space based on features
that are already provided.
7. K Means Clustering Algorithm
Step 1: First we initialize k points, called means, randomly.
Step 2: We assign each item to its closest mean and update that mean's
coordinates, which are the averages of the items assigned to it so
far.
Step 3: We repeat the process for a given number of iterations and at the end,
we have our clusters.
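The three steps above can be sketched in pure Python for 2-D points (a minimal illustration; the function name, the squared-distance helper, and the fixed iteration count are my own choices, not from the slides):

```python
import random

def k_means(items, k, iterations=10):
    # Step 1: initialize k points, called means, randomly from the items.
    means = random.sample(items, k)
    for _ in range(iterations):
        # Step 2: assign each item to its closest mean.
        clusters = [[] for _ in range(k)]
        for x, y in items:
            dists = [((x - mx) ** 2 + (y - my) ** 2) ** 0.5
                     for mx, my in means]
            clusters[dists.index(min(dists))].append((x, y))
        # Update each mean to the average of the items assigned to it.
        for i, cluster in enumerate(clusters):
            if cluster:
                means[i] = (sum(p[0] for p in cluster) / len(cluster),
                            sum(p[1] for p in cluster) / len(cluster))
    # Step 3: after the given number of iterations, we have our clusters.
    return means, clusters
```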
8. Example of K-means Clustering
Let us consider a table
Individual Height Weight
1 185 72
2 170 56
3 168 60
4 179 68
5 182 72
9. Step 1: We randomly choose two centroids for the two clusters:
k1=(185,72)
k2=(170,56)
Step 2: Now, using these centroids, we compute the Euclidean distance for the 3rd point:
ED = sqrt[(x0-xc)^2 + (y0-yc)^2]
k1 = sqrt[(168-185)^2 + (60-72)^2]
k1 = 20.81
k2 = sqrt[(168-170)^2 + (60-56)^2]
k2 = 4.47
Therefore, individual 3 belongs to k2.
Step 3: Calculate new centroid values for k2
k2=[(170+168)/2 , (60+56)/2]
k2=(169,58)
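The distance and centroid computations above can be checked in a few lines of Python (the variable names are mine; the data comes from the table):

```python
from math import sqrt

def euclidean(p, c):
    # ED = sqrt[(x0-xc)^2 + (y0-yc)^2]
    return sqrt((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2)

k1, k2 = (185, 72), (170, 56)   # the two randomly chosen centroids
point3 = (168, 60)              # individual 3

d1 = euclidean(point3, k1)      # sqrt(433) ~ 20.81
d2 = euclidean(point3, k2)      # sqrt(20)  ~ 4.47

# Point 3 joins the nearer centroid, k2; the new centroid of k2 is
# the mean of its two members.
new_k2 = ((170 + 168) / 2, (56 + 60) / 2)   # (169.0, 58.0)
```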
11. Hierarchical Clustering
Hierarchical clustering finds successive clusters using previously
established clusters.
It makes no assumption about the number of clusters.
12. Agglomerative Hierarchical Clustering
Initially, we consider every data point as an individual cluster, and at every
step we merge the nearest pair of clusters.
It is a bottom-up method.
At first, every data point is considered an individual entity or cluster.
At every iteration, clusters merge with other clusters until one
cluster is formed.
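The bottom-up merging can be sketched as follows (a simple single-linkage variant, chosen for illustration; the function name and the recorded merge list are my own):

```python
def agglomerative(points):
    # Start with every data point as its own cluster (bottom-up).
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Find the nearest pair of clusters, measured by the distance
        # between their closest members (single linkage).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Merge the nearest pair; repeat until one cluster remains.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges
```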
14. Divisive Hierarchical Clustering
Divisive hierarchical clustering is precisely the opposite of
agglomerative hierarchical clustering.
In divisive hierarchical clustering, we start with all of the data
points in a single cluster.
In every iteration, we separate from a cluster the data points that
are not similar to the rest.
In the end, we are left with N clusters, one per data point.
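The top-down splitting can be sketched like this (one simple splitting rule among many: peel off the point farthest from its cluster's centroid; the function names and this rule are my own assumptions):

```python
def centroid(cluster):
    # Mean of the cluster's points.
    return (sum(p[0] for p in cluster) / len(cluster),
            sum(p[1] for p in cluster) / len(cluster))

def divisive(points, target):
    # Start with all of the data points in a single cluster (top-down).
    clusters = [list(points)]
    while len(clusters) < target:
        # Split the largest cluster by separating the point that is
        # least similar to the rest (farthest from the centroid).
        clusters.sort(key=len, reverse=True)
        big = clusters.pop(0)
        c = centroid(big)
        far = max(big, key=lambda p: (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2)
        big.remove(far)
        clusters.extend([big, [far]])
    return clusters
```

Running it until `target` equals the number of points leaves N singleton clusters, matching the slide's end state.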