2. INTRODUCTION
Clustering is an unsupervised learning method of data abstraction.
The method of identifying groups of similar data in a dataset is
called clustering.
It is essentially a grouping of objects on the basis of the similarity
and dissimilarity between them.
3. TYPES OF CLUSTERING
Hard Clustering
In hard clustering, each data point either belongs to a cluster
completely or does not belong to it at all.
Soft Clustering
Soft clustering groups the data items such that
an item can exist in multiple clusters.
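The difference between the two can be illustrated with a small sketch (the point and cluster names, and the membership degrees, are made up for illustration):

```python
# Hard clustering: each point belongs to exactly one cluster.
hard_assignment = {"point_a": "cluster_1", "point_b": "cluster_2"}

# Soft clustering: each point has a degree of membership in every
# cluster; the degrees for a point typically sum to 1.
soft_assignment = {
    "point_a": {"cluster_1": 0.9, "cluster_2": 0.1},
    "point_b": {"cluster_1": 0.3, "cluster_2": 0.7},
}
```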
4. CLUSTERING METHODS
Density-Based Methods:
These methods search the data space for areas of varying density of
data points.
Hierarchical Based Methods:
In these methods, the clusters form a tree-like structure based on the hierarchy.
New clusters are formed using the previously formed ones.
It is divided into two categories:
• Agglomerative
• Divisive
5. Partitioning Based Methods:
These methods partition the objects into k clusters, and each partition forms
one cluster.
Example: K-means
Grid-Based Methods:
In this method, the data space is formulated into a finite number of cells
that form a grid-like structure.
6. K Means Clustering
It is an algorithm that groups similar elements or data points into clusters.
The number of groups or clusters is represented by k.
It assumes that the object attributes form a vector space based on features
that are already provided.
7. K Means Clustering Algorithm
Step 1: First we initialize k points, called means, randomly.
Step 2: We assign each item to its closest mean and update that mean's
coordinates, which are the averages of the items assigned to it so
far.
Step 3: We repeat the process for a given number of iterations and at the end,
we have our clusters.
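The three steps above can be sketched in pure Python for 2-D points (a minimal illustration; the function name, the squared-distance helper, and the fixed iteration count are my own choices, not from the slides):

```python
import random

def k_means(items, k, iterations=10):
    # Step 1: initialize k points, called means, randomly from the items.
    means = random.sample(items, k)
    for _ in range(iterations):
        # Step 2: assign each item to its closest mean.
        clusters = [[] for _ in range(k)]
        for x, y in items:
            dists = [((x - mx) ** 2 + (y - my) ** 2) ** 0.5
                     for mx, my in means]
            clusters[dists.index(min(dists))].append((x, y))
        # Update each mean to the average of the items assigned to it.
        for i, cluster in enumerate(clusters):
            if cluster:
                means[i] = (sum(p[0] for p in cluster) / len(cluster),
                            sum(p[1] for p in cluster) / len(cluster))
    # Step 3: after the given number of iterations, we have our clusters.
    return means, clusters
```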
8. Example of K-means Clustering
Let us consider a table
Individual Height Weight
1 185 72
2 170 56
3 168 60
4 179 68
5 182 72
9. Step 1: We randomly choose two centroids for the two clusters:
k1=(185,72)
k2=(170,56)
Step 2: Now, using these centroids, we compute the Euclidean distance for the 3rd point:
ED = sqrt[(x0-xc)^2 + (y0-yc)^2]
k1 = sqrt[(168-185)^2 + (60-72)^2]
k1 = 20.81
k2 = sqrt[(168-170)^2 + (60-56)^2]
k2 = 4.47
Therefore, individual 3 belongs to k2.
Step 3: Calculate new centroid values for k2
k2=[(170+168)/2 , (60+56)/2]
k2=(169,58)
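The distance and centroid computations above can be checked in a few lines of Python (the variable names are mine; the data comes from the table):

```python
from math import sqrt

def euclidean(p, c):
    # ED = sqrt[(x0-xc)^2 + (y0-yc)^2]
    return sqrt((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2)

k1, k2 = (185, 72), (170, 56)   # the two randomly chosen centroids
point3 = (168, 60)              # individual 3

d1 = euclidean(point3, k1)      # sqrt(433) ~ 20.81
d2 = euclidean(point3, k2)      # sqrt(20)  ~ 4.47

# Point 3 joins the nearer centroid, k2; the new centroid of k2 is
# the mean of its two members.
new_k2 = ((170 + 168) / 2, (56 + 60) / 2)   # (169.0, 58.0)
```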
11. Hierarchical Clustering
Hierarchical clustering finds successive clusters using previously
established clusters.
It makes no assumption about the number of clusters.
12. Agglomerative Hierarchical Clustering
Initially, we consider every data point as an individual cluster, and at every
step we merge the nearest pair of clusters.
It is a bottom-up method.
At first, every data point is considered an individual entity or cluster.
At every iteration, clusters merge with other clusters until one
cluster is formed.
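The bottom-up merging can be sketched as follows (a simple single-linkage variant, chosen for illustration; the function name and the recorded merge list are my own):

```python
def agglomerative(points):
    # Start with every data point as its own cluster (bottom-up).
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Find the nearest pair of clusters, measured by the distance
        # between their closest members (single linkage).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Merge the nearest pair; repeat until one cluster remains.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges
```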
14. Divisive Hierarchical Clustering
Divisive hierarchical clustering is precisely the opposite of
agglomerative hierarchical clustering.
In divisive hierarchical clustering, we start with all of the data
points in a single cluster.
In every iteration, we separate from a cluster the data points that
are not similar to the rest.
In the end, we are left with N clusters, one per data point.
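The top-down splitting can be sketched like this (one simple splitting rule among many: peel off the point farthest from its cluster's centroid; the function names and this rule are my own assumptions):

```python
def centroid(cluster):
    # Mean of the cluster's points.
    return (sum(p[0] for p in cluster) / len(cluster),
            sum(p[1] for p in cluster) / len(cluster))

def divisive(points, target):
    # Start with all of the data points in a single cluster (top-down).
    clusters = [list(points)]
    while len(clusters) < target:
        # Split the largest cluster by separating the point that is
        # least similar to the rest (farthest from the centroid).
        clusters.sort(key=len, reverse=True)
        big = clusters.pop(0)
        c = centroid(big)
        far = max(big, key=lambda p: (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2)
        big.remove(far)
        clusters.extend([big, [far]])
    return clusters
```

Running it until `target` equals the number of points leaves N singleton clusters, matching the slide's end state.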