Data Clustering Relevant Clustering Algorithms Clustering validation
Data Clustering
An Unsupervised Learning Approach
Garima Shakya
garimashakya24@gmail.com
Department of Computer Science and Technology
IIEST, Shibpur, Howrah
28 June 2016
Outline
1 Data Clustering
Feature Selection Methods
Distance-based Algorithm
2 Relevant Clustering Algorithms
K-means Algorithm
Fuzzy C-means Algorithm
Advantages and disadvantages of K-means and Fuzzy C-means Algorithms
3 Clustering validation
Dunn index
Davies-Bouldin index
Data Clustering:
"The task of grouping a set of objects in such a way that objects
in the same group (called a cluster) are more similar to each other
than to those in other groups (clusters)."
Applications of Clustering:
Applications of clustering include [2]:
1.) An intermediate step for other fundamental data mining
problems.
2.) Collaborative filtering.
3.) Customer segmentation.
4.) Data summarisation.
5.) Multimedia data analysis.
6.) Biological data analysis.
7.) Social network analysis.
Feature Selection Methods:
A preprocessing step in which subsets of the original features are
selected.
Needed to improve the quality of the underlying clustering.
Noisy and irrelevant features are pruned from contention.
Distance based Algorithm:
K-means algorithm
An unsupervised learning algorithm.
Operates on a given data set viewed as points in an m-dimensional space.
The pre-processing steps are 'Handling missing values' and
'Scaling'.
Scaling:
If an attribute A has range [A_min, A_max], then a value A_x of A is
scaled as:
$$A_x^{\text{scaled}} = \frac{A_x - A_{\min}}{A_{\max} - A_{\min}}$$
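The scaling rule can be sketched in a few lines of Python. This is only an illustration: the function name `min_max_scale`, the sample values, and the choice to map a constant attribute to 0.0 are assumptions, not part of the slides.

```python
def min_max_scale(values):
    """Scale numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute has no spread, so map it to 0.0
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Made-up attribute values, for illustration only.
attribute = [4.0, 5.0, 6.0, 8.0]
scaled = min_max_scale(attribute)
print(scaled)
```

After scaling, the smallest value maps to 0 and the largest to 1, so no single attribute dominates the distance computation.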
K-means Algorithm:
Handling missing values:
For example:
1.) Replace the missing values by zero (if numerical).
2.) Replace them by the maximum possible value.
3.) Fill in missing values manually based on your domain knowledge.
4.) Replace them with the variable mean (if numerical) or the most
frequent value (if categorical).
Input: The data set and the value of K (number of clusters).
Output: The clustered data set (each data element is assigned to
exactly one of the clusters).
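Option 4 of the missing-value strategies above (mean for numerical attributes, most frequent value for categorical ones) might look like the following sketch. The helper name `impute`, the use of `None` to mark a missing entry, and the sample columns are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def impute(column):
    """Fill None entries with the mean (numeric) or the most frequent value."""
    present = [v for v in column if v is not None]
    if not present:
        raise ValueError("column has no observed values to impute from")
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)  # numerical attribute: use the mean
    else:
        fill = Counter(present).most_common(1)[0][0]  # categorical: use the mode
    return [fill if v is None else v for v in column]


# Made-up columns, for illustration only.
numeric_filled = impute([1.0, None, 3.0])
categorical_filled = impute(["a", None, "a", "b"])
print(numeric_filled, categorical_filled)
```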
In steps, the algorithm is as follows:
Step 1.) Initialise the k centroids for the k clusters by randomly
selecting each as a point in the m-dimensional space. Label them
uniquely.
Step 2.) For each data element, do:
2.i) Calculate its distance from every cluster centroid.
2.ii) Compare the distances and give the data element the label of
the centroid nearest to it.
Step 3.) For each cluster, do:
Calculate the mean of the data elements within the cluster. Shift
the centroid to this mean.
Step 4.) 4.i) Calculate the change in position of each cluster
centroid and add them all.
4.ii) If this sum is greater than the pre-specified threshold and
the number of iterations is still below the limit, then go to
step 2.
Step 5.) Terminate. The data set with cluster labels is the result.
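The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `k_means`, its parameters, and the toy points are hypothetical, and the initial centroids are drawn from the data points themselves rather than from arbitrary positions in the space.

```python
import math
import random


def k_means(points, k, threshold=1e-6, max_iter=100, seed=0):
    """Return (labels, centroids) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1 (assumption: k distinct data points serve as initial centroids).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Step 3: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep position
        # Step 4: sum of centroid movements decides whether to continue.
        shift = sum(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= threshold:
            break
    # Step 5: the labels are the clustering result.
    return labels, centroids


# Two well-separated toy groups, for illustration only.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = k_means(points, k=2)
print(labels, centroids)
```

The loop stops either when the centroids have effectively stopped moving or when `max_iter` is reached, which mirrors the two stopping conditions of Step 4.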
Example: three-means iris data
[Figures: first iteration, second iteration, and two further slides]
Fuzzy C-means Algorithm:
Need for fuzzy:
With overlapping clusters, hard clustering is not feasible.
To extract such overlapping structures, Fuzzy C-means (FCM) is
used.
• Fuzzy c-means allows a data point to be assigned to more than
one cluster, so each data point has a degree of membership
(or probability) of belonging to each cluster.
Algorithm: Fuzzy clustering is carried out through an iterative
optimization of the objective function:
$$J_m = \sum_{i=1}^{n} \sum_{j=1}^{C} w_{ij}^{m} \, \|x_i - c_j\|^2$$
where,
m is any real number greater than 1 and determines the level of
cluster fuzziness. A large m results in smaller memberships w_ij and
hence, fuzzier clusters.
w_ij is the degree of membership of x_i in the cluster j,
x_i is the ith of the d-dimensional measured data,
c_j is the d-dimensional center of cluster j, and
||x_i − c_j|| is any norm expressing the similarity (or dissimilarity)
between a measured data point x_i and the center c_j.
Input: Data set X = {x_1, x_2, x_3, ..., x_n}, value of C (number of
clusters), value for m.
Output: A set of cluster centers {c_1, c_2, c_3, ..., c_C} and a
partitioning matrix W as:
$$W = [w_{ij}], \quad i = 1, \dots, n, \; j = 1, \dots, C$$
In steps, the algorithm is as follows:
Step 1.) Initialise the C centroids for the C clusters by randomly
selecting each as a point in the d-dimensional space. Label them
uniquely.
Step 2.) For each data element x_i, do:
2.i) Calculate the distance (or similarity measure) ||x_i − c_j|| from
every cluster centroid.
2.ii) Calculate the fuzzy membership w_ij of x_i in c_j by:
$$w_{ij} = \frac{1}{\sum_{l=1}^{C} \left( \frac{\|x_i - c_j\|}{\|x_i - c_l\|} \right)^{\frac{2}{m-1}}}$$
and fill the value of w_ij into the matrix W.
Step 3.) For each cluster, do:
3.i) Calculate the new position (or value) of the centroid c_j by:
$$c_j = \frac{\sum_{i=1}^{n} w_{ij}^{m} \, x_i}{\sum_{i=1}^{n} w_{ij}^{m}}$$
3.ii) Shift the centroid to the position (or value) calculated in
the previous step.
Step 4.) 4.i) Compute and update the elements of W^(k) as
W^(k+1).
4.ii) If ||W^(k+1) − W^(k)|| > β, then go to Step 2. Otherwise, the
present value of W is the resultant partitioning matrix.
where, k is the iteration step,
β is the termination criterion, β ∈ [0, 1], and
W^(k) is the fuzzy membership matrix in the kth iteration.
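A compact Python sketch of this loop is given below, using the standard fuzzy c-means membership and center updates. Several details are assumptions for illustration: the function name `fuzzy_c_means`, the optional `init_centers` argument, initialisation from data points when no centers are given, Euclidean distance as the norm, and the largest absolute element change as the matrix norm compared with β.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, beta=1e-5, max_iter=200,
                  init_centers=None, seed=0):
    """Return (W, centers); W[i][j] is the membership of point i in cluster j."""
    if init_centers is None:
        # Assumption: start from c distinct data points.
        init_centers = random.Random(seed).sample(points, c)
    centers = [list(ctr) for ctr in init_centers]
    n, dim = len(points), len(points[0])
    exponent = 2.0 / (m - 1.0)
    W = [[0.0] * c for _ in range(n)]
    for _ in range(max_iter):
        # Step 2: recompute every membership from the current centers.
        W_new = []
        for x in points:
            dists = [math.dist(x, ctr) for ctr in centers]
            hits = [d == 0.0 for d in dists]
            if any(hits):
                # A point sitting exactly on a center belongs fully to it.
                row = [1.0 / sum(hits) if h else 0.0 for h in hits]
            else:
                row = [1.0 / sum((dists[j] / dists[l]) ** exponent
                                 for l in range(c)) for j in range(c)]
            W_new.append(row)
        # Step 3: each center becomes the w^m-weighted mean of all points.
        for j in range(c):
            weights = [W_new[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers[j] = [sum(w * x[t] for w, x in zip(weights, points)) / total
                          for t in range(dim)]
        # Step 4: stop once the membership matrix changes by at most beta.
        change = max(abs(W_new[i][j] - W[i][j])
                     for i in range(n) for j in range(c))
        W = W_new
        if change <= beta:
            break
    return W, centers


# Two toy groups and hand-picked starting centers, for illustration only.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
W, centers = fuzzy_c_means(points, c=2,
                           init_centers=[(0.0, 0.0), (10.0, 10.0)])
print(W[0], W[3])
```

Each row of W sums to 1, so a row can be read as how strongly one point belongs to each cluster.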
Advantages and disadvantages of K-means and Fuzzy
C-means Algorithms:
The K-means algorithm is fast, robust and easier to
understand. FCM is more complex than K-means.
K-means gives better results when the clusters are distinct or well
separated from each other, that is, for non-overlapping
clusters. FCM is better for overlapping clusters.
A limitation of both is that the number of clusters must be known
in advance.
K-means can converge to a local optimum; it is applicable only when
a mean is defined, hence it fails for categorical data, and it is
unable to handle noisy data and outliers [1].
The time complexity of the K-means algorithm is O(tkdn) and
the time complexity of the FCM algorithm is O(ndk²t), where n
is the number of data objects, k is the number of clusters, d is the
dimension of each object and t is the number of iterations.
Normally, k, t, d << n [5].
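As a rough worked example of these bounds, take hypothetical values n = 150, k = 3, d = 4 and t = 10 (chosen only for illustration, not taken from the slides):

```latex
\begin{align*}
\text{K-means: } & t\,k\,d\,n = 10 \cdot 3 \cdot 4 \cdot 150 = 18\,000 \\
\text{FCM: }     & n\,d\,k^{2}\,t = 150 \cdot 4 \cdot 3^{2} \cdot 10 = 54\,000
\end{align*}
```

Under these bounds, FCM costs a factor of k more than K-means for the same n, d and t.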
How do we know that a particular clustering is good, or that it
meets the needs of the application?
Given a particular clustering, how do we know what its quality
really is?
Cluster validation
Evaluation of clustering results is sometimes referred to as cluster
validation [7].
Validation measures are used to compare the quality of the results of
different clustering algorithms.
The measures are classified as:
Internal indices.
External indices.
Internal Indices:
Evaluation is based on the clustered data itself.
For example:
Dunn index.
Davies-Bouldin index.
Dunn index:
Proposed by Dunn [3].
Identifies clusters which are well separated and compact.
The goal is [6]:
to maximize the inter-cluster distance.
to minimize the intra-cluster distance.
The Dunn index for k clusters is defined by:
$$D = \frac{\min_{1 \le i < j \le k} \delta(c_i, c_j)}{\max_{1 \le l \le k} \Delta(c_l)}$$
where,
δ(c_i, c_j) is the dissimilarity between clusters c_i and c_j; and
Δ(c_l) is the intra-cluster function (or diameter) of cluster c_l.
• If the Dunn index is large, it means that compact and well separated
clusters exist.
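One common way to instantiate this ratio is sketched below. It assumes the inter-cluster dissimilarity is the smallest Euclidean distance between points of two different clusters and the diameter is the largest distance between two points of the same cluster; other choices are possible. The function name `dunn_index` and the toy clusters are hypothetical.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """clusters: list of clusters, each a list of equal-length numeric tuples.

    Needs at least two clusters, and at least one cluster with two distinct points.
    """
    # Numerator: smallest distance between points of two different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Denominator: largest distance between two points of the same cluster.
    max_diameter = max(
        math.dist(p, q)
        for cluster in clusters
        for p, q in combinations(cluster, 2)
    )
    return min_separation / max_diameter


# Toy clusters, for illustration only.
toy_clusters = [[(0.0, 0.0), (0.0, 1.0)], [(3.0, 0.0), (3.0, 1.0)]]
dunn = dunn_index(toy_clusters)
print(dunn)
```

The pairwise loops make the cost quadratic in the number of points, which is consistent with the index being computationally expensive.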
The Dunn index is:
Computationally expensive
Sensitive to noisy data
Useful for identifying clean clusters in data sets.
Davies Bouldin index:
The Davies-Bouldin index [8] is based on a similarity measure between
clusters (R_ij).
The dispersion (s_i) of a cluster and the dissimilarity between
clusters (d_ij) are used to compute the Davies-Bouldin (DB)
index.
The similarity measure R_ij must satisfy the conditions:
R_ij >= 0
R_ij = R_ji
if s_i = 0 and s_j = 0 then R_ij = 0
if s_j > s_k and d_ij = d_ik then R_ij > R_ik
if s_j = s_k and d_ij < d_ik then R_ij > R_ik
Usually R_ij is defined in the following way [4]:
$$R_{ij} = \frac{s_i + s_j}{d_{ij}}$$
Then the Davies-Bouldin index is defined as:
$$DB = \frac{1}{k} \sum_{i=1}^{k} R_i$$
where,
$$R_i = \max_{j = 1, \dots, k,\; j \ne i} R_{ij}$$
and k is the number of clusters.
The Davies-Bouldin index measures the average similarity
between each cluster and its most similar one.
As the clusters have to be compact and separated, a lower
Davies-Bouldin index means a better cluster configuration.
The Davies-Bouldin index gives good results for distinct groups.
It is not designed to accommodate overlapping clusters.
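The index can be sketched in Python under two common assumptions that the slides do not fix: the dispersion s_i is the mean Euclidean distance of a cluster's points to its centroid, and d_ij is the Euclidean distance between centroids. The names `centroid` and `davies_bouldin` and the toy clusters are hypothetical.

```python
import math


def centroid(cluster):
    """Component-wise mean of a list of equal-length numeric tuples."""
    return [sum(coords) / len(cluster) for coords in zip(*cluster)]


def davies_bouldin(clusters):
    """Average, over clusters, of the worst-case ratio (s_i + s_j) / d_ij."""
    centers = [centroid(cl) for cl in clusters]
    # s_i: mean distance of the members of cluster i to its centroid (assumption).
    s = [sum(math.dist(p, ctr) for p in cl) / len(cl)
         for cl, ctr in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # R_i: similarity of cluster i to its most similar other cluster.
        total += max((s[i] + s[j]) / math.dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k


# Toy clusters, for illustration only.
toy_clusters = [[(0.0, 0.0), (0.0, 2.0)], [(4.0, 0.0), (4.0, 2.0)]]
db = davies_bouldin(toy_clusters)
print(db)
```

In this toy case each cluster has dispersion 1 and the centroids are 4 apart, so the index is (1 + 1) / 4 = 0.5; moving the clusters further apart would lower it.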
References
[1] site/dataclusteringalgorithms/k-means-clustering-algorithm.
[2] Aggarwal, C. C., and Reddy, C. K. Data Clustering. CRC Press.
[3] Dunn, J. Well separated clusters and optimal fuzzy partitions. Journal of Cybernetics 4 (1974), 95-104.
[4] Ferenc Kovács, Csaba Legány, A. B. Cluster validity measurement techniques.
[5] Ghosh, S., and Dubey, S. K. Comparative analysis of k-means and fuzzy c-means algorithms. (IJACSA) International Journal of Advanced Computer Science and Applications 4(4) (2013).
[6] Sandro Saitta, B. R., and Smith, I. F. A bounded index for cluster validity.
[7] Wikipedia. https://en.wikipedia.org/wiki/cluster analysis#evaluation and assessm
Any questions?
Thank You!
You can contact me at garimashakya24@gmail.com

CebaBaby dropshipping via API with DroFX.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 

Data Clustering Unsupervised Learning Approach

  • 1. Data Clustering Relevant Clustering Algorithms Clustering validation Data Clustering: An Unsupervised Learning Approach. Garima Shakya, garimashakya24@gmail.com, Department of Computer Science and Technology, IIEST, Shibpur, Howrah. 28 June 2016
  • 2. Outline: 1 Data Clustering (Feature Selection Methods; Distance based Algorithm). 2 Relevant Clustering Algorithms (K-means algorithm; Fuzzy C-means Algorithm; Advantages and disadvantages of K-means and Fuzzy C-means Algorithms). 3 Clustering validation (Dunn index; Davies-Bouldin index).
  • 3. Data Clustering: "The task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters)."
  • 4. Applications of Clustering: The applications of clustering include [2]: 1.) An intermediate step for other fundamental data mining problems. 2.) Collaborative filtering. 3.) Customer segmentation. 4.) Data summarisation. 5.) Multimedia data analysis. 6.) Biological data analysis. 7.) Social network analysis.
  • 6. Feature Selection Methods: A preprocessing step in which subsets of the original features are selected. It is needed to enhance the quality of the underlying clustering: noisy and irrelevant features are pruned from contention.
  • 8. Distance based Algorithm:
  • 10. K-means algorithm: An unsupervised learning algorithm. It is applied to a given data set in m-dimensional hyperspace. The pre-processing steps are 'Handling missing values' and 'Scaling'. Scaling: if an attribute A has range $[A_{\min}, A_{\max}]$, then a value $A_x$ of A is scaled as $A_x^{\text{scaled}} = (A_x - A_{\min}) / (A_{\max} - A_{\min})$.
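The scaling formula can be illustrated with a minimal Python sketch. The function name `min_max_scale`, the sample values, and the handling of a constant attribute are assumptions made for illustration; they are not part of the slides.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute has no spread, so map it to 0.0.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


attribute = [4.0, 5.0, 6.0, 8.0]  # made-up values for illustration
print(min_max_scale(attribute))   # [0.0, 0.25, 0.5, 1.0]
```

After scaling, every attribute lies in [0, 1], so no single attribute dominates the distance computations that K-means relies on.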
  • 11. K-means Algorithm: Handling missing values, for example: 1.) Replace the missing values by zero (if numerical). 2.) Replace them by the maximum possible value. 3.) Fill in missing values manually based on your domain knowledge. 4.) Replace them with the variable mean (if numerical) or the most frequent value (if categorical). Input: the data set and the value of K (number of clusters). Output: the clustered data set (each data element must be assigned to exactly one of the clusters).
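Option 4 (mean for numerical attributes, most frequent value for categorical ones) could be sketched as follows. The function name `impute`, the use of `None` as the missing-value marker, and the sample columns are assumptions; the sketch also assumes each column has at least one observed value.

```python
from collections import Counter
from statistics import mean


def impute(column):
    """Replace None by the mean (numeric column) or the most frequent value (categorical column)."""
    present = [v for v in column if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)  # numerical attribute: use the variable mean
    else:
        fill = Counter(present).most_common(1)[0][0]  # categorical: use the mode
    return [fill if v is None else v for v in column]


print(impute([1.0, None, 3.0]))              # [1.0, 2.0, 3.0]
print(impute(["red", "blue", None, "red"]))  # ['red', 'blue', 'red', 'red']
```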
  • 12. In steps, the algorithm is as follows:
      Step 1.) Initialise the k centroids for the k clusters by randomly selecting each as a point in m-dimensional hyperspace. Label them uniquely.
      Step 2.) For each data element, do:
        2.i) Calculate its distance from every cluster centroid.
        2.ii) Compare the distances and give the data element the label of the centroid nearest to it.
      Step 3.) For each cluster, do: calculate the mean of the data elements within the cluster, then shift the centroid to that mean.
      Step 4.)
        4.i) Calculate the change in position of each cluster centroid and add the changes up.
        4.ii) If the calculated sum is greater than the pre-specified threshold and the number of iterations is below the limit, then go to Step 2.
      Step 5.) Terminate. The data set with cluster labels is the result.
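The steps above can be sketched in plain Python. This is a minimal illustration, not the author's implementation: the function names, the optional `init` argument, the default threshold, and the choice to draw initial centroids from the data points (rather than from anywhere in the hyperspace) are all assumptions.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, init=None, threshold=1e-6, max_iter=100, seed=0):
    """Return (labels, centroids) following Steps 1 to 5."""
    rng = random.Random(seed)
    # Step 1 (assumption): use `init` if given, else pick k distinct data points.
    centroids = [list(c) for c in (init or rng.sample(points, k))]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Step 3: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        # Step 4: sum of centroid movements decides whether to iterate again.
        shift = sum(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= threshold:
            break
    # Step 5: the labels (one per data element) are the result.
    return labels, centroids


data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = k_means(data, k=2, init=[(1.0, 1.0), (8.0, 8.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

With random initialisation the labels may be permuted or the run may end in a different local optimum, which is why the usage example fixes the initial centroids.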
  • 13. Example: three-means iris data. First iteration: [figure not reproduced]
  • 14. Example: three-means iris data. Second iteration: [figure not reproduced]
  • 15. Example: three-means iris data [figure not reproduced]
  • 16. Example: three-means iris data [figure not reproduced]
  • 18. Fuzzy C-means Algorithm: Need for fuzzy: in the case of overlapping clusters, hard clustering is not feasible. Fuzzy C-means is used to extract such overlapping structures. Fuzzy C-means allows data points to be assigned to more than one cluster, so each data point has a degree of membership (or probability) of belonging to each cluster. Algorithm: fuzzy clustering is carried out through an iterative optimization of the objective function $J_m = \sum_{i=1}^{n} \sum_{j=1}^{C} w_{ij}^{m} \, \|x_i - c_j\|^2$, where,
  • 19. Fuzzy C-means Algorithm: m is any real number greater than 1 and determines the level of cluster fuzziness. A large m results in smaller memberships $w_{ij}$ and hence fuzzier clusters. $w_{ij}$ is the degree of membership of $x_i$ in cluster j, $x_i$ is the i-th d-dimensional measured data point, $c_j$ is the d-dimensional center of cluster j, and $\|x_i - c_j\|$ is any norm expressing the similarity (or dissimilarity) between a measured data point $x_i$ and the centroid $c_j$. Input: data set $X = \{x_1, x_2, x_3, \ldots, x_n\}$, value of C (number of clusters), value for m.
  • 20. Fuzzy C-means Algorithm: Output: a set of cluster centers $\{c_1, c_2, \ldots, c_C\}$ and a partitioning matrix $W = [w_{ij}]$ with $w_{ij} \in [0, 1]$, $i = 1, \ldots, n$, $j = 1, \ldots, C$. In steps, the algorithm is as follows:
      Step 1.) Initialise the C centroids for the C clusters by randomly selecting each as a point in d-dimensional hyperspace. Label them uniquely.
      Step 2.) For each data element $x_i$, do:
        2.i) Calculate the distance (or similarity measure) $\|x_i - c_j\|$ from every cluster centroid.
        2.ii) Calculate the fuzzy membership $w_{ij}$ of $x_i$ in $c_j$ by $w_{ij} = 1 \big/ \sum_{l=1}^{C} \left( \|x_i - c_j\| / \|x_i - c_l\| \right)^{2/(m-1)}$,
  • 21. Fuzzy C-means Algorithm: and fill the value of $w_{ij}$ in matrix W.
      Step 3.) For each cluster, do:
        3.i) Calculate the new position (or value) of the centroid $c_j$ by $c_j = \sum_{i=1}^{n} w_{ij}^{m} x_i \big/ \sum_{i=1}^{n} w_{ij}^{m}$.
        3.ii) Shift the centroid to the position (or value) calculated in the previous step.
      Step 4.)
        4.i) Compute the updated membership matrix $W^{(k+1)}$ from $W^{(k)}$.
        4.ii) If $\|W^{(k+1)} - W^{(k)}\| > \beta$, then go to Step 2. Otherwise, the present value of W is the resultant partitioning matrix.
      Here k is the iteration step, $\beta$ is the termination criterion with $\beta \in [0, 1]$, and $W^{(k)}$ is the fuzzy membership matrix in the k-th iteration.
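A minimal Python sketch of the Fuzzy C-means loop follows, using the standard update rules: membership $w_{ij} = 1 / \sum_l (\|x_i - c_j\| / \|x_i - c_l\|)^{2/(m-1)}$ and centroid $c_j = \sum_i w_{ij}^m x_i / \sum_i w_{ij}^m$. Several choices here are assumptions rather than anything stated in the slides: the function names, passing the initial centers explicitly, the Euclidean distance, the small floor on distances to avoid division by zero, and the use of the largest absolute entry change as the matrix norm in Step 4.

```python
import math


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(points, centers, m=2.0, beta=1e-5, max_iter=100):
    """Return (W, centers); W[i][j] is the membership of points[i] in cluster j."""
    centers = [list(c) for c in centers]  # Step 1: initial centroids supplied by caller
    n, c, dim = len(points), len(centers), len(points[0])
    W = [[0.0] * c for _ in range(n)]
    for _ in range(max_iter):
        # Step 2: memberships from the relative distances to every center.
        new_W = []
        for x in points:
            d = [max(dist(x, cj), 1e-12) for cj in centers]  # floor avoids d = 0
            new_W.append([
                1.0 / sum((d[j] / d[q]) ** (2.0 / (m - 1.0)) for q in range(c))
                for j in range(c)
            ])
        # Step 3: each center becomes the w^m-weighted mean of all points.
        for j in range(c):
            weights = [new_W[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers[j] = [
                sum(w * x[t] for w, x in zip(weights, points)) / total
                for t in range(dim)
            ]
        # Step 4: stop once the largest membership change is at most beta.
        change = max(abs(new_W[i][j] - W[i][j]) for i in range(n) for j in range(c))
        W = new_W
        if change <= beta:
            break
    return W, centers


data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
W, centers = fuzzy_c_means(data, centers=[(0.0, 0.0), (10.0, 10.0)])
for row in W:
    print([round(w, 3) for w in row])
```

Each row of W sums to 1, so a point between two clusters receives comparable memberships in both instead of a single hard label.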
  • 23. Advantages and disadvantages of K-means and Fuzzy C-means Algorithms: The K-means algorithm is fast, robust and easier to understand. FCM is more complex than K-means. K-means gives better results when the clusters in the data set are distinct or well separated from each other, that is, for non-overlapping clusters. FCM is better for overlapping clusters. A limitation of both is that the number of clusters must be known in advance. K-means can converge to a local optimum; it is applicable only when a mean is defined, so it fails for categorical data, and it is unable to handle noisy data and outliers [1]. The time complexity of the K-means algorithm is $O(tkdn)$ and the time complexity of the FCM algorithm is $O(ndk^2t)$, where n is the number of data objects, k is the number of clusters, d is the dimension of each object and t is the number of iterations. Normally, k, t, d << n [5].
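To make the two complexity expressions concrete, the short sketch below evaluates them for one set of sizes. The sizes are illustrative assumptions (roughly iris-like), and the results are operation-count estimates up to a constant factor, not measured running times.

```python
def k_means_ops(n, d, k, t):
    """Operation-count estimate for K-means, O(t * k * d * n)."""
    return t * k * d * n


def fcm_ops(n, d, k, t):
    """Operation-count estimate for Fuzzy C-means, O(n * d * k^2 * t)."""
    return n * d * k ** 2 * t


n, d, k, t = 150, 4, 3, 10      # assumed sizes for illustration
print(k_means_ops(n, d, k, t))  # 18000
print(fcm_ops(n, d, k, t))      # 54000
```

For the same n, d and t, the FCM estimate is larger by a factor of k, which matches the extra loop over clusters in the membership update.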
  • 25. How do we know that a particular clustering is good, or that it meets the needs of the application? Given a particular clustering, how do we know what its quality really is?
  • 26. Cluster validation: Evaluation of clustering results is sometimes referred to as cluster validation [7]. Validation measures are used to compare the quality of the results of different clustering algorithms. The measures are classified as internal indices and external indices.
  • 27. Internal Indices: Evaluation is based on the clustered data itself. Examples: Dunn index, Davies-Bouldin index, etc.
  • 29. Dunn index: Proposed by Dunn [3]. It identifies clusters that are well separated and compact. The goal is [6] to maximize the inter-cluster distance while minimizing the intra-cluster distance. The Dunn index for k clusters is defined by $D = \min_{1 \le i < j \le k} d(c_i, c_j) \,\big/\, \max_{1 \le l \le k} \operatorname{diam}(c_l)$,
  • 30. Dunn index: where $d(c_i, c_j)$ is the dissimilarity between clusters $c_i$ and $c_j$, and $\operatorname{diam}(c_l)$ is the intra-cluster function (or diameter) of cluster $c_l$. If the Dunn index is large, compact and well separated clusters exist.
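A minimal Python sketch of the Dunn index, taken as the smallest inter-cluster dissimilarity divided by the largest cluster diameter. Two choices are assumptions, since the slides leave both functions open: the dissimilarity between two clusters is the smallest distance between their members, and the diameter is the largest distance between two members of one cluster. The sketch needs at least two clusters and at least one cluster with two or more points.

```python
import math
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dunn_index(clusters):
    """clusters: list of lists of points. Larger values mean better separation."""
    # Assumption: inter-cluster dissimilarity = closest pair across two clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Assumption: diameter = farthest pair inside one cluster.
    max_diameter = max(
        dist(p, q) for c in clusters for p, q in combinations(c, 2)
    )
    return min_separation / max_diameter


well_separated = [[(0, 0), (0, 1)], [(3, 0), (3, 1)]]
interleaved = [[(0, 0), (0, 2)], [(1, 0), (1, 2)]]
print(dunn_index(well_separated))  # 3.0
print(dunn_index(interleaved))     # 0.5
```

The nested loops over all pairs of points show why the index is computationally expensive on large data sets.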
  • 31. Dunn index: The Dunn index is computationally expensive and sensitive to noisy data, but it is useful for identifying clean clusters in data sets.
  • 33. Davies-Bouldin index: The Davies-Bouldin index [8] is based on a similarity measure of clusters ($R_{ij}$). The dispersion $s_i$ of a cluster and the dissimilarity $d_{ij}$ between clusters are used to compute the Davies-Bouldin (DB) index. The similarity measure $R_{ij}$ must satisfy the conditions:
      $R_{ij} \ge 0$
      $R_{ij} = R_{ji}$
      if $s_i = 0$ and $s_j = 0$ then $R_{ij} = 0$
      if $s_j > s_k$ and $d_{ij} = d_{ik}$ then $R_{ij} > R_{ik}$
      if $s_j = s_k$ and $d_{ij} < d_{ik}$ then $R_{ij} > R_{ik}$
  • 34. Davies-Bouldin index: Usually $R_{ij}$ is defined in the following way [4]: $R_{ij} = (s_i + s_j) / d_{ij}$. Then the Davies-Bouldin index is defined as $DB = \frac{1}{k} \sum_{i=1}^{k} R_i$, where $R_i = \max_{j \ne i} R_{ij}$ and k is the number of clusters.
  • 35. Davies-Bouldin index: The Davies-Bouldin index measures the average similarity between each cluster and its most similar one. As the clusters have to be compact and separated, a lower Davies-Bouldin index means a better cluster configuration. The Davies-Bouldin index gives good results for distinct groups. It is not designed to accommodate overlapping clusters.
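A minimal Python sketch of the Davies-Bouldin index, assuming the common choice $R_{ij} = (s_i + s_j) / d_{ij}$ and averaging $\max_{j \ne i} R_{ij}$ over all clusters. The concrete measures are assumptions made for illustration: the dispersion $s_i$ is the mean Euclidean distance of a cluster's members to its centroid, and $d_{ij}$ is the Euclidean distance between centroids.

```python
import math


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(clusters):
    """clusters: list of lists of points. Lower values mean a better configuration."""
    centroids = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
    # Assumption: dispersion s_i = mean distance of members to their centroid.
    s = [sum(dist(p, mu) for p in c) / len(c) for c, mu in zip(clusters, centroids)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # R_i: similarity of cluster i to its most similar other cluster.
        total += max(
            (s[i] + s[j]) / dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
    return total / k


near = [[(0, 0), (0, 2)], [(4, 0), (4, 2)]]
far = [[(0, 0), (0, 2)], [(8, 0), (8, 2)]]
print(davies_bouldin(near))  # 0.5
print(davies_bouldin(far))   # 0.25
```

Moving the same two clusters farther apart halves the index, in line with the statement that a lower value indicates a better configuration.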
  • 36. Appendix: References I. [1] site/dataclusteringalgorithms/k-means-clustering-algorithm. [2] Aggarwal, C. C., and Reddy, C. K. Data Clustering. CRC Press. [3] Dunn, J. Well separated clusters and optimal fuzzy partitions. Journal of Cybernetics 4 (1974), 95-104. [4] Kovács, F., Legány, C., and Babos, A. Cluster validity measurement techniques.
  • 37. Appendix: References II. [5] Ghosh, S., and Dubey, S. K. Comparative analysis of k-means and fuzzy c-means algorithms. (IJACSA) International Journal of Advanced Computer Science and Applications 4(4) (2013). [6] Saitta, S., Raphael, B., and Smith, I. F. A bounded index for cluster validity. [7] wikipedia. https://en.wikipedia.org/wiki/cluster analysis#evaluation and assessm
  • 39. Appendix Thank You! You can contact me at garimashakya24@gmail.com