(Centre for Knowledge Transfer)
institute
Clustering
Dr. C.V. Suresh Babu
What is Clustering
• Clustering is the task of dividing a population or set of data points into a number of groups such that data points in the same group are more similar to one another than to data points in other groups. In other words, it organizes objects into groups on the basis of their similarity and dissimilarity.
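The definition can be checked numerically: a good grouping has small distances within groups and large distances between them. The sketch below assumes Euclidean distance as the (dis)similarity measure and uses hypothetical helper names (`mean_distance`, `intra_inter`); it scores a grouping that is given, rather than finding one.

```python
import math
from itertools import combinations

# Hypothetical helper names chosen for this sketch; not a library API.
def mean_distance(pairs):
    """Average Euclidean distance over a list of (point, point) pairs."""
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def intra_inter(points, labels):
    """Mean distance between points in the same group vs. points in different groups."""
    same, diff = [], []
    for i, j in combinations(range(len(points)), 2):
        pair = (points[i], points[j])
        if labels[i] == labels[j]:
            same.append(pair)
        else:
            diff.append(pair)
    return mean_distance(same), mean_distance(diff)

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = [0, 0, 0, 1, 1, 1]  # a grouping chosen by hand for illustration
intra, inter = intra_inter(points, labels)
print(f"intra={intra:.2f} inter={inter:.2f}")  # → intra=1.14 inter=9.93
```

A clustering algorithm searches for labels that keep the first number small and the second large; how exactly it trades them off depends on the method.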
• For example, the data points that lie close together in the graph below can be grouped into a single cluster. We can distinguish the clusters visually and see that there are 3 clusters in the picture.
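• Clusters do not have to be spherical. For example, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) can find clusters of arbitrary shape.
DBSCAN clusters data points on the basis of density rather than distance from a cluster center: a cluster grows from points that have enough neighbours within a given distance constraint, and absorbs the points lying within that distance of them. Points that fall in no dense region are treated as outliers (noise), and various distance measures can be used to compute these neighbourhoods.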
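The density idea above can be made concrete with a minimal DBSCAN sketch in plain Python. It is an illustration under stated assumptions, not a reference implementation: it uses Euclidean distance, a brute-force O(n²) neighbour search, and hypothetical names (`dbscan`, `region_query`, `eps`, `min_pts`) chosen here rather than taken from any library. A point with at least `min_pts` neighbours within radius `eps` is a core point; clusters grow outward from core points, and points reachable from none of them are labelled -1 (noise).

```python
import math

NOISE = -1  # label used for outliers

# Hypothetical helper names chosen for this sketch; not a library API.
def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is also a core point: keep expanding
                seeds.extend(j_neighbours)
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),          # dense group A
          (10, 10), (10, 11), (11, 10), (11, 11),  # dense group B
          (5, 20)]                                 # isolated point
print(dbscan(points, eps=1.5, min_pts=3))  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]

line = [(x, 0) for x in range(7)]  # an elongated, non-spherical shape
print(dbscan(line, eps=1.5, min_pts=2))  # → [0, 0, 0, 0, 0, 0, 0]
```

The second call illustrates the non-spherical case: a straight chain of points ends up in one elongated cluster because each point is density-reachable from its neighbour, even though the chain is far from round.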
Why Clustering?
• Clustering is important because it determines the intrinsic grouping among the unlabelled data present.
• There is no single criterion for good clustering.
• It depends on the user and on which criteria satisfy their needs.
• For instance, we could be interested in finding representatives for homogeneous groups (data reduction), in finding “natural clusters” and describing their unknown properties (“natural” data types), in finding useful and suitable groupings (“useful” data classes), or in finding unusual data objects (outlier detection).
• Every clustering algorithm must make some assumptions about what constitutes the similarity of points, and each set of assumptions produces different but equally valid clusters.
Clustering Methods:
• Density-Based Methods
• Hierarchical Based Methods
• Partitioning Methods
• Grid-based Methods
Density-Based Methods
• These methods treat clusters as dense regions of the space, separated from one another by regions of lower density.
• These methods have good accuracy and the ability to merge two clusters.
Examples
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise),
• OPTICS (Ordering Points to Identify Clustering Structure), etc.
Hierarchical Based Methods
• The clusters formed by these methods make up a tree-type structure based on the hierarchy, in which new clusters are formed from previously formed ones. There are two categories:
• Agglomerative (bottom-up approach)
• Divisive (top-down approach)
Examples
• CURE (Clustering Using Representatives),
• BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
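The agglomerative (bottom-up) approach can be sketched in a few lines. This sketch assumes Euclidean distance and single linkage (the distance between two clusters is that of their closest pair of points); both are choices made here for illustration, and the function names are hypothetical. It is not CURE or BIRCH, which rely on representative points and a summary tree respectively to scale to large data. Because each merge combines previously formed clusters, the merge history is exactly the tree-type structure described above.

```python
import math

# Hypothetical function names chosen for this sketch; not a library API.
def single_linkage(a, b):
    """Cluster-to-cluster distance: the closest pair of points across the two clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, n_clusters):
    """Bottom-up (agglomerative) clustering with single linkage.

    Starts with one cluster per point and repeatedly merges the two closest
    clusters until n_clusters remain. Returns the clusters and the merge
    history, which records the tree from the leaves upward.
    """
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so deleting j first leaves index i valid
        clusters[i] = merged
    return clusters, history

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, history = agglomerative(points, n_clusters=2)
print(clusters)  # → [[(0, 0), (0, 1), (5, 5), (5, 6)], [(20, 20)]]
for left, right, d in history:
    print(f"merged {left} + {right} at distance {d:.2f}")
```

A divisive (top-down) method would run the other way, starting from one cluster holding every point and splitting it recursively.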
Partitioning Methods
• These methods partition the objects into k clusters, and each partition forms one cluster.
• They optimize an objective criterion, typically a similarity function in which distance is the major parameter.
Examples
• K-means,
• CLARANS (Clustering Large Applications based upon Randomized
Search)
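To make "optimizing an objective criterion" concrete, the sketch below uses the within-cluster sum of squared errors (SSE), the distance-based criterion that k-means minimizes, and scores every possible assignment of a tiny data set. The function names are hypothetical, and exhaustive search is only feasible for a handful of points (there are k^n assignments); partitioning methods such as k-means and CLARANS search the space of partitions heuristically instead (CLARANS with a medoid-based criterion rather than SSE).

```python
from itertools import product

# Hypothetical helper names chosen for this sketch; not a library API.
def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def sse(points, labels, k):
    """Within-cluster sum of squared Euclidean distances to each cluster's centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        m = centroid(members)
        total += sum(sum((x - mx) ** 2 for x, mx in zip(p, m)) for p in members)
    return total

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(sse(points, [0, 0, 1, 1], 2))  # → 4.0   (left/right split)
print(sse(points, [0, 1, 0, 1], 2))  # → 100.0 (bottom/top split)

# Exhaustive search over all 2**4 labelings: only viable for toy inputs.
best = min(product(range(2), repeat=len(points)), key=lambda labs: sse(points, labs, 2))
print(best, sse(points, best, 2))  # → (0, 0, 1, 1) 4.0
```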
Grid-based Methods
• In these methods, the data space is quantized into a finite number of cells that form a grid-like structure.
• Clustering operations performed on the grid are fast and largely independent of the number of data objects, since they depend on the number of cells instead.
Examples
• STING (Statistical Information Grid),
• WaveCluster,
• CLIQUE (CLustering In Quest), etc.
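The grid idea can be illustrated with a small 2-D sketch: points are quantized into square cells, cells holding at least `min_count` points are kept as dense, and dense cells that touch are joined into clusters. This is a generic grid-density illustration with hypothetical names (`grid_clusters`, `cell_size`, `min_count`), not an implementation of STING, WaveCluster or CLIQUE. After one quantization pass over the data, the joining step works only on cells, which is where the speed advantage comes from.

```python
import math
from collections import Counter, deque

# Hypothetical function and parameter names chosen for this sketch; not a library API.
def grid_clusters(points, cell_size, min_count):
    """Grid-based clustering sketch: returns a list of clusters, each a set of cells."""
    # 1. Quantize: map each point to the integer (column, row) of its cell.
    counts = Counter((math.floor(x / cell_size), math.floor(y / cell_size))
                     for x, y in points)
    # 2. Keep only dense cells.
    dense = {cell for cell, n in counts.items() if n >= min_count}
    # 3. Join dense cells sharing an edge or corner (8-neighbourhood) with BFS.
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            cx, cy = queue.popleft()
            component.add((cx, cy))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(component)
    return clusters

points = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.1),   # cell (0, 0)
          (1.2, 0.3), (1.5, 0.8), (1.1, 0.9),   # cell (1, 0), adjacent to (0, 0)
          (5.2, 5.1), (5.5, 5.6), (5.8, 5.3),   # cell (5, 5)
          (9.0, 0.5)]                           # sparse cell (9, 0)
clusters = grid_clusters(points, cell_size=1.0, min_count=2)
print(sorted(sorted(c) for c in clusters))  # → [[(0, 0), (1, 0)], [(5, 5)]]
```

The sparse cell is simply dropped, so the lone point at (9.0, 0.5) belongs to no cluster.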
Clustering Algorithms
• K-means clustering algorithm: one of the simplest unsupervised learning algorithms for solving the clustering problem.
• K-means partitions n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as the prototype of that cluster.
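The assignment/update loop behind k-means (often called Lloyd's algorithm) can be sketched in plain Python. Assumptions made here: Euclidean distance, initial centroids drawn at random from the data points (not k-means++), a fixed iteration cap, and hypothetical names (`kmeans`, `iters`, `seed`). Results can depend on the initialization, so practical implementations usually run several restarts.

```python
import math
import random

# Hypothetical function and parameter names chosen for this sketch; not a library API.
def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins the cluster with the nearest mean.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

The cluster numbers themselves are arbitrary; what matters is that the three points near (1, 1) share one label and the three near (8, 8) share the other, with each centroid sitting at the mean of its group.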
Applications of Clustering in different fields
• Marketing: It can be used to characterize and discover customer segments for marketing purposes.
• Biology: It can be used to classify different species of plants and animals.
• Libraries: It is used to group books on the basis of their topics and information.
• Insurance: It is used to understand customers and their policies and to identify fraud.
• City Planning: It is used to group houses and study their values based on their geographical location and other factors.
• Earthquake studies: By studying earthquake-affected areas, we can identify dangerous zones.
