Author
Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos,
Prabhakar Raghavan
Prepared by: Raed T Aldahdooh
• Introduction
• Motivation
• Contributions of the Paper
• Subspace Clustering
• CLIQUE (Clustering in Quest)
• Performance Experiments
• Conclusions
• Agrawal, Gehrke, Gunopulos, Raghavan (SIGMOD’98)
• CLIQUE can be considered both density-based and grid-based.
• It clusters high-dimensional data.
• It automatically identifies subspaces of a high-dimensional data space that
allow better clustering than the original space.
• Many irrelevant dimensions may mask clusters.
• Distance measures become meaningless because points tend to become nearly equidistant.
• Clusters may exist only in some subspaces.
• Data in only one dimension is relatively packed.
• Adding a dimension “stretches” the points across that dimension, making
them further apart.
• Density decreases dramatically.
• Distance measures become meaningless because points tend to become nearly equidistant.
 Methods
◦ Feature transformation: only effective if most dimensions are relevant
▪ PCA (principal component analysis) and SVD (singular
value decomposition) are useful only when features are highly
correlated or redundant
◦ Feature selection: wrapper or filter approaches
▪ useful for finding a subspace where the data have nice clusters
◦ Subspace clustering: find clusters in all the possible subspaces
▪ CLIQUE, ProClus, and frequent-pattern-based clustering
The need for developing new algorithms
• Effective treatment of high dimensionality:
◦ Information must be extracted effectively from the huge amounts of data held in
databases. In other words, the running time of an algorithm must be predictable and
usable on large databases.
• Interpretability of results:
◦ Users expect clustering results on high-dimensional data to be interpretable and
comprehensible.
• Scalability and usability:
◦ Many clustering algorithms do not work well on a large database that may contain millions
of objects, and clustering on a sample of a given data set may lead to biased results.
In other words, the clustering technique should be fast, scale with the
number of dimensions and the size of the input, and be insensitive to the order of the input
data.
 CLIQUE satisfies the above desiderata
(effectiveness, interpretability, scalability and usability).
• CLIQUE automatically finds subspaces with high-density
clusters.
• CLIQUE generates a minimal description for each cluster as a
DNF expression.
• Empirical evaluation shows that CLIQUE scales linearly with
the number of input records and scales well as the
number of dimensions in the data or the dimensionality of the hidden
clusters increases.
• A disjunctive normal form (DNF) is a
standardization (or normalization) of a logical
formula as a disjunction of conjunctive clauses.
• It is a disjunction of conjunctions in which every variable or
its negation appears once in each conjunction
(a minterm).
◦ Each minterm appears only once.
Example: the DNF of p ⊕ q is
(p ∧ ¬q) ∨ (¬p ∧ q).
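In CLIQUE's cluster descriptions, each literal of the DNF is an interval test on one attribute, each conjunction describes one hyper-rectangular region, and the disjunction is the union of those regions. The following is a minimal Python sketch of that reading; the helper names `in_region` and `in_description` and the sample intervals are illustrative assumptions, not part of the paper.

```python
from typing import Dict, List, Tuple

# A region is a conjunction of half-open interval tests: attribute -> (low, high).
Region = Dict[str, Tuple[float, float]]


def in_region(point: Dict[str, float], region: Region) -> bool:
    """Conjunction: every interval test low <= value < high must hold."""
    return all(low <= point[attr] < high for attr, (low, high) in region.items())


def in_description(point: Dict[str, float], description: List[Region]) -> bool:
    """Disjunction: the point satisfies the DNF if it lies in any region."""
    return any(in_region(point, region) for region in description)


# Illustrative description (hypothetical values):
# (30 <= age < 50 AND 4 <= salary < 8) OR (40 <= age < 60 AND 2 <= salary < 6)
description = [
    {"age": (30, 50), "salary": (4, 8)},
    {"age": (40, 60), "salary": (2, 6)},
]

print(in_description({"age": 35, "salary": 5}, description))  # lies in the first region
print(in_description({"age": 55, "salary": 3}, description))  # lies in the second region
print(in_description({"age": 35, "salary": 3}, description))  # lies in neither region
```

Storing a description as a list of interval dictionaries keeps it readable by a user, which is the point of reporting clusters in DNF.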
 Clusters may exist only in some
subspaces.
• Subspace clustering: find clusters in
all the subspaces.
• What is (a) a unit, (b) a dense unit, (c) a cluster, (d) a minimal description of a cluster?
• In Figure 1, the two-dimensional space (age, salary) has been partitioned by a 10×10 grid (ξ = 10).
• The unit u = (30 ≤ age < 35) ∧ (1 ≤ salary < 2)
• A and B are both regions:
• A = (30 ≤ age < 50) ∧ (4 ≤ salary < 8)
• B = (40 ≤ age < 60) ∧ (2 ≤ salary < 6)
• Assume the dense units have been shaded.
• A ∪ B is a cluster (A and B are connected regions).
• A ∩ B is not a maximal region.
• The minimal description for the cluster A ∪ B is the
DNF expression ((30 ≤ age < 50) ∧ (4 ≤ salary < 8)) ∨
((40 ≤ age < 60) ∧ (2 ≤ salary < 6)).
• In Figure 2, assuming T = 20%
(a density threshold corresponding to 3 points), a unit u is dense if
selectivity(u) > T,
• where selectivity is the fraction of
the total data points contained in the
unit.
• No 2-dimensional unit is dense, so
there are no clusters in the
original data space.
When the points are projected onto the salary dimension, there are three 1-dimensional dense
units and two clusters in the 1-dimensional salary subspace:
C = (5 ≤ salary < 7) and D = (2 ≤ salary < 3).
There is, however, no dense unit and hence no cluster in the 1-dimensional age subspace.
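The definitions of unit, selectivity and dense unit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the attribute ranges and the ten sample points are made up here and are not the data behind Figure 2. The sample is chosen so that it shows the same effect, with dense units in the salary subspace but none in the age subspace or in the full space.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Unit = Tuple[int, ...]  # one interval index per dimension of the subspace


def unit_of(point: Sequence[float], dims: Sequence[int],
            lows: Sequence[float], highs: Sequence[float], xi: int) -> Unit:
    """Map a point to the grid unit that contains it in the subspace `dims`."""
    cell = []
    for d in dims:
        width = (highs[d] - lows[d]) / xi
        idx = int((point[d] - lows[d]) / width)
        cell.append(min(idx, xi - 1))  # the upper boundary goes into the last interval
    return tuple(cell)


def dense_units(points, dims, lows, highs, xi: int, tau: float) -> Dict[Unit, float]:
    """Return {unit: selectivity} for every unit whose selectivity exceeds tau."""
    counts = Counter(unit_of(p, dims, lows, highs, xi) for p in points)
    n = len(points)
    return {u: c / n for u, c in counts.items() if c / n > tau}


# Hypothetical (age, salary) points; age in [20, 70), salary in [0, 10).
points = [(22, 5.5), (37, 5.2), (52, 5.8), (64, 5.1), (28, 2.5),
          (43, 2.2), (58, 2.9), (33, 8.5), (47, 0.5), (68, 9.5)]
lows, highs = [20, 0], [70, 10]

print(dense_units(points, [1], lows, highs, xi=10, tau=0.2))     # salary subspace
print(dense_units(points, [0], lows, highs, xi=10, tau=0.2))     # age subspace
print(dense_units(points, [0, 1], lows, highs, xi=10, tau=0.2))  # full space
```

Counting with a dictionary keyed by unit touches only the occupied units, so the empty majority of a high-dimensional grid is never materialised.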
 CLIQUE consists of the following three steps:
1) Identification of subspaces that contain clusters.
2) Identification of clusters.
3) Generation of minimal descriptions for the clusters.
 Downward closure (DC) property: If a cluster
exists in a k-dimensional space, its points are
also part of a cluster in every (k-1)-dimensional
projection of that space.
• Because of the DC property, subspaces are
identified in an iterative
bottom-up fashion (from lower-dimensional to higher-dimensional
subspaces).
• The difficulty in identifying subspaces that contain clusters
lies in finding dense units in different subspaces.
• A. Use a bottom-up algorithm to find dense units; it
exploits the monotonicity of the clustering criterion with
respect to dimensionality to prune the search space.
• Lemma 1 (monotonicity): if a k-dimensional unit is dense, then so are
its projections in (k-1)-dimensional space.
• The bottom-up algorithm:
• Determine the 1-dimensional dense units, then self-join them to get the candidate 2-dimensional units.
In general, once the (k-1)-dimensional dense units D_{k-1} are known, self-join D_{k-1} to get the candidate k-dimensional units C_k.
• Discard those candidate units from C_k that have a (k-1)-dimensional projection
not included in D_{k-1}.
• B. Make the bottom-up algorithm faster with MDL-based
pruning.
 A. Determination of dense units
◦ Determine the set D_1 of all one-dimensional dense units.
◦ k = 1
◦ While D_k ≠ ∅ do
▪ k = k + 1
▪ Determine the set D_k as the set of all the k-dimensional dense units
all of whose (k-1)-dimensional projections belong to D_{k-1}.
◦ End while
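The loop above can be sketched in Python in the style of Apriori candidate generation. This is a minimal illustration, not the authors' implementation: the function name `find_dense_units`, the representation of a unit as a sorted tuple of (dimension, interval index) pairs, and the eight sample points are all assumptions made here. The input is taken to be already discretised, so `cells[p][d]` is the interval index of point p along dimension d.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

# A k-dimensional unit: sorted ((dimension, interval index), ...) of length k.
Unit = Tuple[Tuple[int, int], ...]


def find_dense_units(cells: List[Tuple[int, ...]], tau: float) -> List[Set[Unit]]:
    """Return [D_1, D_2, ...], the dense units found level by level."""
    n, n_dims = len(cells), len(cells[0])
    # Level 1 candidates: every occupied one-dimensional unit.
    candidates: Set[Unit] = {((d, cell[d]),) for cell in cells for d in range(n_dims)}
    levels: List[Set[Unit]] = []
    while candidates:
        # One pass over the data to count the points inside each candidate.
        counts = Counter()
        for cell in cells:
            for unit in candidates:
                if all(cell[d] == i for d, i in unit):
                    counts[unit] += 1
        dense = {u for u in candidates if counts[u] / n > tau}
        if not dense:
            break
        levels.append(dense)
        # Self-join: merge two dense units that agree on all but the last pair.
        candidates = set()
        for a, b in combinations(sorted(dense), 2):
            if a[:-1] == b[:-1] and a[-1][0] < b[-1][0]:
                cand = a + (b[-1],)
                # Prune: every projection of the candidate must itself be dense.
                if all(cand[:i] + cand[i + 1:] in dense for i in range(len(cand))):
                    candidates.add(cand)
    return levels


# Hypothetical discretised data: 8 points, 3 dimensions.
cells = [(1, 1, 0), (1, 1, 3), (1, 1, 5), (1, 2, 7),
         (4, 4, 1), (4, 4, 2), (4, 6, 4), (8, 9, 6)]
levels = find_dense_units(cells, tau=0.25)
print(levels[0])  # one-dimensional dense units
print(levels[1])  # two-dimensional dense units
```

With this sample, dimension 2 has no dense unit, so no candidate involving it is ever generated; this is the pruning effect of the monotonicity lemma.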
 B. Determination of high coverage subspaces.
◦ Determine all the subspaces that contain at least one dense
unit.
◦ Sort these subspaces in descending order according to their
coverage (the fraction of the points of the original data set
that they contain).
◦ Optimize a suitably defined Minimum Description Length
criterion function and determine a threshold under which a
coverage is considered “low”.
◦ Select the subspaces with “high” coverage.
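The slides do not spell out the criterion function, so the following Python sketch is only one plausible reading: split the sorted coverages into a kept group and a pruned group, charge each group the bits needed to store its rounded-up mean plus the deviation of every member from that mean, and pick the split with the smallest total. The function names, the use of point counts instead of fractions, and the `1 +` inside the logarithm (added to avoid taking the logarithm of zero) are assumptions made here.

```python
import math
from typing import Sequence


def code_length(group: Sequence[int]) -> float:
    """Bits to store a group as its rounded-up mean plus each deviation from it."""
    mean = math.ceil(sum(group) / len(group))
    return math.log2(mean) + sum(math.log2(1 + abs(x - mean)) for x in group)


def mdl_cut(coverages: Sequence[int]) -> int:
    """Return how many of the highest-coverage subspaces to keep."""
    xs = sorted(coverages, reverse=True)
    best_i, best_cost = len(xs), float("inf")
    for i in range(1, len(xs)):  # keep xs[:i], prune xs[i:]
        cost = code_length(xs[:i]) + code_length(xs[i:])
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i


# Hypothetical coverages (number of points covered by dense units) of six subspaces.
coverages = [1000, 950, 900, 100, 80, 60]
keep = mdl_cut(coverages)
print(sorted(coverages, reverse=True)[:keep])  # the "high" coverage subspaces
```

The cut lands where the two groups are each internally homogeneous, so no coverage threshold has to be supplied by the user.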
 The input to the step of Finding Clusters is a set of dense units
D, all in the same k-dimensional space.
• Depth-first search algorithm
◦ Use a depth-first search to find the connected components
of the graph whose vertices are the dense units and whose edges join units that share a common face. Start with some unit u in D, assign it the first cluster
number, and find all the units it is connected to. If there are still
units in D that have not yet been visited, pick one and repeat the
procedure.
• For each high-coverage subspace S do
◦ Consider the set E of all the dense units in S.
◦ m = 0
◦ While E ≠ ∅
◦ m = m + 1
◦ Select a randomly chosen unit u from E.
◦ Assign to C_m the unit u and all units of E that are connected to u.
◦ E = E − C_m
◦ End while
• End for
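The depth-first search described above can be sketched as follows. This is a minimal illustration under assumptions made here: units are tuples of interval indices in one fixed subspace, two units are treated as connected when they differ by exactly one step along exactly one dimension, and the six input units are illustrative. An explicit stack replaces recursion.

```python
from typing import Iterator, List, Set, Tuple

Unit = Tuple[int, ...]  # interval indices of a unit within one fixed subspace


def neighbours(unit: Unit) -> Iterator[Unit]:
    """Yield the units that share a face with `unit` (one step along one dimension)."""
    for d in range(len(unit)):
        for step in (-1, 1):
            yield unit[:d] + (unit[d] + step,) + unit[d + 1:]


def find_clusters(dense: Set[Unit]) -> List[Set[Unit]]:
    """Group dense units into connected components with a depth-first search."""
    unvisited = set(dense)
    clusters = []
    while unvisited:
        seed = unvisited.pop()          # start a new cluster from any unvisited unit
        cluster, stack = {seed}, [seed]
        while stack:
            current = stack.pop()
            for nb in neighbours(current):
                if nb in unvisited:     # dense and not yet assigned to a cluster
                    unvisited.remove(nb)
                    cluster.add(nb)
                    stack.append(nb)
        clusters.append(cluster)
    return clusters


dense = {(2, 1), (2, 2), (3, 2), (3, 3), (8, 3), (9, 3)}
for cluster in find_clusters(dense):
    print(sorted(cluster))
```

Each dense unit is pushed and popped once and inspects 2k neighbours, so the search is linear in the number of dense units for a fixed subspace dimensionality k.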
 The input to this step consists of disjoint clusters in k-
dim subspace.
 The goal is to generate a minimal description of each
cluster with two steps:
◦ Covering with maximal regions.
◦ Minimal cover.
• The CLIQUE Algorithm (cont.)
3. Minimal description of clusters
The minimal description of a cluster C, produced by this last
step, is the smallest possible union of hyper-rectangular
regions that covers C.
For example:
• A ∪ B is the minimal cluster description of the shaded region.
• C ∪ D ∪ E is a non-minimal cluster description of the same
region.
 The CLIQUE Algorithm (cont.)
3. Minimal description of clusters (algorithm)
For each cluster C do
1st stage
• c = 0
• While C ≠ ∅
▪ c = c + 1
▪ Choose a dense unit in C
▪ For i = 1 to l
o Grow the unit in both directions along the i-th dimension, trying to cover as
many units in C as possible (units that do not belong to C must not be
covered).
▪ End for
▪ Define the set I containing all the units covered by the above procedure
▪ C = C − I
• End while
2nd stage
• Remove every cover whose units are all covered by at least one other cover.
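The two stages can be sketched in Python as a greedy growth followed by removal of redundant covers. This is an illustration under assumptions made here, not the authors' code: the names are hypothetical, a region is stored as inclusive (low, high) interval indices per dimension, seeds are picked deterministically with `min`, and growth is checked against the full cluster so that regions may overlap (the slide leaves this point open).

```python
from itertools import product
from typing import List, Set, Tuple

Unit = Tuple[int, ...]
Region = Tuple[Tuple[int, int], ...]  # inclusive (low, high) interval indices per dimension


def units_of(region) -> Set[Unit]:
    """All grid units contained in a hyper-rectangular region."""
    return set(product(*(range(lo, hi + 1) for lo, hi in region)))


def grow(seed: Unit, cluster: Set[Unit]) -> Region:
    """Stage 1 helper: extend the seed along each dimension while it stays inside the cluster."""
    region = [(i, i) for i in seed]
    for d in range(len(seed)):
        for step in (-1, 1):
            while True:
                lo, hi = region[d]
                trial = list(region)
                trial[d] = (lo - 1, hi) if step < 0 else (lo, hi + 1)
                if not units_of(trial) <= cluster:
                    break  # the extension would cover a unit outside the cluster
                region = trial
    return tuple(region)


def minimal_description(cluster: Set[Unit]) -> List[Region]:
    remaining, covers = set(cluster), []
    while remaining:                      # stage 1: cover the cluster greedily
        region = grow(min(remaining), cluster)
        covers.append(region)
        remaining -= units_of(region)
    # Stage 2: drop a cover when the other covers already contain all its units.
    for region in sorted(covers, key=lambda r: len(units_of(r))):
        others = [r for r in covers if r != region]
        if others and units_of(region) <= set().union(*(units_of(r) for r in others)):
            covers = others
    return covers


cluster = {(2, 1), (2, 2), (3, 2), (3, 3)}
print(minimal_description(cluster))
```

For this sample cluster the first stage produces three regions and the second stage removes the middle one, leaving two rectangles. The greedy order affects which regions are found, so the result is a heuristic cover rather than a guaranteed optimum.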
 A two dimensional grid of lines of edge size ξ applied in the
two-dimensional feature space.
 Two-dimensional and one-dimensional units are defined:
◦ u_i^q denotes the i-th one-dimensional unit along xq
◦ uij denotes the two dimensional unit resulting from the Cartesian
product of the i-th and j-th intervals along x1 and x2, respectively.
 ξ=10 and τ=8% (thus, each unit containing more than
5 points is considered to be dense).
 The points in u48 and u58, u75 and u76, u83 and u93 are
collinear.
One-dimensional dense units:
D1 = {u_2^1, u_3^1, u_4^1, u_5^1, u_8^1, u_9^1, u_1^2, u_2^2, u_3^2, u_5^2, u_6^2}
Two-dimensional dense units:
D2={u21, u22, u32, u33, u83, u93}
Notes:
• Although each of u48, u75 and u76
contains more than 5 points, they are not
included in D2.
• Although it seems unnatural for u83 and
u93 to be included in D2, they are
included since u_3^2 is dense.
• All subspaces of the two-dimensional
space contain clusters.
One-dimensional clusters:
C1 = {u_2^1, u_3^1, u_4^1, u_5^1}
C2 = {u_8^1, u_9^1}
C3 = {u_1^2, u_2^2, u_3^2}
C4 = {u_5^2, u_6^2}
Two-dimensional clusters:
C5 = {u21, u22, u32, u33}
C6 = {u83, u93}
C1 = {(x1): 1 ≤ x1 < 5}
C2 = {(x1): 7 ≤ x1 < 9}
C3 = {(x2): 0 ≤ x2 < 3}
C4 = {(x2): 4 ≤ x2 < 6}
C5 = {(x1, x2): 1 ≤ x1 < 2, 0 ≤ x2 < 2} ∪ {(x1, x2): 2 ≤ x1 < 3, 1 ≤ x2 < 3}
C6 = {(x1, x2): 7 ≤ x1 < 9, 2 ≤ x2 < 3}
Note that C2 and C6 are
essentially the same cluster,
which is reported twice by
the algorithm.
 We now empirically evaluate CLIQUE using synthetic data (Generator
from M. Zait and H. Messatfa, “A comparative study of clustering methods”).
• The goals of the experiments are to assess the efficiency and accuracy of
CLIQUE:
• Efficiency: determine how the running time scales with
◦ Dimensionality of the data space.
◦ Dimensionality of clusters.
◦ Size of data.
• Accuracy: test whether CLIQUE recovers known clusters in some
subspaces of a high-dimensional data space.
The test used clusters embedded in 5-dimensional subspaces while varying
the dimensionality of the space from 5 to 50.
CLIQUE was able to recover all clusters in every case.
 Strength
◦ automatically finds subspaces of the highest dimensionality such that
high density clusters exist in those subspaces
◦ insensitive to the order of records in input and does not presume some
canonical data distribution
◦ scales linearly with the size of input and has good scalability as the
number of dimensions in the data increases
 Weakness
◦ The accuracy of the clustering result may be degraded as the price of the
simplicity of the method
 The problem of high dimensionality is often tackled by requiring the
user to specify the subspace for cluster analysis, but user identification
of subspaces is quite error-prone. CLIQUE can find clusters embedded in subspaces of
high-dimensional data without requiring the user to guess subspaces that
might have interesting clusters.
• CLIQUE generates cluster descriptions in the form of DNF expressions
that are minimized for ease of comprehension.
• CLIQUE is insensitive to the order of input records, whereas some clustering
algorithms are sensitive to the order of input data.
• Empirical evaluation shows that CLIQUE scales linearly with the size of
the input and scales well as the number of dimensions in the data increases.
• CLIQUE can accurately discover clusters embedded in lower-dimensional
subspaces.
CLIQUE Automatic subspace clustering of high dimensional data for data mining application

CLIQUE Automatic subspace clustering of high dimensional data for data mining application

  • 1. Authors: Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan. Prepared by: Raed T Aldahdooh
  • 2. Introduction; Motivation; Contributions of the Paper; Subspace Clustering; CLIQUE (Clustering in Quest); Performance Experiments; Conclusions.
  • 3. Agrawal, Gehrke, Gunopulos, Raghavan (SIGMOD’98). CLIQUE can be considered both density-based and grid-based. It clusters high-dimensional data by automatically identifying subspaces of a high-dimensional data space that allow better clustering than the original space.
  • 4. Many irrelevant dimensions may mask clusters. Distance measures become meaningless because points become nearly equidistant. Clusters may exist only in some subspaces.
  • 5. Only data in one dimension is relatively packed. Adding a dimension “stretches” the points across that dimension, making them further apart. Density decreases dramatically. Distance measures become meaningless because points become nearly equidistant.
  • 6. Methods:
      ◦ Feature transformation: only effective if most dimensions are relevant. PCA (principal component analysis) and SVD (singular value decomposition) are useful only when features are highly correlated or redundant.
      ◦ Feature selection: wrapper or filter approaches, useful to find a subspace where the data have nice clusters.
      ◦ Subspace clustering: find clusters in all the possible subspaces, e.g. CLIQUE, ProClus, and frequent pattern-based clustering.
  • 7. The need for developing new algorithms:
      ◦ Effective treatment of high dimensionality: information must be extracted effectively from a huge amount of data in databases. In other words, the running time of algorithms must be predictable and usable in large databases.
      ◦ Interpretability of results: users expect clustering results on high-dimensional data to be interpretable and comprehensible.
      ◦ Scalability and usability: many clustering algorithms do not work well on a large database that may contain millions of objects, and clustering on a sample of a given data set may lead to biased results. In other words, the clustering technique should be fast, scale with the number of dimensions and the size of the input, and be insensitive to the order of the input data.
  • 8. CLIQUE satisfies the above desiderata (effectiveness, interpretability, scalability and usability). CLIQUE automatically finds subspaces with high-density clusters. CLIQUE generates a minimal description for each cluster as a DNF expression. Empirical evaluation shows that CLIQUE scales linearly with the number of input records and scales well as the number of dimensions in the data or the dimensionality of the hidden clusters increases.
  • 9. A disjunctive normal form (DNF) is a standardization (or normalization) of a logical formula as a disjunction of conjunctive clauses. In a full DNF, every variable or its negation appears once in each conjunction (a minterm), and each minterm appears only once. Example: the DNF of p ⊕ q is (p ∧ ¬q) ∨ (¬p ∧ q).
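As a small illustration of the DNF definition, the sketch below builds the full DNF of a Boolean function from its truth table. The function name `full_dnf` and the choice of exclusive-or as the test formula are assumptions made for this example; they are not taken from the slides.

```python
from itertools import product

def full_dnf(variables, formula):
    """Return the full DNF of `formula` as a string, one minterm per satisfying row."""
    minterms = []
    for values in product([True, False], repeat=len(variables)):
        if formula(*values):
            # Each variable appears exactly once per minterm, negated if it is False.
            literals = [name if value else f"¬{name}"
                        for name, value in zip(variables, values)]
            minterms.append("(" + " ∧ ".join(literals) + ")")
    return " ∨ ".join(minterms)

# Exclusive-or is true exactly when p and q differ.
xor_dnf = full_dnf(["p", "q"], lambda p, q: p != q)
print(xor_dnf)
```

Every satisfying row of the truth table contributes exactly one minterm, which is why each minterm appears only once.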
  • 10.  Clusters may exist only in some subspaces.  Subspace-clustering: find clusters in all the subspaces.
  • 11. What is (a) a unit, (b) a dense unit, (c) a cluster, (d) a minimal description of a cluster? In Figure 1, the two-dimensional space (age, salary) has been partitioned by a 10×10 grid (ξ = 10).
      ◦ The unit u = (30 ≤ age < 35) ∧ (1 ≤ salary < 2).
      ◦ A and B are both regions: A = (30 ≤ age < 50) ∧ (4 ≤ salary < 8) and B = (40 ≤ age < 60) ∧ (2 ≤ salary < 6).
      ◦ Assuming the dense units have been shaded, A ∪ B is a cluster (A and B are connected regions).
      ◦ A ∩ B is not a maximal region.
      ◦ The minimal description for the cluster A ∪ B is the DNF expression ((30 ≤ age < 50) ∧ (4 ≤ salary < 8)) ∨ ((40 ≤ age < 60) ∧ (2 ≤ salary < 6)).
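One way to read the DNF description of the cluster A ∪ B is as a membership test: a point belongs to the cluster if it satisfies at least one of the conjunctions. The sketch below encodes the two regions from the slide; the helper names `in_region` and `in_cluster` and the dictionary layout are assumptions made for illustration.

```python
# Each region is a conjunction of half-open intervals [low, high), one per dimension.
A = {"age": (30, 50), "salary": (4, 8)}
B = {"age": (40, 60), "salary": (2, 6)}
cluster_description = [A, B]  # the disjunction (OR) of the regions

def in_region(point, region):
    """True if the point satisfies every interval constraint of the region (AND)."""
    return all(low <= point[dim] < high for dim, (low, high) in region.items())

def in_cluster(point, description):
    """True if the point lies in at least one region of the description (OR)."""
    return any(in_region(point, region) for region in description)

print(in_cluster({"age": 35, "salary": 5}, cluster_description))  # inside A
print(in_cluster({"age": 35, "salary": 3}, cluster_description))  # in neither region
```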
  • 12. In Figure 2, assume a density threshold τ = 20% (3 points). If selectivity(u) > τ, then u is a dense unit, where selectivity is the fraction of the total data points contained in the unit. No 2-dim unit is dense, so there are no clusters in the original data space. When the points are projected onto the salary dimension, there are three 1-dim dense units and two clusters in the 1-dim salary subspace, C = (5 ≤ salary < 7) and D = (2 ≤ salary < 3). There is no dense unit, and hence no cluster, in the 1-dim age subspace.
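The selectivity test can be sketched as follows. The ten (age, salary) points are made up for this example and are not the data of Figure 2; the function names `unit_of` and `dense_units` and the `lows`/`highs` range parameters are likewise assumptions. The toy data is chosen so that, as on the slide, no 2-dim unit is dense while the salary projection has dense units.

```python
from collections import Counter

def unit_of(point, dims, lows, highs, xi):
    """Map a point to its unit (a tuple of interval indices) in the subspace `dims`."""
    indices = []
    for d in dims:
        width = (highs[d] - lows[d]) / xi
        i = int((point[d] - lows[d]) / width)
        indices.append(min(i, xi - 1))  # the upper boundary falls into the last interval
    return tuple(indices)

def dense_units(points, dims, lows, highs, xi, tau):
    """Return the units of subspace `dims` whose selectivity exceeds tau."""
    counts = Counter(unit_of(p, dims, lows, highs, xi) for p in points)
    n = len(points)
    return {unit for unit, count in counts.items() if count / n > tau}

# Hypothetical data: ages are spread out, salaries concentrate around 2.x and 5.x.
points = [(22, 5.5), (33, 5.2), (47, 5.8), (58, 2.5), (63, 2.1),
          (26, 2.9), (38, 8.5), (52, 0.5), (68, 9.5), (43, 7.5)]
lows, highs = (20, 0), (70, 10)

print(dense_units(points, (0, 1), lows, highs, xi=10, tau=0.2))  # 2-dim (age, salary)
print(dense_units(points, (1,), lows, highs, xi=10, tau=0.2))    # 1-dim salary
print(dense_units(points, (0,), lows, highs, xi=10, tau=0.2))    # 1-dim age
```

With ten points and τ = 20%, a unit needs at least three points to be dense.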
  • 13. CLIQUE consists of the following three steps: 1) Identification of subspaces that contain clusters. 2) Identification of clusters. 3) Generation of minimal descriptions for the clusters.
  • 14. Downward closure (DC) property: if a set of points forms a cluster in a k-dimensional space, it is also part of a cluster in every (k-1)-dimensional projection of that space. Due to the DC property, subspaces are identified iteratively in a bottom-up fashion (from lower- to higher-dimensional subspaces).
  • 15. The difficulty in identifying subspaces that contain clusters lies in finding dense units in different subspaces.
      ◦ A. Use a bottom-up algorithm to find dense units; it exploits the monotonicity of the clustering criterion with respect to dimensionality to prune the search space.
      ◦ Lemma 1 (monotonicity): if a k-dim unit is dense, then so are its projections in (k-1)-dim space.
      ◦ The bottom-up process: determine the 1-dim dense units, then self-join them to get the candidate 2-dim units, and so on. Once the (k-1)-dim dense units D_{k-1} are known, self-join D_{k-1} to get the candidate k-dim units C_k.
      ◦ Discard those candidates from C_k that have a (k-1)-dim projection not included in D_{k-1}.
      ◦ B. Make the bottom-up algorithm faster with MDL-based pruning.
  • 16. A. Determination of dense units
      ◦ Determine the set D_1 of all one-dimensional dense units.
      ◦ k = 1
      ◦ While D_k ≠ ∅ do
          k = k + 1
          Determine the set D_k as the set of all k-dimensional dense units all of whose (k-1)-dimensional projections belong to D_{k-1}.
      ◦ End while
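The loop above can be sketched in Python as an Apriori-style search. This is a minimal sketch under stated assumptions, not the authors' implementation: points are assumed to be already discretized into interval indices (`cells`), a unit is represented as a sorted tuple of (dimension, interval) pairs, and candidates are formed by a simple pairwise union rather than the ordered self-join described in the paper. The name `find_dense_units` is hypothetical.

```python
from collections import Counter
from itertools import combinations

def find_dense_units(cells, tau):
    """cells[p][d] is the interval index of point p in dimension d.
    Returns {k: set of dense k-dim units}."""
    n = len(cells)
    n_dims = len(cells[0])

    def keep_dense(candidates):
        # Count the points falling into each candidate unit, then apply the threshold.
        counts = Counter()
        for cell in cells:
            for unit in candidates:
                if all(cell[d] == i for d, i in unit):
                    counts[unit] += 1
        return {u for u in candidates if counts[u] / n > tau}

    # D_1: all one-dimensional dense units.
    candidates = {((d, cell[d]),) for cell in cells for d in range(n_dims)}
    current = keep_dense(candidates)
    result = {}
    k = 1
    while current:
        result[k] = current
        k += 1
        candidates = set()
        for a, b in combinations(current, 2):
            merged = tuple(sorted(set(a) | set(b)))
            distinct_dims = {d for d, _ in merged}
            if len(merged) == k and len(distinct_dims) == k:
                # Prune: every (k-1)-dim projection must itself be dense.
                if all(tuple(x for x in merged if x != drop) in current
                       for drop in merged):
                    candidates.add(merged)
        current = keep_dense(candidates)
    return result

# Hypothetical discretized data: ten points in two dimensions.
cells = [(1, 1), (1, 1), (1, 1), (1, 2), (2, 1),
         (3, 3), (4, 4), (5, 5), (6, 6), (7, 7)]
dense = find_dense_units(cells, tau=0.2)
print(dense)
```

Only candidates that survive the projection check are counted against the data, which is where the monotonicity lemma saves work.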
  • 17. B. Determination of high-coverage subspaces
      ◦ Determine all the subspaces that contain at least one dense unit.
      ◦ Sort these subspaces in descending order of coverage (the fraction of the points of the original data set that they contain).
      ◦ Optimize a suitably defined Minimum Description Length (MDL) criterion function to determine a threshold below which coverage is considered “low”.
      ◦ Select the subspaces with “high” coverage.
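The slide does not spell out the criterion function, so the following is only one plausible sketch of the idea: sort the subspaces by coverage, try every cut point, and keep the cut that minimizes a code length made of the bits for each group's mean plus the bits for each deviation from that mean. The cost function, the `+ 1` inside the logarithms (added here to avoid log2(0)), the function name `mdl_cut`, and the coverage numbers are all assumptions for illustration.

```python
import math

def mdl_cut(coverages):
    """coverages: dict mapping a subspace name to its coverage (number of points).
    Returns (selected, pruned) lists of subspace names."""
    ranked = sorted(coverages.items(), key=lambda kv: kv[1], reverse=True)
    values = [c for _, c in ranked]

    def cost(group):
        if not group:
            return 0.0
        mean = math.ceil(sum(group) / len(group))
        # Bits for the mean plus bits for each deviation from it.
        return math.log2(mean + 1) + sum(math.log2(abs(x - mean) + 1) for x in group)

    # Try every split of the sorted list into a selected prefix and a pruned suffix.
    best_i = min(range(1, len(values) + 1),
                 key=lambda i: cost(values[:i]) + cost(values[i:]))
    selected = [name for name, _ in ranked[:best_i]]
    pruned = [name for name, _ in ranked[best_i:]]
    return selected, pruned

# Hypothetical coverage counts for four subspaces.
selected, pruned = mdl_cut({"salary": 900, "age,salary": 850, "age": 120, "zip": 100})
print(selected, pruned)
```

The cut lands where the coverage values fall into two tight groups, so the threshold is derived from the data rather than chosen by the user.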
  • 18. The input to the cluster-finding step is a set of dense units D, all in the same k-dim space. A depth-first search finds the connected components of the graph whose vertices are the dense units: start with some unit u in D, assign it the first cluster number, and find all the units it is connected to. If there are still units in D that have not been visited, pick one and repeat the procedure.
  • 19. For each high-coverage subspace S do
      ◦ Consider the set E of all the dense units in S.
      ◦ m´ = 0
      ◦ While E ≠ ∅ do
          m´ = m´ + 1
          Select a randomly chosen unit u from E.
          Assign to C_m´ the unit u and all units of E that are connected to u.
          E = E − C_m´
      ◦ End while
    End for
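The cluster-finding loop amounts to computing connected components. The sketch below uses an explicit stack for the depth-first search and assumes that two units are connected when they share a common face, i.e. their interval indices differ by one in exactly one dimension. Units are tuples of interval indices; the function name `find_clusters` is hypothetical. The input is the set D2 from the worked example later in this presentation.

```python
def find_clusters(dense_units):
    """Group dense units of one subspace into connected components."""
    unvisited = set(dense_units)
    clusters = []
    while unvisited:
        start = unvisited.pop()
        cluster = {start}
        stack = [start]
        while stack:
            unit = stack.pop()
            # Visit the two face-adjacent neighbors along every dimension.
            for dim in range(len(unit)):
                for step in (-1, 1):
                    neighbor = unit[:dim] + (unit[dim] + step,) + unit[dim + 1:]
                    if neighbor in unvisited:
                        unvisited.remove(neighbor)
                        cluster.add(neighbor)
                        stack.append(neighbor)
        clusters.append(cluster)
    return clusters

# D2 from the worked example: u21, u22, u32, u33, u83, u93.
d2 = {(2, 1), (2, 2), (3, 2), (3, 3), (8, 3), (9, 3)}
clusters = find_clusters(d2)
print(sorted(sorted(c) for c in clusters))
```

The two components found here correspond to the two-dimensional clusters C5 and C6 of the worked example.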
  • 20. The input to this step consists of disjoint clusters in a k-dim subspace. The goal is to generate a minimal description of each cluster in two steps: (1) covering with maximal regions; (2) finding a minimal cover.
  • 21. The CLIQUE Algorithm (cont.): 3. Minimal description of clusters. The minimal description of a cluster C, produced by the last procedure, is the minimum possible union of hyper-rectangular regions. For example, A ∪ B is the minimum cluster description of the shaded region, whereas C ∪ D ∪ E is a non-minimal cluster description of the same region.
  • 22. The CLIQUE Algorithm (cont.): 3. Minimal description of clusters (algorithm). For each cluster C do
      ◦ 1st stage:
          c = 0
          While C ≠ ∅ do
              c = c + 1
              Choose a dense unit in C.
              For i = 1 to l: grow the unit in both directions along the i-th dimension, trying to cover as many units in C as possible (units that do not belong to C must not be covered).
              Define the set I containing all the units covered by the above procedure.
              C = C − I
          End while
      ◦ 2nd stage: remove all covers whose units are each covered by at least one other cover.
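The two stages can be sketched as follows. This is an illustrative sketch under assumptions: regions are stored as (lo, hi) pairs of inclusive unit indices, the seed is picked deterministically with `min` rather than at random, regions grow inside the whole cluster (so they may overlap), and redundant regions are removed in list order. The names `grow_region`, `units_of` and `minimal_description` are hypothetical.

```python
from itertools import product

def units_of(region):
    """All units inside the hyper-rectangle (lo, hi), bounds inclusive."""
    lo, hi = region
    return set(product(*(range(l, h + 1) for l, h in zip(lo, hi))))

def grow_region(seed, cluster, n_dims):
    """Greedily extend a rectangle from `seed`, one dimension at a time."""
    lo, hi = list(seed), list(seed)

    def slab_in_cluster(dim, index):
        # The next layer along `dim` must consist entirely of cluster units.
        ranges = [[index] if d == dim else range(lo[d], hi[d] + 1)
                  for d in range(n_dims)]
        return all(unit in cluster for unit in product(*ranges))

    for dim in range(n_dims):
        while slab_in_cluster(dim, lo[dim] - 1):
            lo[dim] -= 1
        while slab_in_cluster(dim, hi[dim] + 1):
            hi[dim] += 1
    return tuple(lo), tuple(hi)

def minimal_description(cluster):
    cluster = set(cluster)
    n_dims = len(next(iter(cluster)))
    uncovered = set(cluster)
    regions = []
    # 1st stage: cover the cluster with greedily grown regions.
    while uncovered:
        region = grow_region(min(uncovered), cluster, n_dims)
        regions.append(region)
        uncovered -= units_of(region)
    # 2nd stage: drop regions whose units are all covered by the other regions.
    for region in list(regions):
        others = [r for r in regions if r != region]
        if others and units_of(region) <= set().union(*(units_of(r) for r in others)):
            regions.remove(region)
    return regions

# Cluster C5 from the worked example: u21, u22, u32, u33.
c5_description = minimal_description({(2, 1), (2, 2), (3, 2), (3, 3)})
print(c5_description)
```

For this cluster the first stage produces three overlapping regions and the second stage removes the middle one, leaving two rectangles.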
  • 23.
  • 24. A two-dimensional grid of lines with edge size ξ is applied to the two-dimensional feature space. Two-dimensional and one-dimensional units are defined: u_i^q denotes the i-th one-dimensional unit along x_q; u_ij denotes the two-dimensional unit resulting from the Cartesian product of the i-th and j-th intervals along x_1 and x_2, respectively. ξ = 10 and τ = 8% (thus, each unit containing more than 5 points is considered to be dense).
  • 25. The points in u_48 and u_58, u_75 and u_76, u_83 and u_93 are collinear.
  • 26. One-dimensional dense units: D_1 = {u_2^1, u_3^1, u_4^1, u_5^1, u_8^1, u_9^1, u_1^2, u_2^2, u_3^2, u_5^2, u_6^2}. Two-dimensional dense units: D_2 = {u_21, u_22, u_32, u_33, u_83, u_93}. Notes: (1) Although each of u_48, u_75 and u_76 contains more than 5 points, they are not included in D_2. (2) Although it seems unnatural for u_83 and u_93 to be included in D_2, they are included since u_3^2 is dense. (3) All subspaces of the two-dimensional space contain clusters.
  • 27. One-dimensional clusters: C1 = {u_2^1, u_3^1, u_4^1, u_5^1}, C2 = {u_8^1, u_9^1}, C3 = {u_1^2, u_2^2, u_3^2}, C4 = {u_5^2, u_6^2}. Two-dimensional clusters: C5 = {u_21, u_22, u_32, u_33}, C6 = {u_83, u_93}. One-dimensional dense units: D_1 = {u_2^1, u_3^1, u_4^1, u_5^1, u_8^1, u_9^1, u_1^2, u_2^2, u_3^2, u_5^2, u_6^2}. Two-dimensional dense units: D_2 = {u_21, u_22, u_32, u_33, u_83, u_93}.
  • 28. C1 = {(x1): 1 ≤ x1 < 5}, C2 = {(x1): 7 ≤ x1 < 9}, C3 = {(x2): 0 ≤ x2 < 3}, C4 = {(x2): 4 ≤ x2 < 6}, C5 = {(x1, x2): 1 ≤ x1 < 2, 0 ≤ x2 < 2} ∪ {(x1, x2): 2 ≤ x1 < 3, 1 ≤ x2 < 3}, C6 = {(x1, x2): 7 ≤ x1 < 9, 2 ≤ x2 < 3}. Note that C2 and C6 are essentially the same cluster, which is reported twice by the algorithm.
  • 29. We now empirically evaluate CLIQUE using synthetic data (generator from M. Zait and H. Messatfa, “A comparative study of clustering methods”). The goals of the experiments are to assess the efficiency and accuracy of CLIQUE. Efficiency: determine how the running time scales with the dimensionality of the data space, the dimensionality of the clusters, and the size of the data. Accuracy: test whether CLIQUE recovers known clusters in some subspaces of a high-dimensional data space.
  • 30.
  • 31.
  • 32. Using clusters embedded in 5-dim subspaces while varying the dimensionality of the space from 5 to 50, CLIQUE was able to recover all clusters in every case.
  • 33.
  • 34. Strengths:
      ◦ Automatically finds subspaces of the highest dimensionality such that high-density clusters exist in those subspaces.
      ◦ Insensitive to the order of records in the input and does not presume any canonical data distribution.
      ◦ Scales linearly with the size of the input and scales well as the number of dimensions in the data increases.
    Weakness:
      ◦ The accuracy of the clustering result may be degraded as the price of the method's simplicity.
  • 35. The problem of high dimensionality is often tackled by requiring the user to specify the subspace for cluster analysis, but user identification of subspaces is quite error-prone. CLIQUE can find clusters embedded in subspaces of high-dimensional data without requiring the user to guess subspaces that might have interesting clusters. CLIQUE generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. CLIQUE is insensitive to the order of input records, whereas some clustering algorithms are sensitive to it. Empirical evaluation shows that CLIQUE scales linearly with the size of the input and scales well as the number of dimensions in the data increases. CLIQUE can accurately discover clusters embedded in lower-dimensional subspaces.