IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 9, Issue 5 (Mar. - Apr. 2013), PP 65-68
www.iosrjournals.org

        Automatic Clustering Using Improved Harmony Search
   Ajay Kumar Beesetti (1), Dr. Rajyalakshmi Valluri (2), K. Subrahmanyam (3), D.R.S. Bindu (4)
   (1), (2) Department of Electronics & Communications Engineering, ANITS, Visakhapatnam, India
   (3), (4) Systems Engineer, Infosys Limited, India

Abstract: This paper presents an automatic clustering algorithm based on Harmony Search. In this
algorithm, the capability of Improved Harmony Search is used to evolve automatically the appropriate number
of clusters as well as the locations of the cluster centers. By incorporating the concept of variable length in each
harmony vector, our strategy is able to encode a variable number of candidate cluster centers at each iteration.
The CH cluster validity index is used as the objective function to evaluate the clustering result obtained from
each harmony memory vector. The proposed approach has been applied to well-known datasets, and the
experimental results show that it is able to find the appropriate number of clusters and the locations of the
cluster centers.
Keywords – Automatic Clustering, Harmony Search, Harmony Memory Vector, Cluster Centers

                                            I.     Introduction
          Clustering is the task of partitioning a given unlabeled dataset into a number of groups such that
objects in the same group are similar to each other and dissimilar to the objects in the other groups. The
dissimilarities are assessed from the attribute values representing the objects. Classification is also a good
tool for distinguishing or grouping data, but it requires a large collection of labeled data on which the
classifier model is built. A further advantage of clustering techniques is that they adapt to changes and help
in finding useful features that distinguish different groups. Cluster analysis has been widely applied in
research areas such as market research, pattern recognition, data analysis, image processing, data mining,
statistics, machine learning, spatial database technology and biology. It may serve as a preprocessing step
for other algorithms, such as attribute subset selection and classification, which then operate on the
resulting clustered data and the selected attributes. Clustering methods have been classified as
partitioning methods, hierarchical methods, density-based methods and grid-based methods. A partitioning
algorithm organizes the objects into k partitions (k <= n, where n is the number of objects in the dataset),
where each partition represents a cluster. The clusters are formed to optimize an objective criterion, so that
the objects within a cluster are similar to each other and dissimilar to objects in the other clusters. A
hierarchical clustering method works by grouping data objects into a tree of clusters. The problem of
partitional clustering has been approached from a number of fields, such as statistics [1], graph theory [2],
expectation maximization [3], artificial neural networks [4], evolutionary computing [5] and swarm
intelligence [6]. Although there is a plethora of papers on methods for partitioning data into clusters, the
problem of finding the number of clusters in the dataset remains. Finding the correct number of clusters is
difficult because, unlike in a classification task, there are no class labels.

                                          II.     Harmony Search
         Calculus has been used in solving many engineering and scientific problems. Calculus-based
problem-solving techniques can be used only if the objective function is differentiable. These
techniques fail when the objective function is step-wise, discontinuous or multi-modal, or when the decision
variables are discrete rather than continuous. This is one of the reasons that led the research community
towards metaheuristic algorithms inspired by biological evolution, animal behavior or
metallic annealing. Harmony search is a music-inspired metaheuristic algorithm. There exists an
analogy between music and optimization: each musical instrument corresponds to a decision variable,
each musical note corresponds to a variable value, and a harmony corresponds to a solution vector. Just as
musicians in jazz improvisation play notes randomly or based on experience in order to find a pleasing
harmony, variables in the harmony search algorithm take random values or previously memorized good values
in order to find an optimal solution.
Harmony Search (HS) is a heuristic based on the improvisation process of jazz musicians [7], [8]. Starting
with some initially generated harmonies, the musicians try to improve the harmonies in each iteration. This
analogy is used in HS to optimize the given objective function. Just as jazz musicians create new
harmonies by improvisation, the HS algorithm creates new solutions based on past solutions and on random
modifications. The basic HS algorithm described in the literature is as follows:




                                             Fig 1: Harmony Search Flowchart

          The HS algorithm initializes the Harmony Memory (HM) with randomly generated solutions. The
number of randomly generated solutions stored in the HM is determined by the Harmony Memory Size
(HMS). Once the harmony memory is filled with random solutions, new solutions are created iteratively as
follows. A new harmony vector is created based on three rules: 1) memory consideration, 2) pitch adjustment
and 3) random selection. Generating a new harmony is called improvisation. The parameters used in the
generation process are the Harmony Memory Consideration Rate (HMCR) and the Pitch Adjusting Rate (PAR).
Each decision variable is set to the value of the corresponding variable of one of the solutions in the HM
with a probability of HMCR, and an additional modification of this value is performed with a probability of
PAR. Otherwise (with a probability of 1 - HMCR), the decision variable is set to a random value. After a new
solution has been created, it is evaluated and compared to the worst solution in the HM. If its objective
value is better than that of the worst solution, it replaces the worst solution in the HM. This process is
repeated until a termination criterion is met. A more detailed description of the algorithm can be found in
[7], [9]. Mahdavi et al. [9] proposed an improved harmony search (IHS) algorithm that uses a variable PAR
and bandwidth (bw) in the improvisation step. In their method, PAR and bw change dynamically with the
generation number as expressed below:

\[ PAR(gn) = PAR_{min} + \frac{PAR_{max} - PAR_{min}}{NI} \times gn \qquad (1) \]

where PAR(gn) is the pitch adjusting rate for generation gn, PARmin is the minimum pitch adjusting rate,
PARmax is the maximum pitch adjusting rate, gn is the generation number and NI is the total number of
improvisations (generations).

\[ bw(gn) = bw_{max} \exp(c \cdot gn) \qquad (2) \]

\[ c = \frac{\ln\left( bw_{min} / bw_{max} \right)}{NI} \qquad (3) \]
where bw(gn) is the bandwidth for generation gn, bwmin is the minimum bandwidth and bwmax is the
maximum bandwidth. Other variants of harmony search have also been proposed recently.
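
The improvisation rules and the schedules of Eqs. (1)-(3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function names, the sphere test function, the clipping of values to the search bounds and the default HMS and NI are all hypothetical choices made for the example. Only the HMCR, PAR and bw values are taken from the settings reported in Section IV.

```python
import math
import random


def par_schedule(gn, ni, par_min, par_max):
    """Eq. (1): PAR grows linearly from par_min to par_max over ni generations."""
    return par_min + (par_max - par_min) / ni * gn


def bw_schedule(gn, ni, bw_min, bw_max):
    """Eqs. (2)-(3): bandwidth decays exponentially from bw_max to bw_min."""
    c = math.log(bw_min / bw_max) / ni
    return bw_max * math.exp(c * gn)


def improved_harmony_search(objective, lower, upper, hms=10, hmcr=0.9,
                            par_min=0.4, par_max=0.8, bw_min=0.01, bw_max=0.1,
                            ni=2000, seed=0):
    """Minimise `objective` over the box [lower, upper] (assumed setup)."""
    rng = random.Random(seed)
    dim = len(lower)
    # Harmony memory: hms random solutions and their objective values.
    memory = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
              for _ in range(hms)]
    scores = [objective(h) for h in memory]

    for gn in range(1, ni + 1):
        par = par_schedule(gn, ni, par_min, par_max)
        bw = bw_schedule(gn, ni, bw_min, bw_max)
        new = []
        for d in range(dim):
            if rng.random() < hmcr:
                # Memory consideration: copy variable d from a stored harmony.
                value = memory[rng.randrange(hms)][d]
                if rng.random() < par:
                    # Pitch adjustment: shift the value by at most bw.
                    value += rng.uniform(-1.0, 1.0) * bw
            else:
                # Random selection from the whole range of variable d.
                value = rng.uniform(lower[d], upper[d])
            # Keep the value inside the search bounds (an assumption here).
            new.append(min(max(value, lower[d]), upper[d]))

        # Replace the worst stored harmony if the new one is better.
        worst = max(range(hms), key=scores.__getitem__)
        new_score = objective(new)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score

    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def sphere(x):
    """Toy objective used only for this illustration."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    solution, value = improved_harmony_search(sphere, [-5.0] * 3, [5.0] * 3)
    print(solution, value)
```

The sketch minimises its objective. For clustering with the CH index, where larger values are better, the comparison would be reversed or the negated index would be passed as the objective.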

                        III.    Automatic Clustering Using Harmony Search
         Cluster analysis identifies and classifies objects, individuals or variables on the basis of the
similarity of the characteristics they possess. It seeks to minimize within-group variance and maximize
between-group variance. The result of cluster analysis is a number of heterogeneous groups with homogeneous
contents: there are substantial differences between the groups, but the individuals within a single group are
similar. Data may be thought of as points in a space where the axes correspond to the variables. Cluster
analysis divides the space into regions characteristic of the groups that it finds in the data. We can show
this with a simple graphical example:








                                               Fig 2: Clustering Example
A. Selecting the objective functions:
          Cluster analysis partitions the set of observations into mutually exclusive groupings in order to best
represent distinct sets of observations within the sample. Cluster analysis cannot by itself confirm the
validity of these groupings, as there are no predefined classes. Several clustering validity measures have
therefore been developed. The results of a clustering algorithm on the same data set can differ greatly,
because the input parameters can strongly alter the behavior and execution of the algorithm. The aim of
cluster validity is to find the partitioning that best fits the underlying data. The process of evaluating the
quality of a clustering is called clustering assessment. Two measurement criteria have been proposed in the
literature for evaluating and selecting an optimal clustering scheme:
      - Compactness: The members of each cluster should be as close to each other as possible. A common
          measure of compactness is the variance.
      - Separation: The clusters themselves should be widely separated.
    There are three common approaches to measuring the distance between two different clusters: the distance
between the closest members of the clusters, the distance between the most distant members and the distance
between the centers of the clusters. Numerical measures that are applied to judge various aspects of cluster
validity are classified into the following three types [10], [11].
1. External Index: Used to measure the extent to which cluster labels match externally supplied class labels.
2. Internal Index: Used to measure the goodness of a clustering structure without respect to external information.
3. Relative Index: Used to compare two different clusterings or clusters.
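
The three inter-cluster distance measures mentioned under the separation criterion above can be illustrated as follows. This is a small sketch that assumes Euclidean distance and uses the arithmetic mean as the cluster center. The function names and the two toy clusters are hypothetical and do not come from the experiments in this paper.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(cluster):
    """Arithmetic mean of the points in a cluster."""
    n = len(cluster)
    return [sum(point[d] for point in cluster) / n
            for d in range(len(cluster[0]))]


def closest_member_distance(a, b):
    """Distance between the closest members of two clusters."""
    return min(euclidean(p, q) for p in a for q in b)


def most_distant_member_distance(a, b):
    """Distance between the most distant members of two clusters."""
    return max(euclidean(p, q) for p in a for q in b)


def centre_distance(a, b):
    """Distance between the centers of two clusters."""
    return euclidean(centroid(a), centroid(b))


cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]
print(closest_member_distance(cluster_a, cluster_b))       # 3.0
print(most_distant_member_distance(cluster_a, cluster_b))  # 6.0
print(centre_distance(cluster_a, cluster_b))               # 4.5
```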

   Cluster validity indices are sometimes called criteria. The performance of the automatic clustering
algorithm depends on the clustering objective function that is being optimized. In this work we choose the CH
index [12] as the objective function. The idea behind the CH measure is to compute the sum of squared
errors (distances) between the kth cluster and the other k - 1 clusters, and to compare it with the internal
sum of squared errors for the k clusters (taking their individual squared error terms and summing them, a
pooled value so to speak). In effect, this is a measure of inter-cluster (dis)similarity over intra-cluster
(dis)similarity. With
B(k): between-cluster sum of squares,
W(k): within-cluster sum of squares,
N: number of objects in the dataset,
k: number of clusters,
the index is

\[ CH(k) = \frac{B(k) / (k - 1)}{W(k) / (N - k)} \qquad (4) \]

If B(k) and W(k) are both measures of difference/dissimilarity, then a larger CH value indicates a better
clustering, since the between-cluster difference should be high and the within-cluster difference should be low.
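
As an illustration of Eq. (4), the following sketch computes the CH index for a hard partition. It assumes the usual definitions from [12]: B(k) is the size-weighted sum of squared distances of the cluster means from the overall mean, and W(k) is the sum of squared distances of the points from their own cluster mean. The function name and the four-point toy dataset are hypothetical.

```python
def calinski_harabasz(points, labels):
    """CH(k) = [B(k)/(k-1)] / [W(k)/(N-k)] for a hard partition."""
    n = len(points)
    dim = len(points[0])
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("CH is defined only for 2 <= k < N")

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    overall = mean(points)
    between = 0.0  # B(k)
    within = 0.0   # W(k)
    for members in clusters.values():
        centre = mean(members)
        between += len(members) * sq_dist(centre, overall)
        within += sum(sq_dist(p, centre) for p in members)
    if within == 0.0:
        # Perfectly compact clusters: treat the index as unbounded.
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(calinski_harabasz(points, [0, 0, 1, 1]))  # 50.0
print(calinski_harabasz(points, [0, 1, 0, 1]))  # 0.08
```

The first labelling groups the two left-hand points and the two right-hand points, giving B = 100 and W = 4. The second labelling mixes them, giving B = 4 and W = 100, so its CH value is much lower.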

B. Search Variable Representation

   In this work we use a centroid-based representation for the search variable. Given a dataset
consisting of K instances, each having D dimensions, the search variable is constructed as follows and
contains a variable number of cluster centers.


\[ Z_{i,G} = [\; T_{i,1} \;\; T_{i,2} \;\; \dots \;\; T_{i,K_{max}} \;\; m_{i,1} \;\; m_{i,2} \;\; \dots \;\; m_{i,K_{max}} \;] \]

Table 1: Search variable for a dataset consisting of K instances, showing a variable number of cluster centers,
where
  Zi,G is the ith search variable in the Gth iteration,
  mi,j is the jth cluster center,
  Ti,j is a threshold value,
  Kmax is the user-specified number representing the maximum number of clusters.
  If Ti,j < 0.5, then the corresponding cluster center mi,j is not active in the search variable.
  If Ti,j > 0.5, then the corresponding cluster center mi,j is active in the search variable.
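
A possible way to decode such a search variable is sketched below. The flat list layout (Kmax thresholds followed by Kmax centers of D values each), the function names and the nearest-center assignment are assumptions made for illustration. The text does not say how a threshold of exactly 0.5 is handled, so the sketch treats it as inactive, and it leaves open what to do when fewer than two centers are active.

```python
def decode_harmony(vector, k_max, dim):
    """Return the active cluster centers encoded in one search variable.

    Assumed layout: k_max thresholds, then k_max centers of `dim` values.
    """
    if len(vector) != k_max + k_max * dim:
        raise ValueError("vector length does not match k_max and dim")
    thresholds = vector[:k_max]
    centres = []
    for j in range(k_max):
        start = k_max + j * dim
        if thresholds[j] > 0.5:  # center j is active
            centres.append(list(vector[start:start + dim]))
    return centres


def assign_to_nearest(points, centres):
    """Label each point with the index of its nearest active center."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres]
        labels.append(dists.index(min(dists)))
    return labels


# Hypothetical search variable with k_max = 3 and D = 2:
# thresholds 0.9, 0.2, 0.7 activate the first and third centers.
vector = [0.9, 0.2, 0.7,  0.0, 1.0,  5.0, 5.0,  10.0, 1.0]
centres = decode_harmony(vector, k_max=3, dim=2)
print(centres)  # [[0.0, 1.0], [10.0, 1.0]]
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(assign_to_nearest(points, centres))  # [0, 0, 1, 1]
```

The resulting labels are what a validity index such as CH would then score for that search variable.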
                 Dataset      No. of attributes     Expected no. of clusters     No. of clusters obtained

                 Iris         4                     3                            3
                 Wine         13                    3                            3
                 Bupa         7                     2                            2
                 Balance      5                     3                            3

                                           Table 2: Simulation results for four data sets.

                                                          IV.      Results
            The algorithm is implemented in MATLAB 7.0 on an AMD Athlon processor (2.3 GHz) with 1 GB RAM.
  Simulation results showing the effectiveness of Harmony Search based clustering are provided for
  four real-life datasets: Iris, Wine, Bupa and Balance.
            The parameters of the algorithm are set as follows: HMCR = 0.9, PARmax = 0.8, PARmin = 0.4,
  BWmin = 0.01 and BWmax = 0.1. The average (rounded) of the outputs of 15 individual runs of the
  algorithm is reported as the number of clusters obtained.

                                                       V.       Conclusion
      In this paper, the problem of automatic clustering has been investigated, and a Harmony Search
  based strategy for crisp clustering of real-world datasets has been presented. An important feature of this
  clustering algorithm is that it finds the number of clusters automatically; that is, the number of clusters
  does not have to be known in advance. Our experimental results show that Harmony Search can be a candidate
  optimization technique for clustering. Future research may extend the single-objective automatic clustering
  using Harmony Search to handle multi-objective optimization.

                                                                References
[1]  E.W. Forgy, Cluster Analysis of Multivariate Data: Efficiency versus Interpretability of Classification, Biometrics, 21 (1965), 768-769.
[2]  C.T. Zahn, Graph-theoretical methods for detecting and describing gestalt clusters, IEEE Transactions on Computers, C-20 (1971), 68-86.
[3]  M.H. Law, M.A.T. Figueiredo, and A.K. Jain, Simultaneous Feature Selection and Clustering Using Mixture Models, IEEE
     Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1154-1166, Sept. 2004.
[4]  N.R. Pal, J.C. Bezdek, and E.C.-K. Tsao, Generalized clustering networks and Kohonen's self-organizing scheme, IEEE Trans. Neural
     Networks, vol. 4 (1993), 549-557.
[5]  S. Das, A. Abraham, and A. Konar, Metaheuristic Clustering, Studies in Computational Intelligence, Springer-Verlag, Germany, 2009.
[6]  A. Abraham, S. Das, and S. Roy, Swarm Intelligence Algorithms for Data Clustering, Soft Computing for Knowledge Discovery and
     Data Mining, O. Maimon and L. Rokach (Eds.), Springer-Verlag, Germany, pp. 279-313, 2007.
[7]  Z.W. Geem, J.H. Kim, and G.V. Loganathan, "A new heuristic optimization algorithm: harmony search", Simulation, vol. 76, 2001,
     pp. 60-68.
[8]  Z.W. Geem, M. Fesanghary, J. Choi, M.P. Saka, J.C. Williams, M.T. Ayvaz, L. Li, S. Ryu, and A. Vasebi, "Recent advances in harmony
     search", in Advances in Evolutionary Algorithms, Witold Kosiński, Ed. Vienna: I-Tech Education and Publishing, 2008, pp. 127-142.
[9]  M. Mahdavi, M. Fesanghary, and E. Damangir, "An improved harmony search algorithm for solving optimization problems", Applied
     Mathematics and Computation, vol. 188, 2008, pp. 1567-1579.
[10] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, Cluster validity methods: part I, SIGMOD Rec., vol. 31, no. 2, pp. 40-45, 2002.
[11] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, Cluster validity methods: part II, SIGMOD Rec., vol. 31, no. 3, pp. 19-27, 2002.
[12] T. Calinski and J. Harabasz, A dendrite method for cluster analysis, Communications in Statistics, 3 (1974), pp. 1-27.




                                                       www.iosrjournals.org                                                    68 | Page

Mais conteúdo relacionado

Destaque

Efficiency of Optical Fiber Communication for Dissemination of Information wi...
Efficiency of Optical Fiber Communication for Dissemination of Information wi...Efficiency of Optical Fiber Communication for Dissemination of Information wi...
Efficiency of Optical Fiber Communication for Dissemination of Information wi...IOSR Journals
 
Enhance similarity searching algorithm with optimized fast population count m...
Enhance similarity searching algorithm with optimized fast population count m...Enhance similarity searching algorithm with optimized fast population count m...
Enhance similarity searching algorithm with optimized fast population count m...IOSR Journals
 
Power Optimization in MIMO-CN with MANET
Power Optimization in MIMO-CN with MANETPower Optimization in MIMO-CN with MANET
Power Optimization in MIMO-CN with MANETIOSR Journals
 
Impact of Insect Resistant Cotton
Impact of Insect Resistant CottonImpact of Insect Resistant Cotton
Impact of Insect Resistant CottonIOSR Journals
 
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...IOSR Journals
 
Wireless Energy Transfer
Wireless Energy TransferWireless Energy Transfer
Wireless Energy TransferIOSR Journals
 
Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...
Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...
Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...IOSR Journals
 
Detecting Spam Tags Against Collaborative Unfair Through Trust Modelling
Detecting Spam Tags Against Collaborative Unfair Through Trust ModellingDetecting Spam Tags Against Collaborative Unfair Through Trust Modelling
Detecting Spam Tags Against Collaborative Unfair Through Trust ModellingIOSR Journals
 
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...IOSR Journals
 
Proximate Composition of Whole Seeds and Pulp of African Black Velvet Tamarin...
Proximate Composition of Whole Seeds and Pulp of African Black Velvet Tamarin...Proximate Composition of Whole Seeds and Pulp of African Black Velvet Tamarin...
Proximate Composition of Whole Seeds and Pulp of African Black Velvet Tamarin...IOSR Journals
 
Peer To Peer Content Sharing On Wi-Fi Network For Smart Phones
Peer To Peer Content Sharing On Wi-Fi Network For Smart PhonesPeer To Peer Content Sharing On Wi-Fi Network For Smart Phones
Peer To Peer Content Sharing On Wi-Fi Network For Smart PhonesIOSR Journals
 
Fast Human Detection in Surveillance Video
Fast Human Detection in Surveillance VideoFast Human Detection in Surveillance Video
Fast Human Detection in Surveillance VideoIOSR Journals
 
Hand Vien Recognotion Using Matlab
Hand Vien Recognotion Using MatlabHand Vien Recognotion Using Matlab
Hand Vien Recognotion Using MatlabIOSR Journals
 
The Delivery of Web Mining in Healthcare System on Cloud Computing
The Delivery of Web Mining in Healthcare System on Cloud ComputingThe Delivery of Web Mining in Healthcare System on Cloud Computing
The Delivery of Web Mining in Healthcare System on Cloud ComputingIOSR Journals
 
MDSR to Reduce Link Breakage Routing Overhead in MANET Using PRM
MDSR to Reduce Link Breakage Routing Overhead in MANET Using PRMMDSR to Reduce Link Breakage Routing Overhead in MANET Using PRM
MDSR to Reduce Link Breakage Routing Overhead in MANET Using PRMIOSR Journals
 
Ab Initio Study of the Electronic and Phonon Band Structure Of the Mixed Vale...
Ab Initio Study of the Electronic and Phonon Band Structure Of the Mixed Vale...Ab Initio Study of the Electronic and Phonon Band Structure Of the Mixed Vale...
Ab Initio Study of the Electronic and Phonon Band Structure Of the Mixed Vale...IOSR Journals
 
A New Single-Stage Multilevel Type Full-Bridge Converter Applied to closed lo...
A New Single-Stage Multilevel Type Full-Bridge Converter Applied to closed lo...A New Single-Stage Multilevel Type Full-Bridge Converter Applied to closed lo...
A New Single-Stage Multilevel Type Full-Bridge Converter Applied to closed lo...IOSR Journals
 
A Review on Image Inpainting to Restore Image
A Review on Image Inpainting to Restore ImageA Review on Image Inpainting to Restore Image
A Review on Image Inpainting to Restore ImageIOSR Journals
 

Destaque (20)

Efficiency of Optical Fiber Communication for Dissemination of Information wi...
Efficiency of Optical Fiber Communication for Dissemination of Information wi...Efficiency of Optical Fiber Communication for Dissemination of Information wi...
Efficiency of Optical Fiber Communication for Dissemination of Information wi...
 
Enhance similarity searching algorithm with optimized fast population count m...
Enhance similarity searching algorithm with optimized fast population count m...Enhance similarity searching algorithm with optimized fast population count m...
Enhance similarity searching algorithm with optimized fast population count m...
 
Power Optimization in MIMO-CN with MANET
Power Optimization in MIMO-CN with MANETPower Optimization in MIMO-CN with MANET
Power Optimization in MIMO-CN with MANET
 
H0434555
H0434555H0434555
H0434555
 
Impact of Insect Resistant Cotton
Impact of Insect Resistant CottonImpact of Insect Resistant Cotton
Impact of Insect Resistant Cotton
 
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
 
G0444246
G0444246G0444246
G0444246
 
Wireless Energy Transfer
Wireless Energy TransferWireless Energy Transfer
Wireless Energy Transfer
 
Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...
Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...
Information Hiding for “Color to Gray and back” with Hartley, Slant and Kekre...
 
Detecting Spam Tags Against Collaborative Unfair Through Trust Modelling
Detecting Spam Tags Against Collaborative Unfair Through Trust ModellingDetecting Spam Tags Against Collaborative Unfair Through Trust Modelling
Detecting Spam Tags Against Collaborative Unfair Through Trust Modelling
 
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...
 
Proximate Composition of Whole Seeds and Pulp of African Black Velvet Tamarin...
Proximate Composition of Whole Seeds and Pulp of African Black Velvet Tamarin...Proximate Composition of Whole Seeds and Pulp of African Black Velvet Tamarin...
Proximate Composition of Whole Seeds and Pulp of African Black Velvet Tamarin...
 
Peer To Peer Content Sharing On Wi-Fi Network For Smart Phones
Peer To Peer Content Sharing On Wi-Fi Network For Smart PhonesPeer To Peer Content Sharing On Wi-Fi Network For Smart Phones
Peer To Peer Content Sharing On Wi-Fi Network For Smart Phones
 
Fast Human Detection in Surveillance Video
Fast Human Detection in Surveillance VideoFast Human Detection in Surveillance Video
Fast Human Detection in Surveillance Video
 
Hand Vien Recognotion Using Matlab
Hand Vien Recognotion Using MatlabHand Vien Recognotion Using Matlab
Hand Vien Recognotion Using Matlab
 
The Delivery of Web Mining in Healthcare System on Cloud Computing
The Delivery of Web Mining in Healthcare System on Cloud ComputingThe Delivery of Web Mining in Healthcare System on Cloud Computing
The Delivery of Web Mining in Healthcare System on Cloud Computing
 
MDSR to Reduce Link Breakage Routing Overhead in MANET Using PRM
MDSR to Reduce Link Breakage Routing Overhead in MANET Using PRMMDSR to Reduce Link Breakage Routing Overhead in MANET Using PRM
MDSR to Reduce Link Breakage Routing Overhead in MANET Using PRM
 
Ab Initio Study of the Electronic and Phonon Band Structure Of the Mixed Vale...
Ab Initio Study of the Electronic and Phonon Band Structure Of the Mixed Vale...Ab Initio Study of the Electronic and Phonon Band Structure Of the Mixed Vale...
Ab Initio Study of the Electronic and Phonon Band Structure Of the Mixed Vale...
 
A New Single-Stage Multilevel Type Full-Bridge Converter Applied to closed lo...
A New Single-Stage Multilevel Type Full-Bridge Converter Applied to closed lo...A New Single-Stage Multilevel Type Full-Bridge Converter Applied to closed lo...
A New Single-Stage Multilevel Type Full-Bridge Converter Applied to closed lo...
 
A Review on Image Inpainting to Restore Image
A Review on Image Inpainting to Restore ImageA Review on Image Inpainting to Restore Image
A Review on Image Inpainting to Restore Image
 

Semelhante a K0956568

Literature Survey On Clustering Techniques
Literature Survey On Clustering TechniquesLiterature Survey On Clustering Techniques
Literature Survey On Clustering TechniquesIOSR Journals
 
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGA SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGijcsa
 
Comparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisComparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisIOSR Journals
 
Genetic algorithm
Genetic algorithmGenetic algorithm
Genetic algorithmRespa Peter
 
An Adaptive Masker for the Differential Evolution Algorithm
An Adaptive Masker for the Differential Evolution AlgorithmAn Adaptive Masker for the Differential Evolution Algorithm
An Adaptive Masker for the Differential Evolution AlgorithmIOSR Journals
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmIJORCS
 
Ijartes v1-i2-006
Ijartes v1-i2-006Ijartes v1-i2-006
Ijartes v1-i2-006IJARTES
 
soft computing BTU MCA 3rd SEM unit 1 .pptx
soft computing BTU MCA 3rd SEM unit 1 .pptxsoft computing BTU MCA 3rd SEM unit 1 .pptx
soft computing BTU MCA 3rd SEM unit 1 .pptxnaveen356604
 
Multilevel techniques for the clustering problem
Multilevel techniques for the clustering problemMultilevel techniques for the clustering problem
Multilevel techniques for the clustering problemcsandit
 
Artificial bee colony with fcm for data clustering
Artificial bee colony with fcm for data clusteringArtificial bee colony with fcm for data clustering
Artificial bee colony with fcm for data clusteringAlie Banyuripan
 
A Binary Bat Inspired Algorithm for the Classification of Breast Cancer Data
A Binary Bat Inspired Algorithm for the Classification of Breast Cancer Data A Binary Bat Inspired Algorithm for the Classification of Breast Cancer Data
A Binary Bat Inspired Algorithm for the Classification of Breast Cancer Data ijscai
 
Classification of text data using feature clustering algorithm
Classification of text data using feature clustering algorithmClassification of text data using feature clustering algorithm
Classification of text data using feature clustering algorithmeSAT Publishing House
 
An Improved Iterative Method for Solving General System of Equations via Gene...
An Improved Iterative Method for Solving General System of Equations via Gene...An Improved Iterative Method for Solving General System of Equations via Gene...
An Improved Iterative Method for Solving General System of Equations via Gene...Zac Darcy
 
An Improved Iterative Method for Solving General System of Equations via Gene...
An Improved Iterative Method for Solving General System of Equations via Gene...An Improved Iterative Method for Solving General System of Equations via Gene...
An Improved Iterative Method for Solving General System of Equations via Gene...Zac Darcy
 
The improved k means with particle swarm optimization
The improved k means with particle swarm optimizationThe improved k means with particle swarm optimization
The improved k means with particle swarm optimizationAlexander Decker
 
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
Automatic Unsupervised Data Classification Using Jaya Evolutionary AlgorithmAutomatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
 

IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 9, Issue 5 (Mar. - Apr. 2013), PP 65-68
www.iosrjournals.org

Automatic Clustering Using Improved Harmony Search
Ajay Kumar Beesetti1, Dr. Rajyalakshmi Valluri2, K. Subrahmanyam3, D.R.S. Bindu4
1 (Department of Electronics & Communications Engineering, ANITS, Visakhapatnam, India)
2 (Department of Electronics & Communications Engineering, ANITS, Visakhapatnam, India)
3 (Systems Engineer, Infosys Limited, India)
4 (Systems Engineer, Infosys Limited, India)

Abstract: The paper presents automatic clustering using a Harmony Search based clustering algorithm. In this algorithm, the capability of Improved Harmony Search is used to automatically evolve the appropriate number of clusters as well as the locations of the cluster centers. By incorporating the concept of variable length in each harmony vector, our strategy is able to encode a variable number of candidate cluster centers at each iteration. The CH cluster validity index is used as the objective function to validate the clustering result obtained from each harmony memory vector. The proposed approach has been applied to well-known datasets, and experimental results show that the approach is able to find the appropriate number of clusters and the locations of the cluster centers.
Keywords – Automatic Clustering, Harmony Search, Harmony Memory Vector, Cluster Centers

I. INTRODUCTION
Clustering is the task of partitioning a given unlabeled dataset into a number of groups such that objects in the same group are similar to each other and dissimilar to the objects in the other groups. The dissimilarities are assessed based on the attribute values representing the objects. Although classification seems to be a good tool for distinguishing or grouping data, it requires a large collection of labeled data on which the classifier model is built.
An additional advantage of clustering techniques is that they are adaptable to changes and help in finding useful features that distinguish different groups. Cluster analysis has been widely applied in various research areas such as market research, pattern recognition, data analysis and image processing, data mining, statistics, machine learning, spatial database technology and biology. It may serve as a preprocessing step for other algorithms such as attribute subset selection and classification, which would operate on the resulting clustered data and the selected attributes.

Clustering methods have been classified as partitioning methods, hierarchical methods, density-based methods and grid-based methods. A partitioning algorithm organizes the objects into k partitions (k <= n, where n is the number of objects in the dataset), where each partition represents a cluster. The clusters are formed to optimize an objective criterion, so that the objects within a cluster are similar to each other and dissimilar to objects in the other clusters. A hierarchical clustering method works by grouping data objects into a tree of clusters. The problem of partitional clustering has been approached from a number of fields such as statistics [1], graph theory [2], expectation maximization [3], artificial neural networks [4], evolutionary computing [5], swarm intelligence [6] and so on. Though there is a plethora of papers on methods to partition data into clusters, there remains the problem of finding the number of clusters in the dataset. Finding the correct number of clusters is difficult because there are no class labels, unlike in the classification task.

II. Harmony Search
Calculus has been used in solving many engineering and scientific problems. These calculus-based problem-solving techniques can be used only if the objective functions are differentiable.
These techniques are futile when the objective function is step-wise, discontinuous or multi-modal, or when the decision variables are discrete rather than continuous. This is one of the reasons that led the research community towards metaheuristic algorithms inspired by biological evolution, animal behavior, or metallic annealing.

Harmony search is a music-inspired metaheuristic algorithm. Interestingly, there exists an analogy between music and optimization: each musical instrument corresponds to a decision variable, a musical note corresponds to a variable value, and a harmony corresponds to a solution vector. Just as musicians in jazz improvisation play notes randomly or based on experience in order to find a fantastic harmony, variables in the harmony search algorithm take random values or previously memorized good values in order to find the optimal solution. Harmony Search (HS) is a heuristic based on the improvisation process of jazz musicians [7], [8]. Starting with some initially generated harmonies, the musicians try to improve the harmonies in each iteration. This analogy is used in HS to optimize the given objective function. Just as jazz musicians create new harmonies by improvisation, the HS algorithm creates new solutions based on past solutions and on random modifications. The basic HS algorithm described in the literature is as follows.

Fig 1: Harmony Search Flowchart

The HS algorithm initializes the Harmony Memory (HM) with randomly generated solutions. The number of randomly generated solutions stored in the HM is determined by the Harmony Memory Size (HMS). Once the harmony memory is filled with random solutions, new solutions are created iteratively as follows. A new harmony vector is created based on three rules: 1) memory consideration, 2) pitch adjustment and 3) random selection. Generating a new harmony is called improvisation. The parameters used in the generation process are the Harmony Memory Consideration Rate (HMCR) and the Pitch Adjusting Rate (PAR). Each decision variable is set to the value of the corresponding variable of one of the solutions in the HM with a probability of HMCR, and an additional modification of this value is performed with a probability of PAR. Otherwise (with a probability of 1 − HMCR), the decision variable is set to a random value. After a new solution has been created, it is evaluated and compared to the worst solution in the HM. If its objective value is better than that of the worst solution, it replaces the worst solution in the HM. This process is repeated until a termination criterion is met. A more detailed description of the algorithm can be found in [7], [9].

Mahdavi et al. [9] proposed an improved harmony search (IHS) algorithm that uses a variable PAR and bandwidth (bw) in the improvisation step. In their method, PAR and bw change dynamically with the generation number as expressed below:

$$PAR(gn) = PAR_{min} + \frac{PAR_{max} - PAR_{min}}{NI} \times gn \qquad (1)$$

where $PAR(gn)$ is the pitch adjusting rate for each generation, $PAR_{min}$ is the minimum pitch adjusting rate, $PAR_{max}$ is the maximum pitch adjusting rate, $NI$ is the total number of improvisations (generations) and $gn$ is the generation number.

$$bw(gn) = bw_{max} \exp(c \cdot gn) \qquad (2)$$

$$c = \frac{\ln\left(bw_{min} / bw_{max}\right)}{NI} \qquad (3)$$

where $bw(gn)$ is the bandwidth for each generation, $bw_{min}$ is the minimum bandwidth and $bw_{max}$ is the maximum bandwidth. Recently, other variants of harmony search have also been proposed.

III. Automatic Clustering Using Harmony Search
Cluster analysis identifies and classifies objects, individuals or variables on the basis of the similarity of the characteristics they possess. It seeks to minimize within-group variance and maximize between-group variance. The result of cluster analysis is a number of heterogeneous groups with homogeneous contents: there are substantial differences between the groups, but the individuals within a single group are similar. Data may be thought of as points in a space where the axes correspond to the variables. Cluster analysis divides the space into regions characteristic of the groups that it finds in the data. We can show this with a simple graphical example:
Fig 2: Clustering Example

A. Selecting the objective function
Cluster analysis partitions the set of observations into mutually exclusive groupings in order to best represent distinct sets of observations within the sample. Cluster analysis is not able to confirm the validity of these groupings, as there are no predefined classes. Several clustering validity measures have been developed. The results of a clustering algorithm on the same data set can differ greatly from one another, as the input parameters of the algorithm can substantially change its behavior and execution. The aim of cluster validity is to find the partitioning that best fits the underlying data. The process of evaluating the quality of a clustering is called clustering assessment. Two measurement criteria have been proposed in the literature for evaluating and selecting an optimal clustering scheme:
• Compactness: The members of each cluster should be as close to each other as possible. A common measure of compactness is the variance.
• Separation: The clusters themselves should be widely separated. There are three common approaches to measuring the distance between two different clusters: the distance between the closest members of the clusters, the distance between the most distant members, and the distance between the centers of the clusters.

Numerical measures that are applied to judge various aspects of cluster validity are classified into the following three types [10], [11].
1. External Index: Used to measure the extent to which cluster labels match externally supplied class labels.
2. Internal Index: Used to measure the goodness of a clustering structure without respect to external information.
3. Relative Index: Used to compare two different clusterings or clusters.
Sometimes these are called criteria instead of indices.
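The compactness criterion and the three cluster-to-cluster distances listed above can be illustrated with a short sketch. This is a minimal example, not the paper's MATLAB code: it assumes Euclidean distance, represents each cluster as a list of points, and all function names are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def compactness(points):
    """Variance-like compactness: mean squared distance to the centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) ** 2 for p in points) / len(points)

def closest_member_distance(a, b):
    """Separation as the distance between the closest members of a and b."""
    return min(math.dist(p, q) for p in a for q in b)

def farthest_member_distance(a, b):
    """Separation as the distance between the most distant members."""
    return max(math.dist(p, q) for p in a for q in b)

def center_distance(a, b):
    """Separation as the distance between the two cluster centers."""
    return math.dist(centroid(a), centroid(b))

if __name__ == "__main__":
    left = [(0.0, 0.0), (0.0, 2.0)]
    right = [(3.0, 0.0), (3.0, 2.0)]
    print(compactness(left))
    print(closest_member_distance(left, right))
    print(farthest_member_distance(left, right))
    print(center_distance(left, right))
```

A smaller compactness value and a larger separation value both indicate a better partition; which of the three separation measures is appropriate depends on the shape of the clusters.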
The performance of the automatic clustering algorithm depends upon the clustering objective function that is being optimized. In this work we choose the CH index [12] as the objective function. The idea behind the CH measure is to compute the sum of squared errors (distances) between the kth cluster and the other k − 1 clusters, and to compare that to the internal sum of squared errors for the k clusters (taking their individual squared error terms and summing them, a pooled value so to speak). In effect, this is a measure of inter-cluster (dis)similarity over intra-cluster (dis)similarity.

B(k): between-cluster sum of squares
W(k): within-cluster sum of squares

$$CH(k) = \frac{B(k)/(k-1)}{W(k)/(N-k)} \qquad (4)$$

Now, if B(k) and W(k) are both measures of difference/dissimilarity, then a larger CH value indicates a better clustering, since the between-cluster difference should be high and the within-cluster difference should be low.

B. Search Variable Representation
In this work we use a centroid-based representation for the search variable. Given a dataset consisting of K instances, each having D dimensions, the search variable is constructed as follows, containing a variable number of cluster centers.

$$Z_{i,G} = \left[\, T_{i,1} \;\; T_{i,2} \;\; \dots \;\; T_{i,K_{max}} \;\Big|\; m_{i,1} \;\; m_{i,2} \;\; \dots \;\; m_{i,K_{max}} \,\right]$$

Table 1: Dataset consisting of K instances, showing variable number of cluster centers

where
$Z_{i,G}$ is the ith search variable in the Gth iteration,
$m_{i,j}$ is the jth cluster center,
$T_{i,j}$ is a threshold value: if $T_{i,j} < 0.5$ then the corresponding cluster center $m_{i,j}$ is not active in the search variable, and if $T_{i,j} > 0.5$ then the corresponding cluster center $m_{i,j}$ is active in the search variable,
$K_{max}$ is the user-specified maximum number of clusters.

IV. Results
The algorithm is implemented in MATLAB 7.0 on an AMD Athlon processor (2.3 GHz) with 1 GB RAM. Simulation results showing the effectiveness of Harmony Search based clustering are provided for four real-life datasets: Iris, Wine, Bupa and Balance. The parameters of the algorithm are set as follows: HMCR = 0.9, PARmax = 0.8, PARmin = 0.4, BWmin = 0.01 and BWmax = 0.1. The average (corrected to decimal) of the output of 15 individual runs of the algorithm is reported as the number of clusters obtained.

Dataset    No. of Attributes    Expected No. of Clusters    No. of Clusters Obtained
Iris       4                    3                           3
Wine       13                   3                           3
Bupa       7                    2                           2
Balance    5                    3                           3

Table 2: Simulation results for four data sets.

V. Conclusion
In this paper, the problem of automatic clustering has been investigated, and a Harmony Search based strategy for crisp clustering of real-world datasets has been presented. An important feature of this clustering algorithm is that it is able to find the optimal number of clusters automatically; that is, the number of clusters does not have to be known in advance. Our experimental results show that Harmony Search can be a candidate optimization technique for clustering. Future research may extend single-objective automatic clustering using Harmony Search to handle multi-objective optimization.

References
[1] E.W. Forgy, Cluster Analysis of Multivariate Data: Efficiency versus Interpretability of classification, Biometrics, 21 (1965), 768-769.
[2] C.T. Zahn, Graph-theoretical methods for detecting and describing gestalt clusters, IEEE Transactions on Computers, C-20 (1971), 68-86.
[3] M.H. Law, M.A.T. Figueiredo, and A.K. Jain, Simultaneous Feature Selection and Clustering Using Mixture Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1154-1166, Sept. 2004.
[4] N.R. Pal, J.C. Bezdek, and E.C.-K. Tsao, Generalized clustering networks and Kohonen's self-organizing scheme, IEEE Trans. Neural Networks, vol. 4 (1993), 549-557.
[5] S. Das, A. Abraham, and A. Konar, Metaheuristic Clustering, Studies in Computational Intelligence, Springer Verlag, Germany, 2009.
[6] A. Abraham, S. Das, and S. Roy, Swarm Intelligence Algorithms for Data Clustering, in Soft Computing for Knowledge Discovery and Data Mining, O. Maimon and L. Rokach (Eds.), Springer Verlag, Germany, pp. 279-313, 2007.
[7] Z.W. Geem, J.H. Kim, and G.V. Loganathan, "A new heuristic optimization algorithm: harmony search", Simulation, vol. 76, 2001, pp. 60-68.
[8] Z.W. Geem, M. Fesanghary, J. Choi, M.P. Saka, J.C. Williams, M.T. Ayvaz, L. Li, S. Ryu, and A. Vasebi, "Recent advances in harmony search", in Advances in Evolutionary Algorithms, Witold Kosiński, Ed. Vienna: I-Tech Education and Publishing, 2008, pp. 127-142.
[9] M. Mahdavi, M. Fesanghary, and E. Damangir, "An improved harmony search algorithm for solving optimization problems", Applied Mathematics and Computation, vol. 188, 2008, pp. 1567-1579.
[10] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, Cluster validity methods: part I, SIGMOD Rec., vol. 31, no. 2, pp. 40-45, 2002.
[11] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, Cluster validity methods: part II, SIGMOD Rec., vol. 31, no. 3, pp. 19-27, 2002.
[12] T. Calinski and J. Harabasz, A dendrite method for cluster analysis, Communications in Statistics, 3 (1974), pp. 1-27.
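
As an illustration of Sections II and III, the sketch below shows how the parameter schedules of Eqs. (1)-(3), the threshold-based decoding of a search variable, and the CH index of Eq. (4) might be written. It is a minimal sketch under stated assumptions, not the authors' MATLAB implementation: the function names are hypothetical, the search variable is assumed to be a flat list of Kmax thresholds followed by Kmax centers of D values each, points are assigned to their nearest active center, B(k) and W(k) are computed from the member means of each cluster, and invalid partitions (fewer than two clusters, or an empty cluster) are assumed to score negative infinity.

```python
import math

def par_schedule(gn, ni, par_min, par_max):
    """Eq. (1): PAR rises linearly from par_min to par_max over ni generations."""
    return par_min + (par_max - par_min) / ni * gn

def bw_schedule(gn, ni, bw_min, bw_max):
    """Eqs. (2)-(3): bandwidth decays exponentially from bw_max to bw_min."""
    c = math.log(bw_min / bw_max) / ni
    return bw_max * math.exp(c * gn)

def decode(vector, k_max, dim):
    """Return the active centers of a flat search variable (assumed layout:
    k_max thresholds, then k_max centers of dim values each)."""
    thresholds = vector[:k_max]
    centers = [vector[k_max + j * dim: k_max + (j + 1) * dim]
               for j in range(k_max)]
    return [c for t, c in zip(thresholds, centers) if t > 0.5]

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def ch_index(data, centers):
    """Eq. (4): CH = [B(k)/(k-1)] / [W(k)/(N-k)]; larger is better."""
    n, k = len(data), len(centers)
    if k < 2 or k >= n:
        return float("-inf")  # assumption: penalize invalid cluster counts
    groups = [[] for _ in range(k)]
    for x in data:
        nearest = min(range(k), key=lambda j: math.dist(x, centers[j]))
        groups[nearest].append(x)
    if any(not g for g in groups):
        return float("-inf")  # assumption: penalize empty clusters
    overall = mean(data)
    means = [mean(g) for g in groups]
    b = sum(len(g) * math.dist(m, overall) ** 2 for g, m in zip(groups, means))
    w = sum(math.dist(x, m) ** 2 for g, m in zip(groups, means) for x in g)
    if w == 0:
        return float("inf")
    return (b / (k - 1)) / (w / (n - k))

if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    # Thresholds 0.9 and 0.7 activate the first and third centers only.
    z = [0.9, 0.2, 0.7, 0.0, 1.0, 5.0, 5.0, 10.0, 1.0]
    print(ch_index(data, decode(z, k_max=3, dim=2)))
```

In a harmony search loop, each harmony vector would be decoded and scored this way, and the CH value would serve as the objective to maximize when deciding whether a new harmony replaces the worst one in the harmony memory.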