SlideShare uma empresa Scribd logo
1 de 4
Baixar para ler offline
The International Journal of Engineering And Science (IJES)
||Volume|| 1 ||Issue|| 2 ||Pages|| 239-242 ||2012||
 ISSN: 2319 – 1813 ISBN: 2319 – 1805
    Overview of Software Fault Prediction using Clustering Approaches
                         and Tree Data Structure
                                             1
                                              Swati M.Varade, 2Prof.M.D.Ingle
               1, 3
                      Department of Computer Science, Jayawantrao Sawant College of Engg, Hadapsar, Pune-028

 ----------------------------------------------------------------------ABSTRACT------------------------------------------------------------
 Fault prediction will give one more chance to the development team to retest the modules or files for which the
 defectiveness probability is high. By spending more time on the defective modules and no time on the non-defective
 ones, the resources of the project would be utilized better and as a result, the maintenance phase of the project will
 be easier for both the customers and the project owners. Software fault prediction decreases the total cost of the
 project and increases the overall project success rate. The perfect prediction of where faults are likely to occur in
 code can help direct test effort, reduce costs and improve the quality of software. This Paper shows specific
 methods of fault prediction for software safety that directly address the root causes of software Faults and improve
 the quality of software.

 Keywords - Clustering, Hyper-Quad Tree, K-Means clustering, Quad Tree, Software Fault Prediction.
 ----------------------------------------------------------------------------------------------------------------------------------------------------

 Date of Submission: 11, December, 2012                                                       Date of Publication: 25, December 2012
 ----------------------------------------------------------------------------------------------------------------------------------------------------


I INTRODUCTION                                                                   The software life cycle methodologies might not be
                                                                                 followed very well.
The main objective of this paper is to predict the fault that                 Improper and incomplete testing of software.
tends to occur while classifying the dataset also tries to                  Such faulty software classes may increase development &
improve the quality of software. Developing a defect free                   maintenance cost, due to software failures and decrease
software system is very difficult task and sometimes there                  customer’s satisfaction.
are some unknown faults or deficiencies found in software                             The main objective of this paper is to predict the
projects where there is a need of applying carefully the                    fault that tends to occur while classifying the dataset.
principles of the software development methodologies. By
spending more time on the defective modules and no time                            Hyper Quad-Trees are applied for finding initial
on the non-defective ones, the resources of the project                            cluster centers for K-Means algorithm.
would be utilized better and as a result, the maintenance                          The overall error rates of this prediction approach are
phase of the project will be easier for both the customers                         compared to other existing algorithms and are found
and the project owners.                                                            to be better in most of the cases.
When we look at the publications about Fault prediction we
saw that in early studies static code features were used                    II .RELATED WORK
more. But afterwards, it was understood that beside the                                Previous work of faulty software components enables
effect of static code metrics on Fault prediction, other                    verification experts to concentrate their time and resources on the
                                                                            problem areas of the software system under development. One of
measures like process metrics are also effective and should
                                                                            the main purposes of these models is to assist in software
be investigated. For example, Fenton and Neil (1999) argue                  maintenance budgeting.
that static code measures alone are not able to predict                                Among various clustering techniques available in
software Faults accurately.                                                 literature K-means clustering approach is most widely being used?
To support this idea if software is Faulty this might be                    Different authors apply different clustering techniques and expert-
related to one of the following:                                            based approach for software fault prediction problem. They
  The specification of the project may be wrong due to                     applied K-Means[8][9] and Neural-Gas techniques on different
     differing requirements or missing features.                            real data sets and then an expert explored the representative
                                                                            module of the cluster and several statistical data in order to label
  Because of improper documentation realization of the
                                                                            each cluster as faulty or non faulty. And based on their experience
     project is too complex.                                                Neural-Gas-based prediction approach performed slightly worse
  Scarce and incorrect requirements results in poor                        than K-Means clustering-based approach in terms of the overall
     design.                                                                error rate on large data sets. But their approach is dependent on
  Developers are not qualified enough for the project.                     the availability and capability of the expert. Seliya and



www.theijes.com                                                     The IJES                                                             Page 239
Overview of Software Fault Prediction using Clustering Approaches and Tree Data Structure
Khoshgoftaar proposed a constrained based semi-supervised                3.1.1. The algorithm is composed of the following steps
clustering scheme. They showed that this approach helped the
expert in making better estimations as compared to predictions             1.     Place K points into the space represented by the
made by an unsupervised learning algorithm. [1] a Quad Tree-                      objects that are being clustered. These points
based K-Means algorithm has been applied for predicting faults in                 represent initial group centroids.
program modules. The aim of their topic is twofold. First, Quad-
                                                                           2.     Assign each object to the group that has the closest
Trees are applied for finding the initial cluster centers to be input
to the K-Means Algorithm. Bhattacherjee and Bishnu [1] have                       centroid.
applied unsupervised learning approach for fault prediction in             3.     When all objects have been assigned, recalculate the
software module. An input threshold parameter delta governs the                   positions of the K centroids.
number of initial cluster centers and by varying delta the user can
generate desired initial cluster centers. The clusters obtained by         4.     Repeat Steps 2 and 3 until the centroids no longer
Quad Tree-based algorithm were found to have maximum gain                         move. This produces a separation of the objects into
values. Second, the Quad-tree based algorithm is applied for                      groups from which the metric to be minimized can
predicting faults in program modules. Supervised techniques have                  be calculated
however been applied for software fault prediction and software
effort prediction There is no solution to find the optimal number        3.1.2. Limitations of K-Means
of clusters for any given data set in K-Means. The overall error         The cluster centers, thus found, serve as input to the
rates of this prediction approach are compared to other existing         clustering algorithms. However, it has some inherent
algorithms and are found to be better in most of the cases. In this      drawbacks-
paper I try to find the better centroid than Quad-tree algorithm by          The user has to initialize the number of clusters which
using Hyper Quad-tree which will give as a input to the K-Means
                                                                                 is very difficult to identify in most of the cases.
algorithm for lowers the error rate and effective software fault
prediction. Due to some defective software modules, the                      It requires selection of the suitable initial cluster
maintenance phase of software projects could become really                       centers which is again subject to error. Since the
painful for the users and costly for the enterprises. That is why                structure of the clusters depends on the initial
predicting the defective modules or files in a software system                   cluster centers this may result in an inefficient
prior to project deployment is a very crucial activity, since it leads           clustering.
to a decrease in the total cost of the project and an increase in            The K-Means algorithm is very sensitive to noise.
overall project success rate .
                                                                         3.2. Quad Tree
III . OVERVIEW                                                           This data structure was named a Quad tree by Raphael
         This paper shows the study of K-Means clustering                Finkel and J.L. Bentley in 1974. A similar partitioning is
algorithm Quad tree algorithm and Hyper Quad-tree                        also known as a Q-tree. The Quad Tree-based method
algorithm then proposed system architecture for software                 assigns the appropriate initial cluster centers and eliminates
fault prediction, expected result using confusion matrix and             the outliers hence overcoming the second and third
conclusion.                                                              drawback of K-Means clustering algorithm.
3.1 K-Means clustering algorithm                                         Common features of quad tree
         K-means (MacQueen, 1967) is one of the simplest                   They decompose space into adaptable cells.
unsupervised learning algorithms that solve the well known                 Each cell (or bucket) has a maximum capacity.
clustering problem. The procedure follows a simple and                     When maximum capacity is reached, the bucket splits.
easy way to classify a given data set through a certain                    The tree directory follows the spatial decomposition of
number of clusters (assume k clusters) fixed a priori. The                     the Quad tree.
main idea is to define k centroids, one for each cluster.                Figure1. Shows the simple Quad tree representation.
These centroids should be placed in a cunning way because
of different location causes different result. So, the better
choice is to place them as much as possible far away from
each other. The next step is to take each point belonging to
a given data set and associate it to the nearest centroid.
When no point is pending, the first step is completed and
an early group page is done. At this point we need to re-
calculate k new centroids as bar centers of the clusters
resulting from the previous step. After we have these k new                                Figure1. Simple Quad Tree.
centroids, a new binding has to be done between the same                 3.2.1. Some definitions of notations and parameters
data set points and the nearest new centroid. A loop has                         Minimum: User defined threshold for minimum
been generated. As a result of this loop we may notice that                        number of data points in a sub bucket.
the k centroids change their location step by step until no                      Maximum: User defined threshold for maximum
more changes are done.                                                             number of data points in a sub bucket.
                                                                                 White leaf bucket: A sub bucket having less than
                                                                                   MIN number of data points of the parent bucket.
                                                                                   Fig. shows an illustration of a white leaf bucket.

www.theijes.com                                                  The IJES                                                    Page 240
Overview of Software Fault Prediction using Clustering Approaches and Tree Data Structure
     Black leaf bucket: A sub bucket having more than          Hyper Quad-Trees are expected to give better cluster
       MAX number of data points of the parent bucket.          centers than the Quad-tree because
     Gray bucket: a sub bucket which is neither white               It has an eight-way branching tree whose nodes are
       nor black.                                                        associated with axis- parallel boxes.
           User specified distance for finding nearest              The d-dimensional analogue is known variously as a
         neighbors.                                                      multidimensional Quad-tree and a hyper quad tree.
                                                                     It divides the regions recursively so that no region
3.2.2. Quad Tree Algorithm [8] [9]:                                      contains more than one data point.
      For each class:                                          Algorithm to generate a hyper quad tree is as follows
      Find the minimum and maximum x and y co-                     1. Select Random Node
         ordinates.                                                 2.    Initialize current node = root node
      Find the midpoint using the values obtained in the           3.         While current node is not a leaf node do
         previous step.                                             4. Generate a random number n
      Divide the spatial area into four sub regions based          5.           If n<p then    //n= Number of data points
         on the midpoint.                                           6.    Break the while loop
      Plot the points and classify regions as white leaf                      //P=Random path termination probability
         buckets or black leaf buckets.                             7. Else
      The white leaf buckets are left as such.                     8.    Randomly select one children
      The Center data-points of each black leaf bucket are         9.    current node=selected node
         calculated for all black leaf buckets.                     10.     Require: Hyper Quad Tree
      The mean of all the center points obtained in the            11. end if
         previous step is calculated.                               12.       end while
      The computed mean gives the centroid point                   13. select the current node
         necessary for that class.
                                                                  Input: Dimension, Data set, Min, Max
3.2.3. Limitations of Quad Tree                                   Output: Centroid
      The user has to initialize the number of clusters
         which is very difficult to identify using quad tree    IV . THE PROPOSED SYSTEMS
         algorithm.                                             The proposed system is ―Software fault prediction using
      It is not providing the exact centroid.                  clustering approach‖ that classify given data using Hyper
                                                                Quad-tree algorithm.
3.3Hyper Quad Tree                                              The system consists of 3 modules
The Hyper Quad Tree-based method assigns the appropriate              Create dataset parser
initial cluster centers and eliminates the outliers hence             Data set is given as input to the Hyper Quad-tree
overcoming the second and third drawback of K-Means                       algorithm in which we Create cells, insert cell,
algorithm that is                                                         label bucket, split cell, spatial decomposition
      Hyper Quad-Trees are applied for finding initial                  Input: Dimension, Data set
          cluster centers for K-Means algorithm. User can                Output: Centroid
          generate desired number of cluster centers that can         Centroid points obtained using the Hyper Quad-tree
          be used as input to the simple K-Means.                         is given as an input to the K-Means to get better
      Second, the centroid obtained by the Hyper Quad                    clusters it Calculates the distance, Shuffle data
          Tree is more accurate than Quad tree.                           points according to distance, If centroids are stable
Figure2 shows a simple hyper quad tree representation of                  then stop. The output of this will be set of clusters
data set dots denotes the data:                                       Measure the Faults in terms of FPR, FNR and
                                                                          ERROR using confusion matrix.




                                                                               Figure 3. System Architecture
               Figure2. Hyper Quad Tree [7]                              As Shown in TABLE1. The Actual labels of data
                                                                items are placed along the rows, while the predicted labels

www.theijes.com                                          The IJES                                                    Page 241
Overview of Software Fault Prediction using Clustering Approaches and Tree Data Structure
are placed along the columns. For example, a False Actual               reducing NOI. Better throughput with lower error rates of
label implies that the module is not faulty. If a not faulty            classification.
module (Actual label—False) is predicted as non-faulty                            In this paper I am not focusing on automatic
(Predicted Label—False) then there is the condition of cell             initialization of number of clusters this will be the future
A, which is True Negative, and if it is predicted as faulty             work for better software fault prediction using clustering
(Predicted label—True) then there is the condition of cell              Approach.
B, which is False Positive. Similar definitions hold for
False Negative and True Positive. The False positive rate is            ACKNOWLEDGMENT
the percentage of not faulty modules labeled as faulty by the                    This is a small review of my post graduate project
model and the false negative rate is the percentage of faulty           work that I am going to start to implement .I specially
modules labeled as fault free and Error is the percentage of            thank to my Guide for his assistance.
mislabeled modules. The following equations are used to
calculate these FPR, FNR, Error, and Precision [1]                      REFERENCES
                                                                        [1]    P.S. Bishnu and V. Bhattacherjee, Member, IEEE‖ Software Fault
                                      B                     (1)                Prediction Using Quad Tree-Based K-Means Clustering Algorithm‖
                      FPR =                                                    IEEE Transactions on Knowledge and Data Engineering, Vol. 24,
                                                                               No. 6, June 2012
                                     A+B                                [2]    P.S. Bishnu and V. Bhattacherjee, ―Outlier Detection Technique Using
                                                                               Quad Tree,‖ Proc Int’l Conf. Computer Comm. Control and
                                      C                     (2)                Information Technology, pp. 143-148, Feb. 2009.
                                                                        [3]    P.S. Bishnu and V. Bhattacherjee, ―Application of K-Medoids with kd-
                      FNR =                                                    Tree for Software Fault Prediction,‖ ACM Software Eng. Notes, vol.
                                     D+C                                       36, pp. 1-6, Mar. 2011.
                                                                        [4]    V. Bhattacherjee and P.S. Bishnu, ―Software Fault Prediction Using
                                                                               KMedoids Algorithm,‖ Proc. Int’l Conf. Productivity, Quality,
                                   B+C
                                                                               Reliability, Optimization and Modeling (ICPQROM ’11), p. 191,
                      ERROR =                                (3)               Feb. 2011.
                                 A+B+C+D                                [5]    J. Han and M. Kamber, ―Data Mining Concepts and Techniques‖,
                                                                               second ed, pp. 401-404. Morgan Kaufmann Publishers, 2007.
                                                                        [6]    Parvinder S. Sandhu, Jagdeep Singh, Vikas Gupta, Mandeep Kaur,
                     TABLE 1.Confusion Matrix                                  Sonia Manhas, Ramandeep Sidhu‖ A K-Means Based Clustering
                 Predicted Labels                                              Approach for Finding Faulty Modules in Open Source Software
                                False            True                          Systems‖ ,World Academy of Science, Engineering and Technology
                                                                               48 2010
 Actual Labels




                                (Non-Faulty)     (Faulty)               [7]    Michael Laszlo and Sumitra Mukherjee, Member, IEEE, ―A Genetic
                 False          True Negative    False Positive                Algorithm Using Hyper-Quadtrees for Low-Dimensional K-means
                 (Non-Faulty)   A                B                             Clustering‖, IEEE transactions on pattern analysis and machine
                                                                               intelligence, vol. 28, no. 4, april 2006
                 True           False Negative   True Positive          [8]    Leela Rani.P, Rajalakshmi.P,‖ Clustering Gene Expression Data using
                 (Faulty)       C                D                             Quad-tree based Expectation Maximization Approach‖ International
                                                                               Journal of Applied Information Systems (IJAIS) – ISSN : 2249-0868
                                                                               Foundation of Computer Science FCS, New York, USA,Volume 2–
         The above performance indicators should be
                                                                               No.2, June 2012 – www.ijais.org
minimized. A high value of FPR would lead to wasted                     [9]    Meenakshi PC, Meenu S, Mithra M, Leela Rani.P,‖ Fault Prediction
testing effort while high FNR value means error prone                          using Quad-tree and Expectation Maximization Algorithm‖,
modules will escape testing.                                                   International Journal of Applied Information Systems (IJAIS) – ISSN
                                                                               : 2249-0868 Foundation of Computer Science FCS, New York, USA
         In this paper, for calculating the measures, if any                   Volume 2– No.4, May 2012 – www.ijais.org
metric value of the centroid data point of a cluster was                [10]   P.S. Bishnu and V. Bhattacharjee, ―A New Initialization Method for
greater than the threshold, that cluster was labeled as faulty                 KMeans Algorithm Using Quad Tree,‖ Proc. Nat’l Conf. Methods and
                                                                               Models in Computing (NCM2C), pp. 73-81, 2008.
and otherwise it was labeled as non-faulty. After this the
predicted fault labels will compare with the actual fault
labels. Also the clusters can be label according to the                                Biographies and Photographs
majority of its members (by comparing with metrics                      Prof.M.D.Ingle is currently working as an Associate
thresholds) but this increases the complexity of the labeling           Professor in Jayawantrao Sawant College of Engineering
procedure since all the modules in the cluster need to be               Hadapsar, Pune, India. His research interests are
examined.                                                               Networking, Information Security, Mobile Computing, and
                                                                        Data Mining etc.
                                                                        Swati M. Varade Pursuing M.E.(Computer) From
 CONCLISION AND FUTURE SCOPE
                                                                        Jayawantrao Sawant College of Engineering Hadapsar,
          Hyper Quad-tree based K-Means clustering
                                                                        Pune, India currently working as a Lecturer in same
algorithm evaluates the effectiveness in predicting faulty
                                                                        Institute. Her research interests are Software Testing, Data
software modules as compared to the original Quad-tree
                                                                        Mining etc.
based K-Means algorithm. Also it will find better the initial
cluster centers for K-Means algorithm. By using hyper quad
I try to meet the convergence criterion faster and hence it
results in lesser number of iterations. Also there will be
reduction in time and computational complexity by

www.theijes.com                                                   The IJES                                                             Page 242

Mais conteúdo relacionado

Mais procurados

A DECISION SUPPORT SYSTEM FOR ESTIMATING COST OF SOFTWARE PROJECTS USING A HY...
A DECISION SUPPORT SYSTEM FOR ESTIMATING COST OF SOFTWARE PROJECTS USING A HY...A DECISION SUPPORT SYSTEM FOR ESTIMATING COST OF SOFTWARE PROJECTS USING A HY...
A DECISION SUPPORT SYSTEM FOR ESTIMATING COST OF SOFTWARE PROJECTS USING A HY...ijfcstjournal
 
Using Mutation in Fault Localization
Using Mutation in Fault Localization Using Mutation in Fault Localization
Using Mutation in Fault Localization csandit
 
Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...csandit
 
Reverse Engineering
Reverse EngineeringReverse Engineering
Reverse Engineeringsiddu019
 
Benchmarking machine learning techniques
Benchmarking machine learning techniquesBenchmarking machine learning techniques
Benchmarking machine learning techniquesijseajournal
 
The adoption of machine learning techniques for software defect prediction: A...
The adoption of machine learning techniques for software defect prediction: A...The adoption of machine learning techniques for software defect prediction: A...
The adoption of machine learning techniques for software defect prediction: A...RAKESH RANA
 
Defect effort prediction models in software maintenance projects
Defect  effort prediction models in software maintenance projectsDefect  effort prediction models in software maintenance projects
Defect effort prediction models in software maintenance projectsiaemedu
 
Christopher N. Bull History-Sensitive Detection of Design Flaws B ...
Christopher N. Bull History-Sensitive Detection of Design Flaws B ...Christopher N. Bull History-Sensitive Detection of Design Flaws B ...
Christopher N. Bull History-Sensitive Detection of Design Flaws B ...butest
 
2012 ieee projects software engineering @ Seabirds ( Trichy, Chennai, Pondich...
2012 ieee projects software engineering @ Seabirds ( Trichy, Chennai, Pondich...2012 ieee projects software engineering @ Seabirds ( Trichy, Chennai, Pondich...
2012 ieee projects software engineering @ Seabirds ( Trichy, Chennai, Pondich...SBGC
 
Measuring effort for modifying software package as reusable package using pac...
Measuring effort for modifying software package as reusable package using pac...Measuring effort for modifying software package as reusable package using pac...
Measuring effort for modifying software package as reusable package using pac...eSAT Journals
 
A Defect Prediction Model for Software Product based on ANFIS
A Defect Prediction Model for Software Product based on ANFISA Defect Prediction Model for Software Product based on ANFIS
A Defect Prediction Model for Software Product based on ANFISIJSRD
 
Towards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniquesTowards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniquesPvrtechnologies Nellore
 
IRJET- Data Reduction in Bug Triage using Supervised Machine Learning
IRJET- Data Reduction in Bug Triage using Supervised Machine LearningIRJET- Data Reduction in Bug Triage using Supervised Machine Learning
IRJET- Data Reduction in Bug Triage using Supervised Machine LearningIRJET Journal
 

Mais procurados (17)

Estimation sharbani bhattacharya
Estimation sharbani bhattacharyaEstimation sharbani bhattacharya
Estimation sharbani bhattacharya
 
A DECISION SUPPORT SYSTEM FOR ESTIMATING COST OF SOFTWARE PROJECTS USING A HY...
A DECISION SUPPORT SYSTEM FOR ESTIMATING COST OF SOFTWARE PROJECTS USING A HY...A DECISION SUPPORT SYSTEM FOR ESTIMATING COST OF SOFTWARE PROJECTS USING A HY...
A DECISION SUPPORT SYSTEM FOR ESTIMATING COST OF SOFTWARE PROJECTS USING A HY...
 
Using Mutation in Fault Localization
Using Mutation in Fault Localization Using Mutation in Fault Localization
Using Mutation in Fault Localization
 
Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...
 
Cm24585587
Cm24585587Cm24585587
Cm24585587
 
Reverse Engineering
Reverse EngineeringReverse Engineering
Reverse Engineering
 
1806 1810
1806 18101806 1810
1806 1810
 
Benchmarking machine learning techniques
Benchmarking machine learning techniquesBenchmarking machine learning techniques
Benchmarking machine learning techniques
 
The adoption of machine learning techniques for software defect prediction: A...
The adoption of machine learning techniques for software defect prediction: A...The adoption of machine learning techniques for software defect prediction: A...
The adoption of machine learning techniques for software defect prediction: A...
 
Defect effort prediction models in software maintenance projects
Defect  effort prediction models in software maintenance projectsDefect  effort prediction models in software maintenance projects
Defect effort prediction models in software maintenance projects
 
Christopher N. Bull History-Sensitive Detection of Design Flaws B ...
Christopher N. Bull History-Sensitive Detection of Design Flaws B ...Christopher N. Bull History-Sensitive Detection of Design Flaws B ...
Christopher N. Bull History-Sensitive Detection of Design Flaws B ...
 
2012 ieee projects software engineering @ Seabirds ( Trichy, Chennai, Pondich...
2012 ieee projects software engineering @ Seabirds ( Trichy, Chennai, Pondich...2012 ieee projects software engineering @ Seabirds ( Trichy, Chennai, Pondich...
2012 ieee projects software engineering @ Seabirds ( Trichy, Chennai, Pondich...
 
Measuring effort for modifying software package as reusable package using pac...
Measuring effort for modifying software package as reusable package using pac...Measuring effort for modifying software package as reusable package using pac...
Measuring effort for modifying software package as reusable package using pac...
 
A Defect Prediction Model for Software Product based on ANFIS
A Defect Prediction Model for Software Product based on ANFISA Defect Prediction Model for Software Product based on ANFIS
A Defect Prediction Model for Software Product based on ANFIS
 
Towards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniquesTowards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniques
 
IRJET- Data Reduction in Bug Triage using Supervised Machine Learning
IRJET- Data Reduction in Bug Triage using Supervised Machine LearningIRJET- Data Reduction in Bug Triage using Supervised Machine Learning
IRJET- Data Reduction in Bug Triage using Supervised Machine Learning
 
International Journal of Engineering Inventions (IJEI),
International Journal of Engineering Inventions (IJEI), International Journal of Engineering Inventions (IJEI),
International Journal of Engineering Inventions (IJEI),
 

Destaque

The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...Brian Solis
 
Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)maditabalnco
 
Open Source Creativity
Open Source CreativityOpen Source Creativity
Open Source CreativitySara Cannon
 
What's Next in Growth? 2016
What's Next in Growth? 2016What's Next in Growth? 2016
What's Next in Growth? 2016Andrew Chen
 
The Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post FormatsThe Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post FormatsBarry Feldman
 
The Outcome Economy
The Outcome EconomyThe Outcome Economy
The Outcome EconomyHelge Tennø
 
32 Ways a Digital Marketing Consultant Can Help Grow Your Business
32 Ways a Digital Marketing Consultant Can Help Grow Your Business32 Ways a Digital Marketing Consultant Can Help Grow Your Business
32 Ways a Digital Marketing Consultant Can Help Grow Your BusinessBarry Feldman
 

Destaque (7)

The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...
 
Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)
 
Open Source Creativity
Open Source CreativityOpen Source Creativity
Open Source Creativity
 
What's Next in Growth? 2016
What's Next in Growth? 2016What's Next in Growth? 2016
What's Next in Growth? 2016
 
The Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post FormatsThe Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post Formats
 
The Outcome Economy
The Outcome EconomyThe Outcome Economy
The Outcome Economy
 
32 Ways a Digital Marketing Consultant Can Help Grow Your Business
32 Ways a Digital Marketing Consultant Can Help Grow Your Business32 Ways a Digital Marketing Consultant Can Help Grow Your Business
32 Ways a Digital Marketing Consultant Can Help Grow Your Business
 

Semelhante a The International Journal of Engineering and Science (IJES)

A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...
A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...
A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...Shakas Technologies
 
Using Fuzzy Clustering and Software Metrics to Predict Faults in large Indust...
Using Fuzzy Clustering and Software Metrics to Predict Faults in large Indust...Using Fuzzy Clustering and Software Metrics to Predict Faults in large Indust...
Using Fuzzy Clustering and Software Metrics to Predict Faults in large Indust...IOSR Journals
 
A Review on Software Fault Detection and Prevention Mechanism in Software Dev...
A Review on Software Fault Detection and Prevention Mechanism in Software Dev...A Review on Software Fault Detection and Prevention Mechanism in Software Dev...
A Review on Software Fault Detection and Prevention Mechanism in Software Dev...iosrjce
 
CRIME EXPLORATION AND FORECAST
CRIME EXPLORATION AND FORECASTCRIME EXPLORATION AND FORECAST
CRIME EXPLORATION AND FORECASTIRJET Journal
 
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...IRJET Journal
 
IRJET- A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...
IRJET-  	  A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...IRJET-  	  A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...
IRJET- A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...IRJET Journal
 
Insights on Research Techniques towards Cost Estimation in Software Design
Insights on Research Techniques towards Cost Estimation in Software Design Insights on Research Techniques towards Cost Estimation in Software Design
Insights on Research Techniques towards Cost Estimation in Software Design IJECEIAES
 
A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...IOSR Journals
 
COMPARATIVE STUDY OF SOFTWARE ESTIMATION TECHNIQUES
COMPARATIVE STUDY OF SOFTWARE ESTIMATION TECHNIQUES COMPARATIVE STUDY OF SOFTWARE ESTIMATION TECHNIQUES
COMPARATIVE STUDY OF SOFTWARE ESTIMATION TECHNIQUES ijseajournal
 
A simplified predictive framework for cost evaluation to fault assessment usi...
A simplified predictive framework for cost evaluation to fault assessment usi...A simplified predictive framework for cost evaluation to fault assessment usi...
A simplified predictive framework for cost evaluation to fault assessment usi...IJECEIAES
 
Abstract.doc
Abstract.docAbstract.doc
Abstract.docbutest
 
Deepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine LearningDeepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine LearningIRJET Journal
 
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug TriageSurvey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug TriageIRJET Journal
 
A Plagiarism Checker: Analysis of time and space complexity
A Plagiarism Checker: Analysis of time and space complexityA Plagiarism Checker: Analysis of time and space complexity
A Plagiarism Checker: Analysis of time and space complexityIRJET Journal
 
Bug Triage: An Automated Process
Bug Triage: An Automated ProcessBug Triage: An Automated Process
Bug Triage: An Automated ProcessIRJET Journal
 
Defect effort prediction models in software
Defect effort prediction models in softwareDefect effort prediction models in software
Defect effort prediction models in softwareIAEME Publication
 
Comparative performance analysis
Comparative performance analysisComparative performance analysis
Comparative performance analysiscsandit
 

Semelhante a The International Journal of Engineering and Science (IJES) (20)

A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...
A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...
A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine...
 
Using Fuzzy Clustering and Software Metrics to Predict Faults in large Indust...
Using Fuzzy Clustering and Software Metrics to Predict Faults in large Indust...Using Fuzzy Clustering and Software Metrics to Predict Faults in large Indust...
Using Fuzzy Clustering and Software Metrics to Predict Faults in large Indust...
 
A Review on Software Fault Detection and Prevention Mechanism in Software Dev...
A Review on Software Fault Detection and Prevention Mechanism in Software Dev...A Review on Software Fault Detection and Prevention Mechanism in Software Dev...
A Review on Software Fault Detection and Prevention Mechanism in Software Dev...
 
F017652530
F017652530F017652530
F017652530
 
CRIME EXPLORATION AND FORECAST
CRIME EXPLORATION AND FORECASTCRIME EXPLORATION AND FORECAST
CRIME EXPLORATION AND FORECAST
 
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
 
IRJET- A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...
IRJET-  	  A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...IRJET-  	  A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...
IRJET- A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...
 
Insights on Research Techniques towards Cost Estimation in Software Design
Insights on Research Techniques towards Cost Estimation in Software Design Insights on Research Techniques towards Cost Estimation in Software Design
Insights on Research Techniques towards Cost Estimation in Software Design
 
A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...
 
COMPARATIVE STUDY OF SOFTWARE ESTIMATION TECHNIQUES
COMPARATIVE STUDY OF SOFTWARE ESTIMATION TECHNIQUES COMPARATIVE STUDY OF SOFTWARE ESTIMATION TECHNIQUES
COMPARATIVE STUDY OF SOFTWARE ESTIMATION TECHNIQUES
 
A simplified predictive framework for cost evaluation to fault assessment usi...
A simplified predictive framework for cost evaluation to fault assessment usi...A simplified predictive framework for cost evaluation to fault assessment usi...
A simplified predictive framework for cost evaluation to fault assessment usi...
 
ONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATION
ONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATIONONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATION
ONE HIDDEN LAYER ANFIS MODEL FOR OOS DEVELOPMENT EFFORT ESTIMATION
 
Abstract.doc
Abstract.docAbstract.doc
Abstract.doc
 
Deepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine LearningDeepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine Learning
 
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug TriageSurvey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
 
A Plagiarism Checker: Analysis of time and space complexity
A Plagiarism Checker: Analysis of time and space complexityA Plagiarism Checker: Analysis of time and space complexity
A Plagiarism Checker: Analysis of time and space complexity
 
Bug Triage: An Automated Process
Bug Triage: An Automated ProcessBug Triage: An Automated Process
Bug Triage: An Automated Process
 
Cv32608610
Cv32608610Cv32608610
Cv32608610
 
Defect effort prediction models in software
Defect effort prediction models in softwareDefect effort prediction models in software
Defect effort prediction models in software
 
Comparative performance analysis
Comparative performance analysisComparative performance analysis
Comparative performance analysis
 

The International Journal of Engineering and Science (IJES)

  • 1. The International Journal of Engineering And Science (IJES) ||Volume|| 1 ||Issue|| 2 ||Pages|| 239-242 ||2012|| ISSN: 2319 – 1813 ISBN: 2319 – 1805  Overview of Software Fault Prediction using Clustering Approaches and Tree Data Structure 1 Swati M.Varade, 2Prof.M.D.Ingle 1, 3 Department of Computer Science, Jayawantrao Sawant College of Engg, Hadapsar, Pune-028 ----------------------------------------------------------------------ABSTRACT------------------------------------------------------------ Fault prediction will give one more chance to the development team to retest the modules or files for which the defectiveness probability is high. By spending more time on the defective modules and no time on the non-defective ones, the resources of the project would be utilized better and as a result, the maintenance phase of the project will be easier for both the customers and the project owners. Software fault prediction decreases the total cost of the project and increases the overall project success rate. The perfect prediction of where faults are likely to occur in code can help direct test effort, reduce costs and improve the quality of software. This Paper shows specific methods of fault prediction for software safety that directly address the root causes of software Faults and improve the quality of software. Keywords - Clustering, Hyper-Quad Tree, K-Means clustering, Quad Tree, Software Fault Prediction. ---------------------------------------------------------------------------------------------------------------------------------------------------- Date of Submission: 11, December, 2012 Date of Publication: 25, December 2012 ---------------------------------------------------------------------------------------------------------------------------------------------------- I INTRODUCTION  The software life cycle methodologies might not be followed very well. The main objective of this paper is to predict the fault that  Improper and incomplete testing of software. tends to occur while classifying the dataset also tries to Such faulty software classes may increase development & improve the quality of software. Developing a defect free maintenance cost, due to software failures and decrease software system is very difficult task and sometimes there customer’s satisfaction. are some unknown faults or deficiencies found in software The main objective of this paper is to predict the projects where there is a need of applying carefully the fault that tends to occur while classifying the dataset. principles of the software development methodologies. By spending more time on the defective modules and no time  Hyper Quad-Trees are applied for finding initial on the non-defective ones, the resources of the project cluster centers for K-Means algorithm. would be utilized better and as a result, the maintenance  The overall error rates of this prediction approach are phase of the project will be easier for both the customers compared to other existing algorithms and are found and the project owners. to be better in most of the cases. When we look at the publications about Fault prediction we saw that in early studies static code features were used II .RELATED WORK more. But afterwards, it was understood that beside the Previous work of faulty software components enables effect of static code metrics on Fault prediction, other verification experts to concentrate their time and resources on the problem areas of the software system under development. One of measures like process metrics are also effective and should the main purposes of these models is to assist in software be investigated. For example, Fenton and Neil (1999) argue maintenance budgeting. that static code measures alone are not able to predict Among various clustering techniques available in software Faults accurately. literature K-means clustering approach is most widely being used? To support this idea if software is Faulty this might be Different authors apply different clustering techniques and expert- related to one of the following: based approach for software fault prediction problem. They  The specification of the project may be wrong due to applied K-Means[8][9] and Neural-Gas techniques on different differing requirements or missing features. real data sets and then an expert explored the representative module of the cluster and several statistical data in order to label  Because of improper documentation realization of the each cluster as faulty or non faulty. And based on their experience project is too complex. Neural-Gas-based prediction approach performed slightly worse  Scarce and incorrect requirements results in poor than K-Means clustering-based approach in terms of the overall design. error rate on large data sets. But their approach is dependent on  Developers are not qualified enough for the project. the availability and capability of the expert. Seliya and www.theijes.com The IJES Page 239
  • 2. Overview of Software Fault Prediction using Clustering Approaches and Tree Data Structure Khoshgoftaar proposed a constrained based semi-supervised 3.1.1. The algorithm is composed of the following steps clustering scheme. They showed that this approach helped the expert in making better estimations as compared to predictions 1. Place K points into the space represented by the made by an unsupervised learning algorithm. [1] a Quad Tree- objects that are being clustered. These points based K-Means algorithm has been applied for predicting faults in represent initial group centroids. program modules. The aim of their topic is twofold. First, Quad- 2. Assign each object to the group that has the closest Trees are applied for finding the initial cluster centers to be input to the K-Means Algorithm. Bhattacherjee and Bishnu [1] have centroid. applied unsupervised learning approach for fault prediction in 3. When all objects have been assigned, recalculate the software module. An input threshold parameter delta governs the positions of the K centroids. number of initial cluster centers and by varying delta the user can generate desired initial cluster centers. The clusters obtained by 4. Repeat Steps 2 and 3 until the centroids no longer Quad Tree-based algorithm were found to have maximum gain move. This produces a separation of the objects into values. Second, the Quad-tree based algorithm is applied for groups from which the metric to be minimized can predicting faults in program modules. Supervised techniques have be calculated however been applied for software fault prediction and software effort prediction There is no solution to find the optimal number 3.1.2. Limitations of K-Means of clusters for any given data set in K-Means. The overall error The cluster centers, thus found, serve as input to the rates of this prediction approach are compared to other existing clustering algorithms. However, it has some inherent algorithms and are found to be better in most of the cases. In this drawbacks- paper I try to find the better centroid than Quad-tree algorithm by  The user has to initialize the number of clusters which using Hyper Quad-tree which will give as a input to the K-Means is very difficult to identify in most of the cases. algorithm for lowers the error rate and effective software fault prediction. Due to some defective software modules, the  It requires selection of the suitable initial cluster maintenance phase of software projects could become really centers which is again subject to error. Since the painful for the users and costly for the enterprises. That is why structure of the clusters depends on the initial predicting the defective modules or files in a software system cluster centers this may result in an inefficient prior to project deployment is a very crucial activity, since it leads clustering. to a decrease in the total cost of the project and an increase in  The K-Means algorithm is very sensitive to noise. overall project success rate . 3.2. Quad Tree III . OVERVIEW This data structure was named a Quad tree by Raphael This paper shows the study of K-Means clustering Finkel and J.L. Bentley in 1974. A similar partitioning is algorithm Quad tree algorithm and Hyper Quad-tree also known as a Q-tree. The Quad Tree-based method algorithm then proposed system architecture for software assigns the appropriate initial cluster centers and eliminates fault prediction, expected result using confusion matrix and the outliers hence overcoming the second and third conclusion. drawback of K-Means clustering algorithm. 3.1 K-Means clustering algorithm Common features of quad tree K-means (MacQueen, 1967) is one of the simplest  They decompose space into adaptable cells. unsupervised learning algorithms that solve the well known  Each cell (or bucket) has a maximum capacity. clustering problem. The procedure follows a simple and  When maximum capacity is reached, the bucket splits. easy way to classify a given data set through a certain  The tree directory follows the spatial decomposition of number of clusters (assume k clusters) fixed a priori. The the Quad tree. main idea is to define k centroids, one for each cluster. Figure1. Shows the simple Quad tree representation. These centroids should be placed in a cunning way because of different location causes different result. So, the better choice is to place them as much as possible far away from each other. The next step is to take each point belonging to a given data set and associate it to the nearest centroid. When no point is pending, the first step is completed and an early group page is done. At this point we need to re- calculate k new centroids as bar centers of the clusters resulting from the previous step. After we have these k new Figure1. Simple Quad Tree. centroids, a new binding has to be done between the same 3.2.1. Some definitions of notations and parameters data set points and the nearest new centroid. A loop has  Minimum: User defined threshold for minimum been generated. As a result of this loop we may notice that number of data points in a sub bucket. the k centroids change their location step by step until no  Maximum: User defined threshold for maximum more changes are done. number of data points in a sub bucket.  White leaf bucket: A sub bucket having less than MIN number of data points of the parent bucket. Fig. shows an illustration of a white leaf bucket. www.theijes.com The IJES Page 240
  • 3. Overview of Software Fault Prediction using Clustering Approaches and Tree Data Structure  Black leaf bucket: A sub bucket having more than Hyper Quad-Trees are expected to give better cluster MAX number of data points of the parent bucket. centers than the Quad-tree because  Gray bucket: a sub bucket which is neither white  It has an eight-way branching tree whose nodes are nor black. associated with axis- parallel boxes.  User specified distance for finding nearest  The d-dimensional analogue is known variously as a neighbors. multidimensional Quad-tree and a hyper quad tree.  It divides the regions recursively so that no region 3.2.2. Quad Tree Algorithm [8] [9]: contains more than one data point.  For each class: Algorithm to generate a hyper quad tree is as follows  Find the minimum and maximum x and y co- 1. Select Random Node ordinates. 2. Initialize current node = root node  Find the midpoint using the values obtained in the 3. While current node is not a leaf node do previous step. 4. Generate a random number n  Divide the spatial area into four sub regions based 5. If n<p then //n= Number of data points on the midpoint. 6. Break the while loop  Plot the points and classify regions as white leaf //P=Random path termination probability buckets or black leaf buckets. 7. Else  The white leaf buckets are left as such. 8. Randomly select one children  The Center data-points of each black leaf bucket are 9. current node=selected node calculated for all black leaf buckets. 10. Require: Hyper Quad Tree  The mean of all the center points obtained in the 11. end if previous step is calculated. 12. end while  The computed mean gives the centroid point 13. select the current node necessary for that class. Input: Dimension, Data set, Min, Max 3.2.3. Limitations of Quad Tree Output: Centroid  The user has to initialize the number of clusters which is very difficult to identify using quad tree IV . THE PROPOSED SYSTEMS algorithm. The proposed system is ―Software fault prediction using  It is not providing the exact centroid. clustering approach‖ that classify given data using Hyper Quad-tree algorithm. 3.3Hyper Quad Tree The system consists of 3 modules The Hyper Quad Tree-based method assigns the appropriate  Create dataset parser initial cluster centers and eliminates the outliers hence  Data set is given as input to the Hyper Quad-tree overcoming the second and third drawback of K-Means algorithm in which we Create cells, insert cell, algorithm that is label bucket, split cell, spatial decomposition  Hyper Quad-Trees are applied for finding initial Input: Dimension, Data set cluster centers for K-Means algorithm. User can Output: Centroid generate desired number of cluster centers that can  Centroid points obtained using the Hyper Quad-tree be used as input to the simple K-Means. is given as an input to the K-Means to get better  Second, the centroid obtained by the Hyper Quad clusters it Calculates the distance, Shuffle data Tree is more accurate than Quad tree. points according to distance, If centroids are stable Figure2 shows a simple hyper quad tree representation of then stop. The output of this will be set of clusters data set dots denotes the data:  Measure the Faults in terms of FPR, FNR and ERROR using confusion matrix. Figure 3. System Architecture Figure2. Hyper Quad Tree [7] As Shown in TABLE1. The Actual labels of data items are placed along the rows, while the predicted labels www.theijes.com The IJES Page 241
  • 4. Overview of Software Fault Prediction using Clustering Approaches and Tree Data Structure are placed along the columns. For example, a False Actual reducing NOI. Better throughput with lower error rates of label implies that the module is not faulty. If a not faulty classification. module (Actual label—False) is predicted as non-faulty In this paper I am not focusing on automatic (Predicted Label—False) then there is the condition of cell initialization of number of clusters this will be the future A, which is True Negative, and if it is predicted as faulty work for better software fault prediction using clustering (Predicted label—True) then there is the condition of cell Approach. B, which is False Positive. Similar definitions hold for False Negative and True Positive. The False positive rate is ACKNOWLEDGMENT the percentage of not faulty modules labeled as faulty by the This is a small review of my post graduate project model and the false negative rate is the percentage of faulty work that I am going to start to implement .I specially modules labeled as fault free and Error is the percentage of thank to my Guide for his assistance. mislabeled modules. The following equations are used to calculate these FPR, FNR, Error, and Precision [1] REFERENCES [1] P.S. Bishnu and V. Bhattacherjee, Member, IEEE‖ Software Fault B (1) Prediction Using Quad Tree-Based K-Means Clustering Algorithm‖ FPR = IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 6, June 2012 A+B [2] P.S. Bishnu and V. Bhattacherjee, ―Outlier Detection Technique Using Quad Tree,‖ Proc Int’l Conf. Computer Comm. Control and C (2) Information Technology, pp. 143-148, Feb. 2009. [3] P.S. Bishnu and V. Bhattacherjee, ―Application of K-Medoids with kd- FNR = Tree for Software Fault Prediction,‖ ACM Software Eng. Notes, vol. D+C 36, pp. 1-6, Mar. 2011. [4] V. Bhattacherjee and P.S. Bishnu, ―Software Fault Prediction Using KMedoids Algorithm,‖ Proc. Int’l Conf. Productivity, Quality, B+C Reliability, Optimization and Modeling (ICPQROM ’11), p. 191, ERROR = (3) Feb. 2011. A+B+C+D [5] J. Han and M. Kamber, ―Data Mining Concepts and Techniques‖, second ed, pp. 401-404. Morgan Kaufmann Publishers, 2007. [6] Parvinder S. Sandhu, Jagdeep Singh, Vikas Gupta, Mandeep Kaur, TABLE 1.Confusion Matrix Sonia Manhas, Ramandeep Sidhu‖ A K-Means Based Clustering Predicted Labels Approach for Finding Faulty Modules in Open Source Software False True Systems‖ ,World Academy of Science, Engineering and Technology 48 2010 Actual Labels (Non-Faulty) (Faulty) [7] Michael Laszlo and Sumitra Mukherjee, Member, IEEE, ―A Genetic False True Negative False Positive Algorithm Using Hyper-Quadtrees for Low-Dimensional K-means (Non-Faulty) A B Clustering‖, IEEE transactions on pattern analysis and machine intelligence, vol. 28, no. 4, april 2006 True False Negative True Positive [8] Leela Rani.P, Rajalakshmi.P,‖ Clustering Gene Expression Data using (Faulty) C D Quad-tree based Expectation Maximization Approach‖ International Journal of Applied Information Systems (IJAIS) – ISSN : 2249-0868 Foundation of Computer Science FCS, New York, USA,Volume 2– The above performance indicators should be No.2, June 2012 – www.ijais.org minimized. A high value of FPR would lead to wasted [9] Meenakshi PC, Meenu S, Mithra M, Leela Rani.P,‖ Fault Prediction testing effort while high FNR value means error prone using Quad-tree and Expectation Maximization Algorithm‖, modules will escape testing. International Journal of Applied Information Systems (IJAIS) – ISSN : 2249-0868 Foundation of Computer Science FCS, New York, USA In this paper, for calculating the measures, if any Volume 2– No.4, May 2012 – www.ijais.org metric value of the centroid data point of a cluster was [10] P.S. Bishnu and V. Bhattacharjee, ―A New Initialization Method for greater than the threshold, that cluster was labeled as faulty KMeans Algorithm Using Quad Tree,‖ Proc. Nat’l Conf. Methods and Models in Computing (NCM2C), pp. 73-81, 2008. and otherwise it was labeled as non-faulty. After this the predicted fault labels will compare with the actual fault labels. Also the clusters can be label according to the Biographies and Photographs majority of its members (by comparing with metrics Prof.M.D.Ingle is currently working as an Associate thresholds) but this increases the complexity of the labeling Professor in Jayawantrao Sawant College of Engineering procedure since all the modules in the cluster need to be Hadapsar, Pune, India. His research interests are examined. Networking, Information Security, Mobile Computing, and Data Mining etc. Swati M. Varade Pursuing M.E.(Computer) From CONCLISION AND FUTURE SCOPE Jayawantrao Sawant College of Engineering Hadapsar, Hyper Quad-tree based K-Means clustering Pune, India currently working as a Lecturer in same algorithm evaluates the effectiveness in predicting faulty Institute. Her research interests are Software Testing, Data software modules as compared to the original Quad-tree Mining etc. based K-Means algorithm. Also it will find better the initial cluster centers for K-Means algorithm. By using hyper quad I try to meet the convergence criterion faster and hence it results in lesser number of iterations. Also there will be reduction in time and computational complexity by www.theijes.com The IJES Page 242