SlideShare uma empresa Scribd logo
1 de 8
Baixar para ler offline
International Journal of Modern Engineering Research (IJMER)
               www.ijmer.com         Vol.2, Issue.6, Nov-Dec. 2012 pp-4094-4101       ISSN: 2249-6645

     Enhanced K-Mean Algorithm to Improve Decision Support System
                     Under Uncertain Situations
          Ahmed Bahgat El Seddawy 1, Prof. Dr. Turky Sultan2, Dr. Ayman Khedr 3
              1
                  Department of Business Information System, Arab Academy for Science and Technology, Egypt
                               2, 3
                                    Department of Information System, Helwan University, Egypt

Abstract: Decision Support System (DSS) is equivalent            of usage investment - direct or indirect - or credit and any
synonym as management information systems (MIS). Most            sector of investment will be highly or moderate or low risk.
of imported data are being used in solutions like data           And select which one of this sectors risk „rejected‟ or un-
mining (DM). Decision supporting systems include also            risk „accepted‟ all that under uncertain environments such
decisions made upon individual data from external sources,       as; political, economical, marketing, operational, internal
management feeling, and various other data sources not           policies and natural crises, all that using the contribution of
included in business intelligence. Successfully supporting       this study enhancing k-mean algorithm to improve the
managerial decision-making is critically dependent upon          results and comparing results between original algorithm
the availability of integrated, high quality information         and enhanced algorithm.
organized and presented in a timely and easily understood                  The paper is divided into six sections; section two
manner. Data mining have emerged to meet this need. They         is a background and related work it is divided into two
serve as an integrated repository for internal and external      parts, part one is about DSS, part two is about DM including
data-intelligence critical to understanding and evaluating       K-m algorithm and enhancing in K-m algorithm. Section
the business within its environmental context. With the          three presents the proposed Investing Data Mining System
addition of models, analytic tools, and user interfaces, they    „IDMS. Section four presents IDMS experiments,
have the potential to provide actionable information that        implementations and results using original and enhanced k-
supports effective problem and opportunity identification,       m algorithm. Section five presents conclusion and finally
critical decision-making, and strategy formulation,              section six presents future work.
implementation, and evaluation. The proposed system
Investment Data Mining System (IDMS) will support top                   II.      Background and related work
level management to make a good decision in any time                       2.1 Decision Support System (DSS)
under any uncertain environment and on another hand                        DSS includes a body of knowledge that describes
using enhancing K-mean algorithm.                                some aspects of the decision maker's world that specifies
                                                                 how to accomplish various tasks, that indicates what
Keywords: dss, dm, mis, clustering, classification,              conclusions are valid in different circumstances [4].The
association rule, k-mean, olap, matlab                           expected benefits of DSS that discovered are higher
                                                                 decision quality, improved communication, cost reduction,
                    I.    Introduction                           increased productivity, time savings, improved customer
          Decision Support System (DSS) is equivalent            satisfaction and improved employee satisfaction. DSS is a
synonym as management information systems (MIS). Most            computer-based system consisting of three main interacting
of imported data are being used in solutions like data           components:
mining (DM). Decision supporting systems include also
decisions made upon individual data from external sources,          A language system: a mechanism to provide
management feeling, and various other data sources not               communication between the user and other components
included in business intelligence. Successfully supporting           of the DSS.
managerial decision-making is critically dependent upon the         A knowledge system: A repository of problem domain
availability of integrated, high quality information                 knowledge embodied in DSS as either data or
organized and presented in a timely and easily understood            procedures.
manner. Data mining have emerged to meet this need. They            A problem processing system: a link between the
serve as an integrated repository for internal and external          other two components, containing one or more of the
data-intelligence critical to understanding and evaluating the       general problem manipulation capabilities required for
business within its environmental context. With the addition         decision-making.
of models, analytic tools, and user interfaces, they have the
potential to provide actionable information that supports
effective problem and opportunity identification, critical                                                       Languag
                                                                        Knowl               Problem
decision-making, and strategy formulation, implementation,                                  Processi             e System
                                                                         edge
and evaluation. The proposed system will support top level              System                 ng
management to make a good decision in any time under any                                    System
uncertain environment [4]. This study aim to investigate the
adoption process of decision making under uncertain
situations or highly risk environments effecting in decision
of investing stoke cash of bank. This applied for two types

                                                   www.ijmer.com                                                   4094 | Page
International Journal of Modern Engineering Research (IJMER)
               www.ijmer.com         Vol.2, Issue.6, Nov-Dec. 2012 pp-2594-2600       ISSN: 2249-6645
               Fig 1: DSS Main Components
After surveying multiple decision support systems, it is
concluded that decision support systems are categorized into
the following [5]:

    File drawer systems: This category of DSS provides
     access to data items.
 Data analysis systems: Those support the manipulation
     of data by computerized tools tailored to a specific task
     or by more general tools and operators.
 Analytical information systems: Those provide access                            Fig 2: DM Techniques [5]
     to a series of decision-oriented databases.
 Accounting and financial models: those calculate the           2.2.1 Clustering Technique
     consequences of possible actions.                                     Clustering can be considered the most important
 Representational models: those estimate the                    learning problem; like every other problem of this kind, it
     consequences of actions based on simulation models          deals with finding a structure in a collection of unlabeled
     that include relationships that are causal as well as       data. A loose definition of clustering could be “the process
     accounting definitions.                                     of organizing objects into groups whose members are
 Optimization models: those provide guidelines for              similar in some way”.
     actions by generating an optimal solution consistent                  A cluster is therefore a collection of objects which
     with a series of constraints.                               are “similar” between them while are “dissimilar” to the
 Suggestion models: those perform the logical                   objects belonging to other clusters as shown in figure 3. [6]
     processing leading to a specific suggested decision or a    Define clustering by saying “Clustering involves identifying
     fairly structured or well understood task.                  a finite set of categories or segments „clusters‟ to describe
          This section describes the approaches and              the data according to a certain metric. The clusters can be
techniques mostly used when developing data warehousing          mutually exclusive, hierarchical or overlapping”. [6]
systems that data warehousing approaches such as; Online         Defines clustering as follows: “clustering enables to find
Analytical Processing „OLAP‟, Data Mining „DM‟ and               specific discriminative factors or attributes for the studied
Artificial Intelligence „AI‟. And in this paper will using DM    data. Each member of a cluster should be very similar to
as approach and technique.                                       other members in its cluster and very dissimilar to other
                                                                 clusters. When a new data is introduced, it is classified into
         2.2 Data Mining Techniques (DM)                         the most similar clusters. Techniques for creating clusters
         Data mining is the process of analyzing data from       include partitioning methods as in k-means algorithm, and
different perspectives and summarizing it into useful            hierarchical methods as decision trees, and density-based
information [10]. DM techniques are the result of a long         methods”.
process of research and product development [10]. There
are several processes for applying DM:

1.   Definition of the business objective and expected
     operational environment.
2.   Data selection is required to identify meaningful
     sample of data.
3.   Data transformation that involves data representation in
     an appropriate format for mining algorithm.
4.   Selection and implementation of data mining algorithm
     depends on the mining objective.
5.   Analysis of the discovered outcomes is needed to                   Fig 3: Simple graphical for clustering data [6]
     formulate business outcomes.
6.   Representing valuable business outcomes.                             The goal of clustering is [6] to determine the basic
                                                                 grouping in a set of unlabeled data, to find representatives
         DM techniques usually fall into two categories,         for homogeneous groups which are called “data reduction”,
predictive or descriptive. Predictive DM uses historical data    to find „natural clusters‟ and describe their unknown
to infer something about future events. Predictive mining        properties these clusters are called „natural‟ data types, to
tasks use data to build a model to make predictions on           find useful and suitable groupings this can be called „useful‟
unseen future events. Descriptive DM aims to find patterns       data classes‟ and to find unusual data objects outlier
in the data that provide some information about internal         detection. There are several advantages for the clustering
hidden relationships. Descriptive mining tasks characterize      technique use it. Among these advantages are recognizing
the general properties of the data and represent it in a         the number of clusters, grouping similar members together,
meaningful way. Figure2 shows the classification of DM           identifying the discriminate attributes, ranking the
techniques.                                                      discriminate attributes, and recognizing the discriminate

                                                   www.ijmer.com                                                   4095 | Page
International Journal of Modern Engineering Research (IJMER)
               www.ijmer.com         Vol.2, Issue.6, Nov-Dec. 2012 pp-2594-2600       ISSN: 2249-6645
attributes of one cluster and representing them within the                Before looking to the enhancements that are
business context. A major disadvantage of clustering is that     applied on K-mean algorithm, one should give the
it is suitable for static data but not for dynamic data where    impression of pros, cons and problems of k-mean algorithm,
its value changes over time or due to any other factor.          then presents the enhanced k-mean algorithm and presents a
Another disadvantage is that sometimes the generated             way for evaluating this enhancement. There are several
clusters may not have a practical meaning. Finally, it is        strengths and weakness shown in table 5.1 (Arthur, 2006).
possible not to spot the cluster sometimes as there is no
exact idea of what to look for or there is a lack of collected        Table 1: Strengths and weakness of K-men algorithm
data [7].                                                                                 (Arthur, 2006)
                                                                           Strengths                      Weakness
          2.2.1.1 K-Mean Algorithm                                     1. Vectors can                 1. Quality of the
          [8] Defines K-means as one of the simplest                       flexibly change               output depends on
unsupervised learning algorithms that solve the well known                 clusters during the           the initial point.
clustering problem. Build to classify or grouping objects                  process.                   2. Global optimum
based on features into „k‟ number of group. K is positive              2. Always converges               solution not
integer number and the grouping is done by mining the sum                  to a local optimum            guaranteed.
of squares of distance between data and the corresponding              3. Quite fast for most
cluster centriod. The cluster centroid is the average point in             application
the multidimensional space defined by the dimensions [9].
There are a lot of applications of the K-mean clustering,        There are some problems in k-mean algorithm such as
range from unsupervised learning of neural network, Pattern      (Arthur, 2006):
recognitions, Classification analysis, Artificial intelligent,
image processing, machine vision, etc. In principle, we have     1.   Non globular clusters (overlapping in data between
several objects and each object have several attributes and           clusters)
we want to classify the objects based on the attributes, then    2.   Assume wrong number of clusters.
we can apply this algorithm. There are commonly four steps       3.   Find empty clusters.
followed for K-mean idea [10];                                   4.   Bad initialization to centriod point
 1. Place K points into the space represented by the             5.   Choosing the number of clusters
      objects that are being clustered. These points represent
      initial group centers.                                     The most common measure to evaluate k-men algorithm is
 2. Assign each object to the group that has the closest         Sum of Squared Error „SSE‟ with another words called,
      centered.                                                  Sum of Squared Distances „SSD‟ (Wenyan Li, 2009).
 3. When all objects have been assigned, recalculate the
      positions of the K centered.                                   For each point, the error is the distance to nearest
 4. Repeat Steps 2 and 3 until the centered no longer                 cluster.
      change. This produces a separation of the objects into         To get SSE or SSD, we square these error and sum
      groups from which the metric to be minimized can be             them
      calculated as follows.                                          

               k     nj

             
                                             2
                            xij  c j                                            x is a data point in cluster Ci .
              j 1 i 1                                                   mi is the representative point for cluster Ci.
         K: number of clusters
         nj: number of points in jth cluster                         There is one way to reduce SSE, is to increase K
         xij: ith point in jth cluster                                „Number of clusters‟.
                                                                  The good clustering with smaller K can have slower
                                                                      SSE than a poor clustering with higher K.
                                                                 After investigating the most common algorithms for data
                                                                 clustering, there is enhancement for the most familiar
                                                                 algorithm which called k-means trying to solving some
                                                                 problems of k-mean algorithm. More features will be added
                                                                 to enhanced algorithm such as; change centriod point from
                                                                 random points to center points for data and adding step to
                                                                 avoid empty clusters when visualize data it shows also in
                                                                 steps of algorithm. Figure 5.31 shows the flowchart of
                                                                 enhanced K-men algorithm.
                                                                      1. Assume number of cluster “K”.
     Fig 4: Procedures code for k-means algorithm [10]
                                                                      2. Calculate „K‟ at center points of data set
         2.2.1.2 Enhanced K-means ‘E_KM’
                                                   www.ijmer.com                                                    4096 | Page
International Journal of Modern Engineering Research (IJMER)
                 www.ijmer.com         Vol.2, Issue.6, Nov-Dec. 2012 pp-2594-2600       ISSN: 2249-6645
          𝐻 → 𝑌 →Rows
           𝑊 → 𝑋 → 𝐶𝑜𝑙𝑢𝑚𝑛𝑠                                                    III. IDMS Implementation
         Mid P = (X,Y)                                                  Investment Data Mining System „IDMS‟ aims to
                𝑊 𝐻                                           build a data mining system for investment in the banking
         P = ( , ) % for every cluster
                2 2
                                                              sector. IDMS consists of several components; data
         Ex,
         H= 400, W= 30                                        gathering, preparing data to discover knowledge, data
                    30 400                                    preprocessing, using data mining techniques in sequences
         Mid P = ( 2 , 2 )      P = (15, 200) ← Center        steps start with classification data, clustering data especially
         point for Cluster                                    using K-mean algorithm and enhanced K-mean algorithm to
      Fig 6: Pseudo code of calculating „k‟ center point      set which best result and then set and run association rules
                                                              to solve problem, post processing and finally get result and
3.   Calculate the distance between a data sample and         visualize result to create best decision to take a good
     clusters.                                                decision for investment under uncertain situations [18].
4.   Assign a data sample to closest cluster center.
5.   Calculate new cluster center.                            3.1 Experiments
6.   Repeat step 3.4 and 5 until no objects move group.
7.   Avoid empty clusters                                              3.1.1 Data Collected
                                                                       The following data is taken from a bank is located
                                                              in Egypt and has several branches. The bank serves more
           Loop through each cluster                          than 100000 customers per year, contracting with more than
                 If all items inside cluster equal 0 ,        50000 organizations. There are some basics found through
           delete cluster                                     interviews such as; explain overall process in bank for usage
                    Else                                      and resources of cash in bank. IDMS will focus in depth of
                 do nothing;                                  investment department. All customers and departments
          Fig 7: Pseudo code of avoiding empty clusters       data‟s is stored electronically using SQL database. The
                                                              database stores values in six fields: Customer name,
     8.    Perform visualization                              Customer number or ID, Previous commitments, Paperwork,
                                                              Type of investment, Sector of the field investment and
                                                              Previous debt, shown in table 2. The received data is in the
                        Start
                                                              form of SQL database converted to excel sheets.

             Assume Number of cluster
                      “K”


             Start with centers points of
                       data set


               Calculate the distance
              between data sample &
                       cluster

                Assign data sample to
                closest cluster center


             Calculate new cluster center


                                            No
                    No objects
                    move group
                                     Yes

                Avoid empty clusters



                 Perform Visualization


                        End

Fig 8: E_K-mean algorithm flowcharts

                                                   www.ijmer.com                                                 4097 | Page
International Journal of Modern Engineering Research (IJMER)
                         www.ijmer.com         Vol.2, Issue.6, Nov-Dec. 2012 pp-4094-4101       ISSN: 2249-6645
                                                                                               Table 2: Collected Data
                    Request
Request                                         Customer                         Customer                                                         Type of                              Sector of                   Previous
                     Serial                                                                                    Paperwork
 Date                                            Name                               ID                                                          investment                            investment                     debt
                    Number
1-11-2010                 1                      Ahmed                                 20123                       Ok                                  Industry                    Durable goods                         No

5-12-2010                 2                           Co.                              20123                       Ok                              Securities                      Durable goods                         No

                                                                                                                                                                                      Cultivation
1-11-2010                 3                      Samed                                 20520                       Ok                              Securities                                                            No
                                                                                                                                                                                       strategies
                                                                                                                                                                                      Import and
1-11-2010                 4                      Saaed                                 20002                       Ok                                  Industry                                                          No
                                                                                                                                                                                         export
                                                                                                                                                                                      Cultivation
1-11-2010                 5                           Co.                              20135                       Ok                           Agriculture                                                              No
                                                                                                                                                                                       strategies

1-11-2010                 6                     Mohamed                                20123                       Ok                                  Industry                        Foodstuffs                        No

   …                     ….                           …                                 ….                         ….                                    ….                                     ….                       ….


                                                                                Table 3: Format for each field for data
                                                                                                             2010       20520                                                        03                      032
                                                                                Sector of the
                                                                                 investment


                                                                                 investment
                                                               Paperwork
              Customer


                                           Customer




                                                                                  Previous
              Number
   Request

              Request




                                                                                   Type of
               Name
               Serial




                                                                                                                                  2010                  73002                        01                      011
    Date




                                                                                    field


                                                                                    debt
                                              ID




                                                                                                                                  2011                  20135                        03                      031
                                                                                                                                  2012                  20033                        01                      011
                                                                                                                                  ….                     ….                         ….                       ….
               Numb




                                           Numb
    Date




                              Text




                                                               Text


                                                                                Text


                                                                                                Text


                                                                                                            Text
                er




                                            er




                                                                                                                              IDMS execution done via several techniques
                                                                                                                   started with clustering technique using K-M and enhanced
  3.2 Preprocessing                                                                                                k-mean algorithm, classification technique using ID3
            The data collected undergoes four preprocessing                                                        algorithm and association rules technique using apriori
  steps and the data matrix is reduced from 8 columns to 7                                                         algorithm. Next section will discuss the results of execution
  columns. The first step converts data from textual values to                                                     for fist technique clustering.
  numeric ones in order to deal with identification numbers
  as in table                                                                                                                                          IV. Clustering Results
                                                                                                                   4.1 Clustering sectors
  Table 4: Example of data after converting to numeric sheet                                                                First: Original K-mean Algorithm results
                                                                                                                            Table 6 presents the set of data used in the
                                                                   investment


                                                                                  investment
                                                  Paperwork
                                     Customer




                                                                                   Sector of

                                                                                                 Previous
               Number
     Request

               Request




                                                                     Type of




                                                                                                                   implementation experiments for training data to set a best
                Serial
      Date




                                                                                                   debt
                                        ID




                                                                                                                   result and choosing an effective number of clusters and
                                                                                                                   percentage of data set to apply the technique for the system.
                                                                                                                   These data are divided among 10 clusters representing the
      201                        2012                                                                              different percentage of the set of data used using K-Mean
                     1                            1                    03          032                 1
       0                          3                                                                                algorithm.
      201                        2012
                     2                            1                    03          032                 1
       0                          3                                                                                 Table 6: Testing data on a 10 clusters model using K-M
      …             ….            ….             ….                  ….                ….          ….                                      Algorithm
                                                                                                                                                           C               C
                                                                                                                         C     C            C                  C
  In the second step, the interesting selected attributes are                                                      C0                C3            C5                C8
                                                                                                                          1    2             4                 7
  Request data, Customer ID, Type of Investment and Sector                                                                                                 6               9
  of investment as shown in table 5. Elements with values
                                                                                                                                                                                  Petrochemic




                                                                                                                                                                                                               Technologie
                                                                                                                    Agriculture




                                                                                                                                                          Securities
                                                                                                                                             Tourism




                                                                                                                                                                       Industry
                                                                                                                                   Trading




  indicate that the type and sector of investment.
                                                                                                                                                                                                 Non
                                                                                                                                                                                                       Non




                                                                                                                                                                                                                             Non
                                                                                                                                                                                       als




                                                                                                                                                                                                                    s




     Table 5: Example of data after selecting main segments
           Request        Customer                 Type of                       Sector of
            Date             ID                  investment                     investment
             2010             20123                           03                       032
                                                                                            www.ijmer.com                                                                                                    4098 | Page
International Journal of Modern Engineering Research (IJMER)
                        www.ijmer.com         Vol.2, Issue.6, Nov-Dec. 2012 pp-2594-2600       ISSN: 2249-6645

                                                                                                                              30.00%




                                                                                         7.09 %
                                               8.16%
 23 %

        19 %

                       17 %




                                                                14%
                                                                                                                              25.00%



                                   3%




                                                                                                                    0%
                                                                             0%



                                                                                                         9%
                                                                                                                              20.00%
                                                                                                                              15.00%
                                                                                                                              10.00%
                                                                                                                               5.00%
                                                                                                                               0.00%
 25




                                                                                                                                                                                             Industry
                                                                                                                                                          Trading




                                                                                                                                                                                Securities
                                                                                                                                                                      Tourism




                                                                                                                                                                                                          Petrochemicals
                                                                                                                                            Agriculture




                                                                                                                                                                                                                             Technologies
 20

 15

 10                                                                                                                                         C0            C1         C2         C3           C4          C5                 C6

    5                                                                                                                    Fig 11: Distribution percentages of sectors in testing set of
                                                                                                                                         data for a 10 cluster model
    0
        C0 C1 C2 C3 C4 C5 C6 C7 C8 C9                                                                                             These graph present clustering of sectors are used
                                                                                                                         in investment sector to find a good usage for cash in bank.
                                                                                                                         The results which are appearing from this algorithm
Fig 10 : Distribution percentages of sectors in testing set of                                                           describe the distribution percentage of sectors in testing set
                 data for a 10 cluster model                                                                             of data for 7 clusters by E_K-M, this enhance consider
                                                                                                                         effect on data results on IDMS.
          These graph present clustering of sectors are used
in investment sector to find a good usage for cash in bank.                                                              4.2 Clustering risks based on sectors using E_K-M
The results which are appearing from this algorithm                                                                                Table 8 presents the set of data used in the
describe the distribution percentage of sectors in testing set                                                           implementation experiments for training data to set a best
of data for 10 clusters by K-M, this enhance consider effect                                                             result and choosing an effective number of clusters and
on data results on IDMS.                                                                                                 percentage of data set to apply the technique for the system.
                                                                                                                         These data are divided among 3 clusters representing the
Second: Enhanced K-mean Algorithm results                                                                                different percentage of the set of data used and how
         After applying enhanced k-mean algorithm to                                                                     distribute risk of sectors after clustering them to seven
avoid empty clusters and re clustering data more accurate                                                                clusters using in IDMS.
using change center point for „K‟, it gives the research
more accurate results appear in next section. Table 7                                                                     Table 8: Testing data on a 3 cluster model using E_K-M
presents the set of data used in the implementation                                                                                              Algorithm
experiments for training data to set a best result and
choosing an effective number of clusters and percentage of
data set to apply the technique for the system. These data                                                                         C0                                 C1                                   C2
are divided among 7 clusters representing the different                                                                          Low Risk                           Mid Risk                            High Risk
percentage of the set of data used using enhanced K-Mean
algorithm E_K-M.                                                                                                              Agriculture (1)                  Trading (4)
                                                                                                                                                                                                   Tourism (5)
                                                                                                                              Petrochemicals                  Technologies
                                                                                                                                                                                                  Securities (3)
 Table 7: Testing data on a 7 clusters model using E_K-M                                                                            (6)                            (7)
                        Algorithm                                                                                                                              Industry (2)
                                                                                                                                  48.57%                         40.08%                                  11.35%
        C0             C1          C2           C3              C4            C5                  C6
                                                                             Petrochemica



                                                                                                     Technologies




                                                                                                                                       7
         Agriculture




                                                   Securities
                                     Tourism




                                                                  Industry
                         Trading




                                                                                   ls




                                                                                                                                       5
                                                                                                                             Sectors
                                                                                                                             Number    3

                                                                                                                                       1
        28.57%




                                                                                                  10.20%
                        4.08%

                                    8.16%

                                                  2.04%

                                                                 8.16%

                                                                                 6.12%




                                                                                                                                       -1
                                                                                                                                 Low Risk C0                        High Risk C1                         Mid Risk C2
                                                                                                                           Fig 12: Distribution sectors numbers based on risk level

                                                                                                  www.ijmer.com                                                                                                            4099 | Page
International Journal of Modern Engineering Research (IJMER)
               www.ijmer.com         Vol.2, Issue.6, Nov-Dec. 2012 pp-2594-2600       ISSN: 2249-6645
Figure 12 represent distribution of sectors number based on          amount of overlapping elevates and the level of accuracy
risk level. Every level of risk has colure such as, red color        declines. Second the efficiency is shown in E_K-M and
for high risk to invest in this sector, green color for medium       original K-M in two aspects. Time efficiency in original K-
level of risk to invest in this sector and blue color for low        M is less than time efficiency in E_K-M because of adding
risk to invest in this sector.                                       two steps of enhancement, so time will increase and it may
                                                                     be accepted when raise the quality of results. Size
4.3 Result of Comparison between K-M and E_K-M                       efficiency in original K-M is less than space of saving data
          There are some similarities and differences                efficiency in E_K-M because there are steps added to K-M
between algorithms and their enhancement which describe              to avoid empty clusters. On the other hand, using the
the utility, efficiency, applicability, accuracy, popularity,        enhanced K-mean algorithm decreases the size of the data
flexibility and visualization of the two algorithms. A               set and increases the number of clusters. Accordingly, the
comparison between the two algorithms is shown in the                amount of overlaps decreases and accuracy increases.
following table 9.                                                   Generally, the enhanced K-mean algorithm showed a better
          Based on comparison table finding that, first              performance compared to traditional K-mean algorithm.
traditional K-mean algorithm as the size of the data set
decreases and the number of clustering increases, the

                                  Table 9: Comparison between KM and E_K-M algorithms
         Approach                              K-Mean Algorithm                                 E_K-mean Algorithm
  1. Theory                  K-mean start with assume number of clusters,                E_K-mean start with assume number
                             picking „K‟ randomly, calculate distance and                of clusters, picking „K‟ at center
                             visualize.                                                  point, calculate distance and avoid
                                                                                         empty clusters then visualize.
  2. Efficiency                      Time :        Less efficient                         Time :        More efficient
       (Time and                                                                          Space :       Less Efficient
                                     Space :       More Efficient
  Space)
  3. Applicability                                 Applicable                                          Applicable
  4. Accuracy                                    Less accurate                                       More accurate
  5. Popularity                                      Popular                                         Under Testing
  6. Flexibility                                   Same level                                          Same level
  7. Visualization data                        Use visualization                                   Use visualization
  8. Generalization                               Widely used                                         Limited use
  9. Number of cluster                    Use small and huge numbers                          Use small and huge numbers
  10. Type of data                         Use all type of data source                         Use all type of data source
  source
  11. Size of data                             Huge and small data                                Huge and small data
  source
                                                                            In next step of this study implementing this
                       V. Conclusions                               proposed approach using classification and association
         This paper represents applying clustering technique        techniques to give full image and best result for high level of
by enhancing K-M algorithm for DSS in banking sector                management with a good decision and high accurate results.
especially in investment department under uncertain
situations which has been rarely addressed before. IDMS is                          VI. ACKNOWLEDGMENT
a new proposed system which is simple, straightforward with                  I want to express my deepest gratitude for my
low computation needs. The proposed preprocessing                   professor‟s supervisors Prof. Dr. Turky Sultan and Dr.
component is an aggregation of several known steps. The             Ayman Khedr for their major help and support through all
post processing component is an optional one that eases the         the phases of research and development.
interpretation of the investment results. The banking is
planning a set of actions in accordance of IDMS outcomes                                   REFERENCE
for decision making in investment sector. The investment            [1]   A. Hunter and S. Parsons, "A review of uncertainty
department in the banking is starting to analyze the                      handling formalisms", Applications of Uncertainty
approached investment sector, to introduce a good decision                Formalisms, LNAI 1455, pp.8-37. Springer -Verlag,
under uncertain situation after enhance K-M algorithm to                  1998.
give high accurate and high quality data, that shown in             [2]   E. Hernandez and J. Recasens, "A general framework
comparison table.                                                         for induction of decision trees under uncertainty",
                                                                          Modelling with Words, LNAI 2873, pp.26–43,
Future work                                                               Springer-Verlag, 2003.



                                                   www.ijmer.com                                                        4100 | Page
International Journal of Modern Engineering Research (IJMER)
             www.ijmer.com         Vol.2, Issue.6, Nov-Dec. 2012 pp-2594-2600       ISSN: 2249-6645
[3]  M. S. Chen, J. Han, and P. S. Yu. IEEE Trans             [13] Thearling K, Exchange Applications White Paper, Inc.
     Knowledge and Data Engineering Data mining. An                increasing customer value by integrating data mining
     overview from a database perspective, 8:866-883, 1996.        and campaign management software, 1998.
[4] U. Fayyad, G. Piatetsky-Shapiro and W. J. Frawley.        [14] Noah Gans, Spring. Service Operations Management,
     AAAI/MIT, Press definition of KDD at KDD96.                   Vol. 5, No. 2, 2003.
     Knowledge Discovery in Databases, 1991.                  [15] Joun Mack. IEEE TRANSACTIONS ON PATTERN
[5] Gartner. Evolution of data mining, Gartner Group               ANALYSIS AND MACHINE INTELLIGENCE. An
     Advanced Technologies and Applications Research               Efficient k-Means Clustering Algorithm, Analysis and
     Note, 2/1/95.                                                 Implementation, VOL. 24, NO. 7, JULY 2002.
[6] International Conferences on Knowledge Discovery in       [16] Andrew Moore and Brian T. Luke. Tutorial Slides, K-
     Databases and Data Mining (KDD‟95-98), 1995-1998.             means and Hierarchical Clustering and K-Means
[7] R.J. Miller and Y. Yang. Association rules over                Clustering, Slide 15, 2003.
     interval data. SIGMOD'97, 452-461, Tucson, Arizona,      [17] E. Turban, J. E. Aronson, T. Liang, and R. Sharda,
     1997.                                                         Decision Support and Business Intelligence Systems,
[8] Zaki, M.J., SPADE An Efficient Algorithm for Mining            eighth edition. Prentice Hall, 2007.
     Frequent Sequences Machine Learning, 42(1) 31-60,        [18] Ahmed El Seddawy, Ayman Khedr, Turky Sultan,
     2001.                                                         “Adapted Framework for Data Mining Technique to
[9] Osmar R. Zaïane. “Principles of Knowledge Discovery            Improve Decision Support System in an Uncertain
     in Databases - Chapter 8 Data Clustering”. & Shantanu         Situation”, International Journal of Data Mining &
     Godbole data mining Data mining Workshop 9th                  Knowledge Management Process (IJDKP) Vol.2, No.2,
     November 2003.                                                Jun 2012.
[10] T.Imielinski and H. Mannila. Communications of
     ACM. A database perspective on knowledge discovery,                                 Ahmed B. El Seddawy, M.S.
     39:58-64, 1996.                                                                     degrees in Information System
[11] BIRCH Zhang, T., Ramakrishnan, R., and Livny, M.                                    from Arab Academy for Science
     SIGMOD '96. BIRCH an efficient data clustering                                      and Technology in 2009. He
     method for very large databases. 1996.                                              now with AAST Egypt Teacher
[12] Pascal Poncelet, Florent Masseglia and Maguelonne                                   Assisting.
     Teisseire (Editors). Information Science Reference.
     Data Mining Patterns New Methods and Applications,
     ISBN 978 1599041629, October 2007.




                                               www.ijmer.com                                                4101 | Page

Mais conteúdo relacionado

Mais procurados

Mais procurados (19)

Decision support system
Decision support systemDecision support system
Decision support system
 
Introduction to DSS
Introduction to DSSIntroduction to DSS
Introduction to DSS
 
DECISION SUPPORT SYSTEMS
DECISION SUPPORT SYSTEMSDECISION SUPPORT SYSTEMS
DECISION SUPPORT SYSTEMS
 
Unit 1 DSS
Unit 1 DSSUnit 1 DSS
Unit 1 DSS
 
DECISION SUPPORT SYSTEMS- ANIMAL PRODUCTION APPLICATIONS_Dr Talaat Refaaat_ A...
DECISION SUPPORT SYSTEMS- ANIMAL PRODUCTION APPLICATIONS_Dr Talaat Refaaat_ A...DECISION SUPPORT SYSTEMS- ANIMAL PRODUCTION APPLICATIONS_Dr Talaat Refaaat_ A...
DECISION SUPPORT SYSTEMS- ANIMAL PRODUCTION APPLICATIONS_Dr Talaat Refaaat_ A...
 
Types of decision support system
Types of decision support systemTypes of decision support system
Types of decision support system
 
Decision support systems
Decision support systemsDecision support systems
Decision support systems
 
Data mining techniques and dss
Data mining techniques and dssData mining techniques and dss
Data mining techniques and dss
 
Book 2 chapter-14 dss
Book 2 chapter-14 dssBook 2 chapter-14 dss
Book 2 chapter-14 dss
 
Ch03 A decision support system (DSS)
Ch03 A decision support system (DSS)Ch03 A decision support system (DSS)
Ch03 A decision support system (DSS)
 
dss
dssdss
dss
 
Decision Support System
Decision Support SystemDecision Support System
Decision Support System
 
DSS and decision support system and its types
DSS and decision support system and its typesDSS and decision support system and its types
DSS and decision support system and its types
 
Introduction to Decision Support System
Introduction to Decision Support System Introduction to Decision Support System
Introduction to Decision Support System
 
Decision support system
Decision support systemDecision support system
Decision support system
 
Decision support system
Decision support systemDecision support system
Decision support system
 
Decision support systems 1
Decision support systems 1Decision support systems 1
Decision support systems 1
 
Decision support system
Decision  support  systemDecision  support  system
Decision support system
 
Decision support systems
Decision support systems Decision support systems
Decision support systems
 

Destaque

Isolation and Identification High-Biological Activity Bacteria in Yogurt Qua...
Isolation and Identification High-Biological Activity Bacteria in  Yogurt Qua...Isolation and Identification High-Biological Activity Bacteria in  Yogurt Qua...
Isolation and Identification High-Biological Activity Bacteria in Yogurt Qua...IJMER
 
Ah32642646
Ah32642646Ah32642646
Ah32642646IJMER
 
Ad32622625
Ad32622625Ad32622625
Ad32622625IJMER
 
User Interactive Color Transformation between Images
User Interactive Color Transformation between ImagesUser Interactive Color Transformation between Images
User Interactive Color Transformation between ImagesIJMER
 
Experimental Investigation and Parametric Studies of Surface Roughness Analy...
Experimental Investigation and Parametric Studies of Surface  Roughness Analy...Experimental Investigation and Parametric Studies of Surface  Roughness Analy...
Experimental Investigation and Parametric Studies of Surface Roughness Analy...IJMER
 
Robust and Radial Image Comparison Using Reverse Image Search
Robust and Radial Image Comparison Using Reverse Image  Search Robust and Radial Image Comparison Using Reverse Image  Search
Robust and Radial Image Comparison Using Reverse Image Search IJMER
 
Aw32736738
Aw32736738Aw32736738
Aw32736738IJMER
 
Prediction of groundwater quality in Selected Locations in Imo State
Prediction of groundwater quality in Selected Locations in  Imo StatePrediction of groundwater quality in Selected Locations in  Imo State
Prediction of groundwater quality in Selected Locations in Imo StateIJMER
 
A Comparative Study on Privacy Preserving Datamining Techniques
A Comparative Study on Privacy Preserving Datamining  TechniquesA Comparative Study on Privacy Preserving Datamining  Techniques
A Comparative Study on Privacy Preserving Datamining TechniquesIJMER
 
Low Cost Self-assistive Voice Controlled Technology for Disabled People
Low Cost Self-assistive Voice Controlled Technology for Disabled PeopleLow Cost Self-assistive Voice Controlled Technology for Disabled People
Low Cost Self-assistive Voice Controlled Technology for Disabled PeopleIJMER
 
A Novel Switch Mechanism for Load Balancing in Public Cloud
A Novel Switch Mechanism for Load Balancing in Public CloudA Novel Switch Mechanism for Load Balancing in Public Cloud
A Novel Switch Mechanism for Load Balancing in Public CloudIJMER
 
Cx31436438
Cx31436438Cx31436438
Cx31436438IJMER
 
Ao318992
Ao318992Ao318992
Ao318992IJMER
 
Bm31225230
Bm31225230Bm31225230
Bm31225230IJMER
 
Conversion of Artificial Neural Networks (ANN) To Autonomous Neural Networks
Conversion of Artificial Neural Networks (ANN) To Autonomous Neural NetworksConversion of Artificial Neural Networks (ANN) To Autonomous Neural Networks
Conversion of Artificial Neural Networks (ANN) To Autonomous Neural NetworksIJMER
 
Nonlinear Transformation Based Detection And Directional Mean Filter to Remo...
Nonlinear Transformation Based Detection And Directional  Mean Filter to Remo...Nonlinear Transformation Based Detection And Directional  Mean Filter to Remo...
Nonlinear Transformation Based Detection And Directional Mean Filter to Remo...IJMER
 
Thermal Expansivity Behavior and Determination of Density of Al 6061-Sic-Gr ...
Thermal Expansivity Behavior and Determination of Density of  Al 6061-Sic-Gr ...Thermal Expansivity Behavior and Determination of Density of  Al 6061-Sic-Gr ...
Thermal Expansivity Behavior and Determination of Density of Al 6061-Sic-Gr ...IJMER
 
Bm32832838
Bm32832838Bm32832838
Bm32832838IJMER
 
Ijmer 46053040
Ijmer 46053040Ijmer 46053040
Ijmer 46053040IJMER
 
Big Bang–Big Crunch Optimization Algorithm for the Maximum Power Point Track...
Big Bang–Big Crunch Optimization Algorithm for the Maximum  Power Point Track...Big Bang–Big Crunch Optimization Algorithm for the Maximum  Power Point Track...
Big Bang–Big Crunch Optimization Algorithm for the Maximum Power Point Track...IJMER
 

Destaque (20)

Isolation and Identification High-Biological Activity Bacteria in Yogurt Qua...
Isolation and Identification High-Biological Activity Bacteria in  Yogurt Qua...Isolation and Identification High-Biological Activity Bacteria in  Yogurt Qua...
Isolation and Identification High-Biological Activity Bacteria in Yogurt Qua...
 
Ah32642646
Ah32642646Ah32642646
Ah32642646
 
Ad32622625
Ad32622625Ad32622625
Ad32622625
 
User Interactive Color Transformation between Images
User Interactive Color Transformation between ImagesUser Interactive Color Transformation between Images
User Interactive Color Transformation between Images
 
Experimental Investigation and Parametric Studies of Surface Roughness Analy...
Experimental Investigation and Parametric Studies of Surface  Roughness Analy...Experimental Investigation and Parametric Studies of Surface  Roughness Analy...
Experimental Investigation and Parametric Studies of Surface Roughness Analy...
 
Robust and Radial Image Comparison Using Reverse Image Search
Robust and Radial Image Comparison Using Reverse Image  Search Robust and Radial Image Comparison Using Reverse Image  Search
Robust and Radial Image Comparison Using Reverse Image Search
 
Aw32736738
Aw32736738Aw32736738
Aw32736738
 
Prediction of groundwater quality in Selected Locations in Imo State
Prediction of groundwater quality in Selected Locations in  Imo StatePrediction of groundwater quality in Selected Locations in  Imo State
Prediction of groundwater quality in Selected Locations in Imo State
 
A Comparative Study on Privacy Preserving Datamining Techniques
A Comparative Study on Privacy Preserving Datamining  TechniquesA Comparative Study on Privacy Preserving Datamining  Techniques
A Comparative Study on Privacy Preserving Datamining Techniques
 
Low Cost Self-assistive Voice Controlled Technology for Disabled People
Low Cost Self-assistive Voice Controlled Technology for Disabled PeopleLow Cost Self-assistive Voice Controlled Technology for Disabled People
Low Cost Self-assistive Voice Controlled Technology for Disabled People
 
A Novel Switch Mechanism for Load Balancing in Public Cloud
A Novel Switch Mechanism for Load Balancing in Public CloudA Novel Switch Mechanism for Load Balancing in Public Cloud
A Novel Switch Mechanism for Load Balancing in Public Cloud
 
Cx31436438
Cx31436438Cx31436438
Cx31436438
 
Ao318992
Ao318992Ao318992
Ao318992
 
Bm31225230
Bm31225230Bm31225230
Bm31225230
 
Conversion of Artificial Neural Networks (ANN) To Autonomous Neural Networks
Conversion of Artificial Neural Networks (ANN) To Autonomous Neural NetworksConversion of Artificial Neural Networks (ANN) To Autonomous Neural Networks
Conversion of Artificial Neural Networks (ANN) To Autonomous Neural Networks
 
Nonlinear Transformation Based Detection And Directional Mean Filter to Remo...
Nonlinear Transformation Based Detection And Directional  Mean Filter to Remo...Nonlinear Transformation Based Detection And Directional  Mean Filter to Remo...
Nonlinear Transformation Based Detection And Directional Mean Filter to Remo...
 
Thermal Expansivity Behavior and Determination of Density of Al 6061-Sic-Gr ...
Thermal Expansivity Behavior and Determination of Density of  Al 6061-Sic-Gr ...Thermal Expansivity Behavior and Determination of Density of  Al 6061-Sic-Gr ...
Thermal Expansivity Behavior and Determination of Density of Al 6061-Sic-Gr ...
 
Bm32832838
Bm32832838Bm32832838
Bm32832838
 
Ijmer 46053040
Ijmer 46053040Ijmer 46053040
Ijmer 46053040
 
Big Bang–Big Crunch Optimization Algorithm for the Maximum Power Point Track...
Big Bang–Big Crunch Optimization Algorithm for the Maximum  Power Point Track...Big Bang–Big Crunch Optimization Algorithm for the Maximum  Power Point Track...
Big Bang–Big Crunch Optimization Algorithm for the Maximum Power Point Track...
 

Semelhante a Au2640944101

Applying Classification Technique using DID3 Algorithm to improve Decision Su...
Applying Classification Technique using DID3 Algorithm to improve Decision Su...Applying Classification Technique using DID3 Algorithm to improve Decision Su...
Applying Classification Technique using DID3 Algorithm to improve Decision Su...IJMER
 
DSS & Risk management.pdf
DSS & Risk management.pdfDSS & Risk management.pdf
DSS & Risk management.pdfAgusMailana1
 
Harmonized scheme for data mining technique to progress decision support syst...
Harmonized scheme for data mining technique to progress decision support syst...Harmonized scheme for data mining technique to progress decision support syst...
Harmonized scheme for data mining technique to progress decision support syst...eSAT Publishing House
 
Harmonized scheme for data mining technique to progress decision support syst...
Harmonized scheme for data mining technique to progress decision support syst...Harmonized scheme for data mining technique to progress decision support syst...
Harmonized scheme for data mining technique to progress decision support syst...eSAT Journals
 
Intelligent decision support systems a framework
Intelligent decision support systems  a frameworkIntelligent decision support systems  a framework
Intelligent decision support systems a frameworkAlexander Decker
 
Decisionsupportsystem
DecisionsupportsystemDecisionsupportsystem
DecisionsupportsystemSalman Memon
 
Decision Support Systems: Concept, Constructing a DSS, Executive Information ...
Decision Support Systems: Concept, Constructing a DSS, Executive Information ...Decision Support Systems: Concept, Constructing a DSS, Executive Information ...
Decision Support Systems: Concept, Constructing a DSS, Executive Information ...Ashish Hande
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Reviewijdpsjournal
 
DATA SCIENCE METHODOLOGY FOR CYBERSECURITY PROJECTS
DATA SCIENCE METHODOLOGY FOR CYBERSECURITY PROJECTS DATA SCIENCE METHODOLOGY FOR CYBERSECURITY PROJECTS
DATA SCIENCE METHODOLOGY FOR CYBERSECURITY PROJECTS cscpconf
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...theijes
 
A simulated decision trees algorithm (sdt)
A simulated decision trees algorithm (sdt)A simulated decision trees algorithm (sdt)
A simulated decision trees algorithm (sdt)Mona Nasr
 
Decision Support System in MIS.pptx
Decision Support System in MIS.pptxDecision Support System in MIS.pptx
Decision Support System in MIS.pptxrajalakshmi5921
 
knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)Kartik Kalpande Patil
 

Semelhante a Au2640944101 (20)

Applying Classification Technique using DID3 Algorithm to improve Decision Su...
Applying Classification Technique using DID3 Algorithm to improve Decision Su...Applying Classification Technique using DID3 Algorithm to improve Decision Su...
Applying Classification Technique using DID3 Algorithm to improve Decision Su...
 
DSS & Risk management.pdf
DSS & Risk management.pdfDSS & Risk management.pdf
DSS & Risk management.pdf
 
Harmonized scheme for data mining technique to progress decision support syst...
Harmonized scheme for data mining technique to progress decision support syst...Harmonized scheme for data mining technique to progress decision support syst...
Harmonized scheme for data mining technique to progress decision support syst...
 
Harmonized scheme for data mining technique to progress decision support syst...
Harmonized scheme for data mining technique to progress decision support syst...Harmonized scheme for data mining technique to progress decision support syst...
Harmonized scheme for data mining technique to progress decision support syst...
 
Intelligent decision support systems a framework
Intelligent decision support systems  a frameworkIntelligent decision support systems  a framework
Intelligent decision support systems a framework
 
Decisionsupportsystem
DecisionsupportsystemDecisionsupportsystem
Decisionsupportsystem
 
Decision Support Systems: Concept, Constructing a DSS, Executive Information ...
Decision Support Systems: Concept, Constructing a DSS, Executive Information ...Decision Support Systems: Concept, Constructing a DSS, Executive Information ...
Decision Support Systems: Concept, Constructing a DSS, Executive Information ...
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Review
 
DATA SCIENCE METHODOLOGY FOR CYBERSECURITY PROJECTS
DATA SCIENCE METHODOLOGY FOR CYBERSECURITY PROJECTS DATA SCIENCE METHODOLOGY FOR CYBERSECURITY PROJECTS
DATA SCIENCE METHODOLOGY FOR CYBERSECURITY PROJECTS
 
Decision Support System
Decision Support SystemDecision Support System
Decision Support System
 
Lecture 6 DSS
Lecture 6  DSSLecture 6  DSS
Lecture 6 DSS
 
Dss
DssDss
Dss
 
Dss
DssDss
Dss
 
Decision support system
Decision support systemDecision support system
Decision support system
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
 
Lecture 6 DSS
Lecture 6  DSSLecture 6  DSS
Lecture 6 DSS
 
A simulated decision trees algorithm (sdt)
A simulated decision trees algorithm (sdt)A simulated decision trees algorithm (sdt)
A simulated decision trees algorithm (sdt)
 
IT class II - ICAB PS-KL
IT class II - ICAB PS-KL IT class II - ICAB PS-KL
IT class II - ICAB PS-KL
 
Decision Support System in MIS.pptx
Decision Support System in MIS.pptxDecision Support System in MIS.pptx
Decision Support System in MIS.pptx
 
knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)
 

Mais de IJMER

A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...IJMER
 
Developing Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed DelintingDeveloping Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed DelintingIJMER
 
Study & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja FibreStudy & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja FibreIJMER
 
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)IJMER
 
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...IJMER
 
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...IJMER
 
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...IJMER
 
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...IJMER
 
Static Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works SimulationStatic Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works SimulationIJMER
 
High Speed Effortless Bicycle
High Speed Effortless BicycleHigh Speed Effortless Bicycle
High Speed Effortless BicycleIJMER
 
Integration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise ApplicationsIntegration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise ApplicationsIJMER
 
Microcontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation SystemMicrocontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation SystemIJMER
 
On some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological SpacesOn some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological SpacesIJMER
 
Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...IJMER
 
Natural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningNatural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningIJMER
 
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcessEvolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcessIJMER
 
Material Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded CylindersMaterial Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded CylindersIJMER
 
Studies On Energy Conservation And Audit
Studies On Energy Conservation And AuditStudies On Energy Conservation And Audit
Studies On Energy Conservation And AuditIJMER
 
An Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDLAn Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDLIJMER
 
Discrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One PreyDiscrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One PreyIJMER
 

Mais de IJMER (20)

A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...
 
Developing Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed DelintingDeveloping Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed Delinting
 
Study & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja FibreStudy & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja Fibre
 
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
 
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
 
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
 
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
 
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
 
Static Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works SimulationStatic Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
 
High Speed Effortless Bicycle
High Speed Effortless BicycleHigh Speed Effortless Bicycle
High Speed Effortless Bicycle
 
Integration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise ApplicationsIntegration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise Applications
 
Microcontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation SystemMicrocontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation System
 
On some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological SpacesOn some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological Spaces
 
Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...
 
Natural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningNatural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine Learning
 
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcessEvolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcess
 
Material Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded CylindersMaterial Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded Cylinders
 
Studies On Energy Conservation And Audit
Studies On Energy Conservation And AuditStudies On Energy Conservation And Audit
Studies On Energy Conservation And Audit
 
An Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDLAn Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDL
 
Discrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One PreyDiscrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One Prey
 

Au2640944101

  • 1. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.2, Issue.6, Nov-Dec. 2012 pp-4094-4101 ISSN: 2249-6645 Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain Situations Ahmed Bahgat El Seddawy 1, Prof. Dr. Turky Sultan2, Dr. Ayman Khedr 3 1 Department of Business Information System, Arab Academy for Science and Technology, Egypt 2, 3 Department of Information System, Helwan University, Egypt Abstract: Decision Support System (DSS) is equivalent of usage investment - direct or indirect - or credit and any synonym as management information systems (MIS). Most sector of investment will be highly or moderate or low risk. of imported data are being used in solutions like data And select which one of this sectors risk „rejected‟ or un- mining (DM). Decision supporting systems include also risk „accepted‟ all that under uncertain environments such decisions made upon individual data from external sources, as; political, economical, marketing, operational, internal management feeling, and various other data sources not policies and natural crises, all that using the contribution of included in business intelligence. Successfully supporting this study enhancing k-mean algorithm to improve the managerial decision-making is critically dependent upon results and comparing results between original algorithm the availability of integrated, high quality information and enhanced algorithm. organized and presented in a timely and easily understood The paper is divided into six sections; section two manner. Data mining have emerged to meet this need. They is a background and related work it is divided into two serve as an integrated repository for internal and external parts, part one is about DSS, part two is about DM including data-intelligence critical to understanding and evaluating K-m algorithm and enhancing in K-m algorithm. Section the business within its environmental context. With the three presents the proposed Investing Data Mining System addition of models, analytic tools, and user interfaces, they „IDMS. Section four presents IDMS experiments, have the potential to provide actionable information that implementations and results using original and enhanced k- supports effective problem and opportunity identification, m algorithm. Section five presents conclusion and finally critical decision-making, and strategy formulation, section six presents future work. implementation, and evaluation. The proposed system Investment Data Mining System (IDMS) will support top II. Background and related work level management to make a good decision in any time 2.1 Decision Support System (DSS) under any uncertain environment and on another hand DSS includes a body of knowledge that describes using enhancing K-mean algorithm. some aspects of the decision maker's world that specifies how to accomplish various tasks, that indicates what Keywords: dss, dm, mis, clustering, classification, conclusions are valid in different circumstances [4].The association rule, k-mean, olap, matlab expected benefits of DSS that discovered are higher decision quality, improved communication, cost reduction, I. Introduction increased productivity, time savings, improved customer Decision Support System (DSS) is equivalent satisfaction and improved employee satisfaction. DSS is a synonym as management information systems (MIS). Most computer-based system consisting of three main interacting of imported data are being used in solutions like data components: mining (DM). Decision supporting systems include also decisions made upon individual data from external sources,  A language system: a mechanism to provide management feeling, and various other data sources not communication between the user and other components included in business intelligence. Successfully supporting of the DSS. managerial decision-making is critically dependent upon the  A knowledge system: A repository of problem domain availability of integrated, high quality information knowledge embodied in DSS as either data or organized and presented in a timely and easily understood procedures. manner. Data mining have emerged to meet this need. They  A problem processing system: a link between the serve as an integrated repository for internal and external other two components, containing one or more of the data-intelligence critical to understanding and evaluating the general problem manipulation capabilities required for business within its environmental context. With the addition decision-making. of models, analytic tools, and user interfaces, they have the potential to provide actionable information that supports effective problem and opportunity identification, critical Languag Knowl Problem decision-making, and strategy formulation, implementation, Processi e System edge and evaluation. The proposed system will support top level System ng management to make a good decision in any time under any System uncertain environment [4]. This study aim to investigate the adoption process of decision making under uncertain situations or highly risk environments effecting in decision of investing stoke cash of bank. This applied for two types www.ijmer.com 4094 | Page
  • 2. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.2, Issue.6, Nov-Dec. 2012 pp-2594-2600 ISSN: 2249-6645 Fig 1: DSS Main Components After surveying multiple decision support systems, it is concluded that decision support systems are categorized into the following [5]:  File drawer systems: This category of DSS provides access to data items.  Data analysis systems: Those support the manipulation of data by computerized tools tailored to a specific task or by more general tools and operators.  Analytical information systems: Those provide access Fig 2: DM Techniques [5] to a series of decision-oriented databases.  Accounting and financial models: those calculate the 2.2.1 Clustering Technique consequences of possible actions. Clustering can be considered the most important  Representational models: those estimate the learning problem; like every other problem of this kind, it consequences of actions based on simulation models deals with finding a structure in a collection of unlabeled that include relationships that are causal as well as data. A loose definition of clustering could be “the process accounting definitions. of organizing objects into groups whose members are  Optimization models: those provide guidelines for similar in some way”. actions by generating an optimal solution consistent A cluster is therefore a collection of objects which with a series of constraints. are “similar” between them while are “dissimilar” to the  Suggestion models: those perform the logical objects belonging to other clusters as shown in figure 3. [6] processing leading to a specific suggested decision or a Define clustering by saying “Clustering involves identifying fairly structured or well understood task. a finite set of categories or segments „clusters‟ to describe This section describes the approaches and the data according to a certain metric. The clusters can be techniques mostly used when developing data warehousing mutually exclusive, hierarchical or overlapping”. [6] systems that data warehousing approaches such as; Online Defines clustering as follows: “clustering enables to find Analytical Processing „OLAP‟, Data Mining „DM‟ and specific discriminative factors or attributes for the studied Artificial Intelligence „AI‟. And in this paper will using DM data. Each member of a cluster should be very similar to as approach and technique. other members in its cluster and very dissimilar to other clusters. When a new data is introduced, it is classified into 2.2 Data Mining Techniques (DM) the most similar clusters. Techniques for creating clusters Data mining is the process of analyzing data from include partitioning methods as in k-means algorithm, and different perspectives and summarizing it into useful hierarchical methods as decision trees, and density-based information [10]. DM techniques are the result of a long methods”. process of research and product development [10]. There are several processes for applying DM: 1. Definition of the business objective and expected operational environment. 2. Data selection is required to identify meaningful sample of data. 3. Data transformation that involves data representation in an appropriate format for mining algorithm. 4. Selection and implementation of data mining algorithm depends on the mining objective. 5. Analysis of the discovered outcomes is needed to Fig 3: Simple graphical for clustering data [6] formulate business outcomes. 6. Representing valuable business outcomes. The goal of clustering is [6] to determine the basic grouping in a set of unlabeled data, to find representatives DM techniques usually fall into two categories, for homogeneous groups which are called “data reduction”, predictive or descriptive. Predictive DM uses historical data to find „natural clusters‟ and describe their unknown to infer something about future events. Predictive mining properties these clusters are called „natural‟ data types, to tasks use data to build a model to make predictions on find useful and suitable groupings this can be called „useful‟ unseen future events. Descriptive DM aims to find patterns data classes‟ and to find unusual data objects outlier in the data that provide some information about internal detection. There are several advantages for the clustering hidden relationships. Descriptive mining tasks characterize technique use it. Among these advantages are recognizing the general properties of the data and represent it in a the number of clusters, grouping similar members together, meaningful way. Figure2 shows the classification of DM identifying the discriminate attributes, ranking the techniques. discriminate attributes, and recognizing the discriminate www.ijmer.com 4095 | Page
  • 3. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.2, Issue.6, Nov-Dec. 2012 pp-2594-2600 ISSN: 2249-6645 attributes of one cluster and representing them within the Before looking to the enhancements that are business context. A major disadvantage of clustering is that applied on K-mean algorithm, one should give the it is suitable for static data but not for dynamic data where impression of pros, cons and problems of k-mean algorithm, its value changes over time or due to any other factor. then presents the enhanced k-mean algorithm and presents a Another disadvantage is that sometimes the generated way for evaluating this enhancement. There are several clusters may not have a practical meaning. Finally, it is strengths and weakness shown in table 5.1 (Arthur, 2006). possible not to spot the cluster sometimes as there is no exact idea of what to look for or there is a lack of collected Table 1: Strengths and weakness of K-men algorithm data [7]. (Arthur, 2006) Strengths Weakness 2.2.1.1 K-Mean Algorithm 1. Vectors can 1. Quality of the [8] Defines K-means as one of the simplest flexibly change output depends on unsupervised learning algorithms that solve the well known clusters during the the initial point. clustering problem. Build to classify or grouping objects process. 2. Global optimum based on features into „k‟ number of group. K is positive 2. Always converges solution not integer number and the grouping is done by mining the sum to a local optimum guaranteed. of squares of distance between data and the corresponding 3. Quite fast for most cluster centriod. The cluster centroid is the average point in application the multidimensional space defined by the dimensions [9]. There are a lot of applications of the K-mean clustering, There are some problems in k-mean algorithm such as range from unsupervised learning of neural network, Pattern (Arthur, 2006): recognitions, Classification analysis, Artificial intelligent, image processing, machine vision, etc. In principle, we have 1. Non globular clusters (overlapping in data between several objects and each object have several attributes and clusters) we want to classify the objects based on the attributes, then 2. Assume wrong number of clusters. we can apply this algorithm. There are commonly four steps 3. Find empty clusters. followed for K-mean idea [10]; 4. Bad initialization to centriod point 1. Place K points into the space represented by the 5. Choosing the number of clusters objects that are being clustered. These points represent initial group centers. The most common measure to evaluate k-men algorithm is 2. Assign each object to the group that has the closest Sum of Squared Error „SSE‟ with another words called, centered. Sum of Squared Distances „SSD‟ (Wenyan Li, 2009). 3. When all objects have been assigned, recalculate the positions of the K centered.  For each point, the error is the distance to nearest 4. Repeat Steps 2 and 3 until the centered no longer cluster. change. This produces a separation of the objects into  To get SSE or SSD, we square these error and sum groups from which the metric to be minimized can be them calculated as follows.  k nj  2 xij  c j x is a data point in cluster Ci . j 1 i 1 mi is the representative point for cluster Ci. K: number of clusters nj: number of points in jth cluster  There is one way to reduce SSE, is to increase K xij: ith point in jth cluster „Number of clusters‟.  The good clustering with smaller K can have slower SSE than a poor clustering with higher K. After investigating the most common algorithms for data clustering, there is enhancement for the most familiar algorithm which called k-means trying to solving some problems of k-mean algorithm. More features will be added to enhanced algorithm such as; change centriod point from random points to center points for data and adding step to avoid empty clusters when visualize data it shows also in steps of algorithm. Figure 5.31 shows the flowchart of enhanced K-men algorithm. 1. Assume number of cluster “K”. Fig 4: Procedures code for k-means algorithm [10] 2. Calculate „K‟ at center points of data set 2.2.1.2 Enhanced K-means ‘E_KM’ www.ijmer.com 4096 | Page
  • 4. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.2, Issue.6, Nov-Dec. 2012 pp-2594-2600 ISSN: 2249-6645 𝐻 → 𝑌 →Rows 𝑊 → 𝑋 → 𝐶𝑜𝑙𝑢𝑚𝑛𝑠 III. IDMS Implementation Mid P = (X,Y) Investment Data Mining System „IDMS‟ aims to 𝑊 𝐻 build a data mining system for investment in the banking P = ( , ) % for every cluster 2 2 sector. IDMS consists of several components; data Ex, H= 400, W= 30 gathering, preparing data to discover knowledge, data 30 400 preprocessing, using data mining techniques in sequences Mid P = ( 2 , 2 ) P = (15, 200) ← Center steps start with classification data, clustering data especially point for Cluster using K-mean algorithm and enhanced K-mean algorithm to Fig 6: Pseudo code of calculating „k‟ center point set which best result and then set and run association rules to solve problem, post processing and finally get result and 3. Calculate the distance between a data sample and visualize result to create best decision to take a good clusters. decision for investment under uncertain situations [18]. 4. Assign a data sample to closest cluster center. 5. Calculate new cluster center. 3.1 Experiments 6. Repeat step 3.4 and 5 until no objects move group. 7. Avoid empty clusters 3.1.1 Data Collected The following data is taken from a bank is located in Egypt and has several branches. The bank serves more Loop through each cluster than 100000 customers per year, contracting with more than If all items inside cluster equal 0 , 50000 organizations. There are some basics found through delete cluster interviews such as; explain overall process in bank for usage Else and resources of cash in bank. IDMS will focus in depth of do nothing; investment department. All customers and departments Fig 7: Pseudo code of avoiding empty clusters data‟s is stored electronically using SQL database. The database stores values in six fields: Customer name, 8. Perform visualization Customer number or ID, Previous commitments, Paperwork, Type of investment, Sector of the field investment and Previous debt, shown in table 2. The received data is in the Start form of SQL database converted to excel sheets. Assume Number of cluster “K” Start with centers points of data set Calculate the distance between data sample & cluster Assign data sample to closest cluster center Calculate new cluster center No No objects move group Yes Avoid empty clusters Perform Visualization End Fig 8: E_K-mean algorithm flowcharts www.ijmer.com 4097 | Page
  • 5. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.2, Issue.6, Nov-Dec. 2012 pp-4094-4101 ISSN: 2249-6645 Table 2: Collected Data Request Request Customer Customer Type of Sector of Previous Serial Paperwork Date Name ID investment investment debt Number 1-11-2010 1 Ahmed 20123 Ok Industry Durable goods No 5-12-2010 2 Co. 20123 Ok Securities Durable goods No Cultivation 1-11-2010 3 Samed 20520 Ok Securities No strategies Import and 1-11-2010 4 Saaed 20002 Ok Industry No export Cultivation 1-11-2010 5 Co. 20135 Ok Agriculture No strategies 1-11-2010 6 Mohamed 20123 Ok Industry Foodstuffs No … …. … …. …. …. …. …. Table 3: Format for each field for data 2010 20520 03 032 Sector of the investment investment Paperwork Customer Customer Previous Number Request Request Type of Name Serial 2010 73002 01 011 Date field debt ID 2011 20135 03 031 2012 20033 01 011 …. …. …. …. Numb Numb Date Text Text Text Text Text er er IDMS execution done via several techniques started with clustering technique using K-M and enhanced 3.2 Preprocessing k-mean algorithm, classification technique using ID3 The data collected undergoes four preprocessing algorithm and association rules technique using apriori steps and the data matrix is reduced from 8 columns to 7 algorithm. Next section will discuss the results of execution columns. The first step converts data from textual values to for fist technique clustering. numeric ones in order to deal with identification numbers as in table IV. Clustering Results 4.1 Clustering sectors Table 4: Example of data after converting to numeric sheet First: Original K-mean Algorithm results Table 6 presents the set of data used in the investment investment Paperwork Customer Sector of Previous Number Request Request Type of implementation experiments for training data to set a best Serial Date debt ID result and choosing an effective number of clusters and percentage of data set to apply the technique for the system. These data are divided among 10 clusters representing the 201 2012 different percentage of the set of data used using K-Mean 1 1 03 032 1 0 3 algorithm. 201 2012 2 1 03 032 1 0 3 Table 6: Testing data on a 10 clusters model using K-M … …. …. …. …. …. …. Algorithm C C C C C C In the second step, the interesting selected attributes are C0 C3 C5 C8 1 2 4 7 Request data, Customer ID, Type of Investment and Sector 6 9 of investment as shown in table 5. Elements with values Petrochemic Technologie Agriculture Securities Tourism Industry Trading indicate that the type and sector of investment. Non Non Non als s Table 5: Example of data after selecting main segments Request Customer Type of Sector of Date ID investment investment 2010 20123 03 032 www.ijmer.com 4098 | Page
  • 6. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.2, Issue.6, Nov-Dec. 2012 pp-2594-2600 ISSN: 2249-6645 30.00% 7.09 % 8.16% 23 % 19 % 17 % 14% 25.00% 3% 0% 0% 9% 20.00% 15.00% 10.00% 5.00% 0.00% 25 Industry Trading Securities Tourism Petrochemicals Agriculture Technologies 20 15 10 C0 C1 C2 C3 C4 C5 C6 5 Fig 11: Distribution percentages of sectors in testing set of data for a 10 cluster model 0 C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 These graph present clustering of sectors are used in investment sector to find a good usage for cash in bank. The results which are appearing from this algorithm Fig 10 : Distribution percentages of sectors in testing set of describe the distribution percentage of sectors in testing set data for a 10 cluster model of data for 7 clusters by E_K-M, this enhance consider effect on data results on IDMS. These graph present clustering of sectors are used in investment sector to find a good usage for cash in bank. 4.2 Clustering risks based on sectors using E_K-M The results which are appearing from this algorithm Table 8 presents the set of data used in the describe the distribution percentage of sectors in testing set implementation experiments for training data to set a best of data for 10 clusters by K-M, this enhance consider effect result and choosing an effective number of clusters and on data results on IDMS. percentage of data set to apply the technique for the system. These data are divided among 3 clusters representing the Second: Enhanced K-mean Algorithm results different percentage of the set of data used and how After applying enhanced k-mean algorithm to distribute risk of sectors after clustering them to seven avoid empty clusters and re clustering data more accurate clusters using in IDMS. using change center point for „K‟, it gives the research more accurate results appear in next section. Table 7 Table 8: Testing data on a 3 cluster model using E_K-M presents the set of data used in the implementation Algorithm experiments for training data to set a best result and choosing an effective number of clusters and percentage of data set to apply the technique for the system. These data C0 C1 C2 are divided among 7 clusters representing the different Low Risk Mid Risk High Risk percentage of the set of data used using enhanced K-Mean algorithm E_K-M. Agriculture (1) Trading (4) Tourism (5) Petrochemicals Technologies Securities (3) Table 7: Testing data on a 7 clusters model using E_K-M (6) (7) Algorithm Industry (2) 48.57% 40.08% 11.35% C0 C1 C2 C3 C4 C5 C6 Petrochemica Technologies 7 Agriculture Securities Tourism Industry Trading ls 5 Sectors Number 3 1 28.57% 10.20% 4.08% 8.16% 2.04% 8.16% 6.12% -1 Low Risk C0 High Risk C1 Mid Risk C2 Fig 12: Distribution sectors numbers based on risk level www.ijmer.com 4099 | Page
  • 7. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.2, Issue.6, Nov-Dec. 2012 pp-2594-2600 ISSN: 2249-6645 Figure 12 represent distribution of sectors number based on amount of overlapping elevates and the level of accuracy risk level. Every level of risk has colure such as, red color declines. Second the efficiency is shown in E_K-M and for high risk to invest in this sector, green color for medium original K-M in two aspects. Time efficiency in original K- level of risk to invest in this sector and blue color for low M is less than time efficiency in E_K-M because of adding risk to invest in this sector. two steps of enhancement, so time will increase and it may be accepted when raise the quality of results. Size 4.3 Result of Comparison between K-M and E_K-M efficiency in original K-M is less than space of saving data There are some similarities and differences efficiency in E_K-M because there are steps added to K-M between algorithms and their enhancement which describe to avoid empty clusters. On the other hand, using the the utility, efficiency, applicability, accuracy, popularity, enhanced K-mean algorithm decreases the size of the data flexibility and visualization of the two algorithms. A set and increases the number of clusters. Accordingly, the comparison between the two algorithms is shown in the amount of overlaps decreases and accuracy increases. following table 9. Generally, the enhanced K-mean algorithm showed a better Based on comparison table finding that, first performance compared to traditional K-mean algorithm. traditional K-mean algorithm as the size of the data set decreases and the number of clustering increases, the Table 9: Comparison between KM and E_K-M algorithms Approach K-Mean Algorithm E_K-mean Algorithm 1. Theory K-mean start with assume number of clusters, E_K-mean start with assume number picking „K‟ randomly, calculate distance and of clusters, picking „K‟ at center visualize. point, calculate distance and avoid empty clusters then visualize. 2. Efficiency Time : Less efficient Time : More efficient (Time and Space : Less Efficient Space : More Efficient Space) 3. Applicability Applicable Applicable 4. Accuracy Less accurate More accurate 5. Popularity Popular Under Testing 6. Flexibility Same level Same level 7. Visualization data Use visualization Use visualization 8. Generalization Widely used Limited use 9. Number of cluster Use small and huge numbers Use small and huge numbers 10. Type of data Use all type of data source Use all type of data source source 11. Size of data Huge and small data Huge and small data source In next step of this study implementing this V. Conclusions proposed approach using classification and association This paper represents applying clustering technique techniques to give full image and best result for high level of by enhancing K-M algorithm for DSS in banking sector management with a good decision and high accurate results. especially in investment department under uncertain situations which has been rarely addressed before. IDMS is VI. ACKNOWLEDGMENT a new proposed system which is simple, straightforward with I want to express my deepest gratitude for my low computation needs. The proposed preprocessing professor‟s supervisors Prof. Dr. Turky Sultan and Dr. component is an aggregation of several known steps. The Ayman Khedr for their major help and support through all post processing component is an optional one that eases the the phases of research and development. interpretation of the investment results. The banking is planning a set of actions in accordance of IDMS outcomes REFERENCE for decision making in investment sector. The investment [1] A. Hunter and S. Parsons, "A review of uncertainty department in the banking is starting to analyze the handling formalisms", Applications of Uncertainty approached investment sector, to introduce a good decision Formalisms, LNAI 1455, pp.8-37. Springer -Verlag, under uncertain situation after enhance K-M algorithm to 1998. give high accurate and high quality data, that shown in [2] E. Hernandez and J. Recasens, "A general framework comparison table. for induction of decision trees under uncertainty", Modelling with Words, LNAI 2873, pp.26–43, Future work Springer-Verlag, 2003. www.ijmer.com 4100 | Page
  • 8. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.2, Issue.6, Nov-Dec. 2012 pp-2594-2600 ISSN: 2249-6645 [3] M. S. Chen, J. Han, and P. S. Yu. IEEE Trans [13] Thearling K, Exchange Applications White Paper, Inc. Knowledge and Data Engineering Data mining. An increasing customer value by integrating data mining overview from a database perspective, 8:866-883, 1996. and campaign management software, 1998. [4] U. Fayyad, G. Piatetsky-Shapiro and W. J. Frawley. [14] Noah Gans, Spring. Service Operations Management, AAAI/MIT, Press definition of KDD at KDD96. Vol. 5, No. 2, 2003. Knowledge Discovery in Databases, 1991. [15] Joun Mack. IEEE TRANSACTIONS ON PATTERN [5] Gartner. Evolution of data mining, Gartner Group ANALYSIS AND MACHINE INTELLIGENCE. An Advanced Technologies and Applications Research Efficient k-Means Clustering Algorithm, Analysis and Note, 2/1/95. Implementation, VOL. 24, NO. 7, JULY 2002. [6] International Conferences on Knowledge Discovery in [16] Andrew Moore and Brian T. Luke. Tutorial Slides, K- Databases and Data Mining (KDD‟95-98), 1995-1998. means and Hierarchical Clustering and K-Means [7] R.J. Miller and Y. Yang. Association rules over Clustering, Slide 15, 2003. interval data. SIGMOD'97, 452-461, Tucson, Arizona, [17] E. Turban, J. E. Aronson, T. Liang, and R. Sharda, 1997. Decision Support and Business Intelligence Systems, [8] Zaki, M.J., SPADE An Efficient Algorithm for Mining eighth edition. Prentice Hall, 2007. Frequent Sequences Machine Learning, 42(1) 31-60, [18] Ahmed El Seddawy, Ayman Khedr, Turky Sultan, 2001. “Adapted Framework for Data Mining Technique to [9] Osmar R. Zaïane. “Principles of Knowledge Discovery Improve Decision Support System in an Uncertain in Databases - Chapter 8 Data Clustering”. & Shantanu Situation”, International Journal of Data Mining & Godbole data mining Data mining Workshop 9th Knowledge Management Process (IJDKP) Vol.2, No.2, November 2003. Jun 2012. [10] T.Imielinski and H. Mannila. Communications of ACM. A database perspective on knowledge discovery, Ahmed B. El Seddawy, M.S. 39:58-64, 1996. degrees in Information System [11] BIRCH Zhang, T., Ramakrishnan, R., and Livny, M. from Arab Academy for Science SIGMOD '96. BIRCH an efficient data clustering and Technology in 2009. He method for very large databases. 1996. now with AAST Egypt Teacher [12] Pascal Poncelet, Florent Masseglia and Maguelonne Assisting. Teisseire (Editors). Information Science Reference. Data Mining Patterns New Methods and Applications, ISBN 978 1599041629, October 2007. www.ijmer.com 4101 | Page