Mining Irregular Association Rules based on Action & Non-action Type Data

                        Razan Paul, Abu Sayed Md. Latiful Hoque
                    Department of Computer Science and Engineering,
       Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh
                  razanpaul@yahoo.com, asmlatifulhoque@cse.buet.ac.bd


978-1-4244-7571-1/10/$26.00 ©2010 IEEE

Abstract

Conventional positive association rules are patterns that frequently occur together. These patterns represent the decisions that are routinely made based on a set of facts. Irregular association rules are patterns that represent the decisions that are rarely made based on the same set of facts. Many domains, such as healthcare and banking, need irregular rules to improve their systems. In this paper, we propose a level-wise search algorithm that works on action and non-action type data to find irregular association rules. We have observed that irregular association rules can be discovered efficiently from a large database based on action type and non-action type data. To the best of our knowledge, there is no algorithm that can determine this type of association. Its effectiveness has been demonstrated by testing it on a real-world patient data set.

1. Introduction

Data mining is applied to discover new information that is hidden in existing information. One of the techniques in data mining is finding association rules. The pioneering work on mining conventional positive association rules was presented in [1], which showed that the problem of finding association rules can be separated into two subproblems. After that, many algorithms [1-6] have been proposed to mine positive association rules efficiently. These algorithms find rules that represent decisions that occur frequently based on a set of facts. In other words, rules discovered by current association mining algorithms [1-6] are patterns that represent what decisions are usually made. In this problem, we need patterns that represent decisions that are rarely made. We have proposed a novel mining algorithm that can efficiently discover, from the existing data, association rules that are strong candidates of variability. The algorithm uses a different candidate itemset selection process, a modified candidate generation process, and a different mechanism of generating rules from desired itemsets compared to any level-wise search algorithm. The algorithm treats all items as being either actions, which include decisions, actions and outputs, or non-actions, which include facts, statements and any criterion. In our problem, non-action items appear very frequently in the data, while action items rarely appear with the highly frequent non-action items. Negative association rules find patterns where items conflict with each other and do not find decisions that are made rarely. Rare association rules are patterns containing rare items, i.e. less frequent items, and do not find decisions that are made rarely. An irregular pattern represents a wrong decision, illegal practice or variability in decision.

2. Related Work

A rare association rule is formed from rare data items or between frequent and rare data items. However, such a rule does not map items that are used to make a decision or action to the antecedent and items that represent actions or decisions to the consequent. Moreover, it does not expect the antecedent to be highly frequent, because it finds rules with high confidence. The algorithms that find these rules only assign different supports based on item frequency. A number of approaches have been proposed to mine rare associations [7-11]. In [7], a minimum support is computed for each item based on the frequency of the item. In [8], a fixed minimum support is applied to extract desired itemsets that consist of only frequent items, and a relative support is applied to extract desired itemsets that consist of rare items. In [9], the Negative-Binomial distribution is applied to extract Negative-Binomial desired itemsets. In [10], association rules are found by taking into consideration only infrequent items. The approach suggested in [11] finds association rules by computing item-wise minimum support.

A negative association rule presents a relationship among itemsets and states the presence of some itemsets in the absence of others. Every positive association rule $P \rightarrow Q$ has three corresponding negative association rules, $P \rightarrow \neg Q$, $\neg P \rightarrow Q$ and $\neg P \rightarrow \neg Q$. To extract negative association rules, most papers employ different correlation measures between attributes [12-14]. In [13], the author proposed a level-wise search algorithm for mining both positive and negative association rules that employs rule dependency measures. In [14], the authors proposed another level-wise search algorithm for simultaneously extracting positive and negative association rules using the Pearson correlation coefficient. In [15], the authors proposed a detection model using multi-layer perceptron (MLP) neural networks to detect fraud and abuse based on medical claims. It has been proposed to detect new, unusual and known fraudulent or abusive behaviors. It works based on a detection model, which is very slow and needs a huge amount of memory to analyze an existing large database. In [16], the author used positive association rules to build clinical pathways, which can detect fraud and abuse in new data. However, this model cannot detect fraud and abuse in existing large healthcare data. Our proposed approach detects fraud and abuse from the existing large information.

3. Irregular Association Rules

Let $D = \{t_1, t_2, \ldots, t_n\}$ be a database of $n$ transactions with a set of items $I = \{i_1, i_2, \ldots, i_m\}$. Let the set of action items of $I$ be $AI = \{ai_1, ai_2, \ldots, ai_k\}$, where $k$ is the number of action items. Let the set of non-action items of $I$ be $NAI = \{nai_1, nai_2, \ldots, nai_{m-k}\}$, where $m - k$ is the number of non-action items. For an itemset $P \subseteq I$ and a transaction $t$ in $D$, we say that $t$ supports $P$ if $t$ has values for all the attributes in $P$; for conciseness, we also write $P \subseteq t$. By $D_P$ we denote the transactions that contain all attributes in $P$. The support of $P$ is computed as $P(P) = |D_P| / |D|$, i.e. the fraction of transactions containing $P$. An irregular rule is of the form $P \rightarrow Q$, with $P \subseteq NAI$, $Q \subseteq AI$ and $P \cap Q = \emptyset$. For the rule to hold, the following conditions must be met: $P(P)$, or support$(P)$, $\geq$ minimum antecedent support; $P(P, Q)$, or support$(P, Q)$, $\leq$ maximum antecedent consequent support; and $P(P, Q) / P(P) \leq$ maximum confidence, where $P(x)$ is the probability of $x$.
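The three conditions on an irregular rule can be illustrated with a minimal Python sketch. It assumes transactions are represented as sets of item labels; the function names (`support`, `is_irregular_rule`) and the toy item labels are hypothetical and do not come from the paper.

```python
from typing import FrozenSet, List

def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def is_irregular_rule(p: FrozenSet[str], q: FrozenSet[str],
                      transactions: List[FrozenSet[str]],
                      action_items: FrozenSet[str],
                      mas: float, macs: float, max_conf: float) -> bool:
    """Check the conditions for an irregular rule P -> Q.

    P must contain only non-action items, Q only action items.
    mas      = minimum antecedent support
    macs     = maximum antecedent consequent support
    max_conf = maximum confidence
    """
    if not p or not q or (p & action_items) or not (q <= action_items):
        return False
    sup_p = support(p, transactions)
    if sup_p == 0 or sup_p < mas:        # antecedent must be frequent
        return False
    sup_pq = support(p | q, transactions)
    if sup_pq > macs:                    # antecedent with consequent must be rare
        return False
    return sup_pq / sup_p <= max_conf    # confidence must be low

# Toy data (hypothetical): nine patients get the usual diagnosis, one does not.
ACTIONS = frozenset({"dx=flu", "dx=other"})
DATA = [frozenset({"test=pos", "dx=flu"})] * 9 + [frozenset({"test=pos", "dx=other"})]

rare = is_irregular_rule(frozenset({"test=pos"}), frozenset({"dx=other"}),
                         DATA, ACTIONS, mas=0.7, macs=0.1, max_conf=0.1)
usual = is_irregular_rule(frozenset({"test=pos"}), frozenset({"dx=flu"}),
                          DATA, ACTIONS, mas=0.7, macs=0.1, max_conf=0.1)
```

In this toy data the rarely made decision satisfies all three conditions, while the routine decision is rejected because its joint support exceeds the maximum antecedent consequent support.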

Actual data:
  Patient ID   Age   Smoke   Diagnosis
  1020D        33    Yes     Headache
  1021D        63    No      Fever

Rule base (medical domain knowledge):
  If age <= 12 then 1
  If 13 <= age <= 60 then 2
  If 60 <= age then 3
  If smoke = y then 1
  If smoke = n then 2
  If Sex = M then 1
  If Sex = F then 2

Dictionary of Diagnosis attribute (generated for each categorical attribute):
  Original value   Mapped value
  Headache         1
  Fever            2

Dictionary of Smoke attribute:
  Original value   Mapped value
  Yes              1
  No               2

Data suitable for knowledge discovery (mapped to integer items using rule base and dictionaries):
  Patient ID   Age   Smoke   Diagnosis
  1020D        2     1       1
  1021D        3     2       2

Figure 1. Data transformation of medical data
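The transformation illustrated in Figure 1 can be sketched in Python as follows. This is only a minimal illustration: the function names are hypothetical, the dictionaries are built in order of first appearance (an assumption, since the paper does not say how codes are assigned), and the boundary age 60, which the rule base lists under two rules, is assigned to group 2 here.

```python
from typing import Dict, Iterable, List, Sequence

def build_dictionary(values: Iterable[str]) -> Dict[str, int]:
    """Map each distinct categorical value to 1, 2, ... in order of first appearance."""
    mapping: Dict[str, int] = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping

def age_to_item(age: int) -> int:
    """Rule base for the continuous Age attribute (age ranges from Figure 1)."""
    if age <= 12:
        return 1
    if age <= 60:   # assumption: age 60 falls in group 2
        return 2
    return 3

def transform(records: List[dict],
              categorical: Sequence[str] = ("Smoke", "Diagnosis")) -> List[dict]:
    """Phase 1: build dictionaries; Phase 2: map every value to an integer item."""
    dictionaries = {a: build_dictionary(r[a] for r in records) for a in categorical}
    mapped = []
    for r in records:
        row = {"ID": r["ID"], "Age": age_to_item(r["Age"])}
        for a in categorical:
            row[a] = dictionaries[a][r[a]]
        mapped.append(row)
    return mapped

patients = [
    {"ID": "1020D", "Age": 33, "Smoke": "Yes", "Diagnosis": "Headache"},
    {"ID": "1021D", "Age": 63, "Smoke": "No", "Diagnosis": "Fever"},
]
mineable = transform(patients)
```

Applied to the two example patients of Figure 1, the sketch yields the same integer rows as the figure.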
4. Mapping complex medical data to mineable items

For knowledge discovery, medical data have to be transformed into a suitable transaction format. We have addressed the problem of mapping complex medical data to items using a domain dictionary and a rule base, as shown in Figure 1. Medical data include categorical, continuous numerical, Boolean, interval, percentage, fraction and ratio types. Medical domain experts know how to map ranges of numerical data for each attribute to a series of items. For example, there are certain conventions for considering a person young, adult or elderly with respect to age. A set of rules is created for each continuous numerical attribute using the knowledge of medical domain experts. A rule engine is used to map continuous numerical data to items using these rules.

We have used a domain dictionary approach to transform data for which medical domain expert knowledge is not applicable into numerical form. As the cardinality of attributes other than continuous numerical ones is not high in the medical domain, these attribute values are mapped to integer values using medical domain dictionaries. Therefore, the mapping process is divided into two phases. Phase 1: a rule base is constructed based on the knowledge of medical domain experts, and dictionaries are constructed for attributes where domain expert knowledge is not applicable. Phase 2: attribute values are mapped to integer values using the corresponding rule base and dictionaries.

5. The proposed algorithm




The general intuition of this algorithm is as follows: based on a set of lab tests with the same results, if 99% of doctors diagnose patients with disease x and 1% of doctors diagnose patients with other diseases, then there is a strong possibility that this 1% of doctors are engaged in illegal practice. In other words, if consequent C occurs infrequently with antecedent A and A itself is highly frequent, then the rule $A \rightarrow C$ is a strong candidate of variability. In every domain, there is a set of facts. Based on these facts, decisions and actions are taken. In a rule $S \rightarrow T$, if S contains a set of facts and T contains a decision or action, then the rule represents the decision T with its corresponding facts S. If $S \rightarrow T$ has sufficient support and confidence, then it represents that decision or action T is taken routinely based on facts S. However, if S is highly frequent and the rule $S \rightarrow T$ has very low confidence, then it indicates that, based on facts S, some decision other than T is usually taken. It also indicates that the decision T is taken exceptionally based on these facts. The main features of the proposed algorithm are as follows:

     • If only minimum support is used, as in conventional association mining algorithms, desired itemsets that involve rarely appearing action items with the highly frequent non-action items will not be found. To find rules that involve both a frequent antecedent and rare consequent items, we have used two support metrics: minimum antecedent support and maximum antecedent consequent support.
     • The proposed algorithm uses a maximum confidence constraint instead of the widely used minimum confidence constraint to form rules. Moreover, it partitions itemsets into action items and non-action items instead of using subset generation to form rules.
     • Rules have non-action items in the antecedent and action items in the consequent.
     • In candidate generation, it does not check the property that every subset of a frequent itemset is frequent for an itemset that contains one or more action items, so that such an itemset is kept.

Let MAS be the minimum antecedent support, MACS the maximum antecedent consequent support, Ij the itemsets of size j, Sm the desired itemsets of size m, and Ck the set of candidates of size k. Figure 2 shows the association mining algorithm for finding irregular rules. Like Apriori, our algorithm is based on level-wise search. Each item consists of an attribute name and its value. On retrieving the information of a 1-itemset, we create a new 1-itemset if it has not been created already; otherwise we update its support. A non-action 1-itemset is selected if it has support greater than or equal to the minimum antecedent support. An action 1-itemset is selected whatever support it has. In this way, 1-itemsets are explored that have high support for antecedent items and arbitrary support for consequent items.

5.1. Candidate Generation

The idea behind candidate generation in all level-wise algorithms like Apriori is based on the following simple fact: every subset of a frequent itemset is frequent, so the number of itemsets that have to be checked can be reduced. However, in its candidate generation phase our proposed algorithm checks this fact only if the itemset contains only non-action items. This allows itemsets to consist of both rare action items and high-frequency non-action items. If the new candidate contains one or more action items, then it is selected as a valid candidate. If the new candidate contains only non-action items, then it is selected as a valid candidate only if every subset of the new candidate is frequent. This way the algorithm keeps the new candidates that have one or more action items.

5.2. Candidate Selection

We have used two separate support metrics to filter out candidates. An itemset with only non-action items is compared with the minimum antecedent support metric, as non-action items can only take part in the antecedent of an irregular rule, which needs to be highly frequent. An itemset with one or more action items is compared with the maximum antecedent consequent support metric to keep rare action items with the highly frequent non-action items. An itemset with only non-action items is selected if it has support greater than or equal to the minimum antecedent support. An itemset with one or more action items is selected if it has support smaller than or equal to the maximum antecedent consequent support. In this way, itemsets are explored that have high support for non-action items and low support for action items together with high-support non-action items. Here pruning is based mostly on minimum antecedent support, maximum antecedent consequent support and checking the subsets of non-action items.

5.3. Generating Association Rule

This problem needs association rules that represent irregular relationships between action and non-action items that rarely occur together. For this reason, the proposed algorithm uses a maximum confidence constraint to form rules, as it needs rules that have high support in the antecedent and very low support in the itemset from which the rule is generated. It selects a rule if its confidence is less than or equal to the maximum confidence constraint. Moreover, it does not apply subset generation to the itemsets to form rules. Instead, an itemset is partitioned into action items and non-action items. Action items form the consequent and non-action items form the antecedent. Hence each itemset is mapped to only one rule.
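The partition-based rule generation described above can be sketched as follows. This is a minimal Python illustration under the assumption that itemsets and transactions are sets of item labels; `generate_rules` and the toy labels are hypothetical names, not the authors' implementation.

```python
from typing import FrozenSet, Iterable, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]  # (antecedent, consequent, confidence)

def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def generate_rules(desired_itemsets: Iterable[FrozenSet[str]],
                   transactions: List[FrozenSet[str]],
                   action_items: FrozenSet[str],
                   max_confidence: float) -> List[Rule]:
    """Map each desired itemset to at most one rule by partitioning it.

    No subset enumeration is done: non-action items form the antecedent and
    action items form the consequent.
    """
    rules: List[Rule] = []
    for itemset in desired_itemsets:
        antecedent = itemset - action_items
        consequent = itemset & action_items
        if not antecedent or not consequent:
            continue  # a rule needs both parts
        sup_antecedent = support(antecedent, transactions)
        if sup_antecedent == 0:
            continue  # confidence is undefined
        confidence = support(itemset, transactions) / sup_antecedent
        if confidence <= max_confidence:  # maximum, not minimum, confidence
            rules.append((antecedent, consequent, confidence))
    return rules

# Toy data (hypothetical labels).
ACTIONS = frozenset({"dx=flu", "dx=other"})
DATA = [frozenset({"test=pos", "dx=flu"})] * 9 + [frozenset({"test=pos", "dx=other"})]
ITEMSETS = [frozenset({"test=pos", "dx=other"})]

rules_10 = generate_rules(ITEMSETS, DATA, ACTIONS, max_confidence=0.10)
rules_05 = generate_rules(ITEMSETS, DATA, ACTIONS, max_confidence=0.05)
```

Because each itemset is split exactly once, the number of confidence evaluations equals the number of desired itemsets, in contrast to subset-based rule generation.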
Algorithm: Find itemsets that consist of non-action items with high support and action items with low support, based on candidate generation.
Input: Database, minimum antecedent support (MAS), maximum antecedent consequent support (MACS)
Output: Itemsets that are strong candidates of variability.
  1. k = 1, S = Ø;
  2. Read the metadata about which attributes are action type and which are not.
  3. Ik = Select 1-itemsets that either consist of a non-action item and have support greater than or equal to MAS, or consist of an action item.
  4. While (Ik ≠ Ø) {
       4.1 k++;
       4.2 Ck = Candidate_generation(Ik-1)
       4.3 CalculateCandidatesSupport(Ck)
       4.4 Ik = SelectDesiredItemSetFromCandidates(Ck, Sk, MAS, MACS);
       4.5 S = S ∪ Sk
     }
  5. return S

procedure SelectDesiredItemSetFromCandidates(Ck, Sk, MAS, MACS)
  1. For each itemset c ∈ Ck
     1.1 If c contains only non-action items
         1.1.1 If c.support >= MAS
         1.1.2 Add it to I
     1.2 else if c contains one or more action items with non-action items
         1.2.1 If c.support <= MACS
         1.2.2 Add it to I and Sk
     1.3 If c contains only action items
         1.3.1 Add it to I
  2. return I

procedure Candidate_generation(Ik-1)
  1. For each itemset i1 ∈ Ik-1
     1.1 For each itemset i2 ∈ Ik-1
         1.1.1 New candidate NC = Union(i1, i2);
         1.1.2 If size of NC is k
               1.1.2.1 If NC contains one or more action items
                       1.1.2.1.1 Add it to Ck if every subset of non-action items is frequent.
               1.1.2.2 else
                       1.1.2.2.1 If every subset of NC is frequent
                                 1.1.2.2.1.1 Add it to Ck, otherwise remove it.
  2. return Ck;

procedure CalculateCandidatesSupport(Ck)
  1. For each transaction t of Database
     1.1 CalculateSupportFromOneTransactionForCandidates(Ck, t);

procedure CalculateSupportFromOneTransactionForCandidates(Ck, t)
  1. Ct = Find the subsets of Ck which are candidates contained in t
  2. For each candidate c ∈ Ct
     2.1 c.count++

Algorithm: Find association rules for variability finding
Input: I (variability itemsets), maximumConfidence
Output: R (set of rules)
  1. R = Ø
  2. For each X ∈ I
     2.1 Antecedent set AS = (as1, as2, ..., asn), where each asi ∈ X is a non-action item
     2.2 Consequent set CS = (cs1, cs2, ..., csn), where each csi ∈ X is an action item
     2.3 if (support(AS ∪ CS) / support(AS)) <= maximumConfidence
         2.3.1 AS → CS is a valid rule.
         2.3.2 R = R ∪ (AS → CS)

Figure 2. Association mining algorithm for finding irregular rule
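The itemset-mining part of Figure 2 can be sketched in Python as below. This is an illustrative in-memory version, not the authors' C# implementation: support is recomputed by scanning a list of transactions, all names are hypothetical, and the sketch additionally skips mixed itemsets that never occur in the data (support 0), an assumption that Figure 2 does not state.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]

def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def subsets_ok(nc: Itemset, actions: Itemset, frequent: Set[Itemset]) -> bool:
    """Downward-closure check restricted to the non-action part of a candidate."""
    part = nc - actions
    # A pure non-action candidate is tested on its proper subsets only.
    max_size = len(part) - 1 if part == nc else len(part)
    return all(frozenset(s) in frequent
               for r in range(1, max_size + 1)
               for s in combinations(part, r))

def mine_desired_itemsets(transactions: List[Itemset], actions: Itemset,
                          mas: float, macs: float) -> Set[Itemset]:
    """Level-wise search for frequent non-action items joined with rare action items."""
    items = set().union(*transactions)
    # Level 1: non-action items must reach MAS; action items are always kept.
    level = {frozenset([i]) for i in items
             if i in actions or support(frozenset([i]), transactions) >= mas}
    frequent = {s for s in level if not (s & actions)}
    desired: Set[Itemset] = set()
    k = 1
    while level:
        k += 1
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == k and subsets_ok(a | b, actions, frequent)}
        level = set()
        for c in candidates:
            sup = support(c, transactions)
            if not (c & actions):            # only non-action items
                if sup >= mas:
                    level.add(c)
                    frequent.add(c)
            elif c - actions:                # action and non-action items
                if 0 < sup <= macs:          # sup > 0 is an added assumption
                    level.add(c)
                    desired.add(c)
            else:                            # only action items
                level.add(c)
    return desired

ACTIONS = frozenset({"dx=flu", "dx=other"})
DATA = [frozenset({"test=pos", "dx=flu"})] * 9 + [frozenset({"test=pos", "dx=other"})]
found = mine_desired_itemsets(DATA, ACTIONS, mas=0.7, macs=0.1)
```

On the toy data, the only desired itemset pairs the frequent test result with the rarely chosen diagnosis. A disk-based implementation would instead count all candidates of a level in a single pass over the database, as the support-counting procedures in Figure 2 do.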
5.3.1. Lemma 1. The number of rules is equal to the number of desired itemsets, and the number of discarded rules is $m^p - S$, where S is the number of desired itemsets.

Proof: A single desired itemset consists of action type items and non-action type items. Action items and non-action items are mapped to the consequent and antecedent parts respectively. Let I = {i1, i2, ..., in} be the set of items to be mined, where items can be either action type or non-action type. Let AI = {ai1, ai2, ..., aiu} be the set of action items to be mined. Let NAI = {nai1, nai2, ..., naiv} be the set of non-action items to be mined. Each nai has to have support greater than or equal to the minimum antecedent support to be included as a 1-itemset, and all ai are included as 1-itemsets. Let C = {c1, c2, c3, ..., cn} be the set of candidate itemsets. A new candidate NC is added to C if the non-action part of NC, named NCNA, holds the following property: support(each subset of NCNA) >= minimum antecedent support. A candidate c is selected for rule generation if and only if its confidence is less than or equal to the maximum confidence. The non-action items and action items of a desired itemset are mapped to the antecedent items and consequent items of a rule respectively. So every desired itemset is mapped to a single valid rule. Total rules = number of desired itemsets = S. Let m be the average number of distinct values that each multidimensional attribute holds, and let p be the number of attributes to be mined. Number of possible different rules = $m^p$. Number of discarded rules = $m^p - S$, where S is the number of desired itemsets.

6. Results and discussion

The experiments were done on a PC with a Core 2 Duo processor with a clock rate of 1.8 GHz and 3 GB of main memory. The operating system was Microsoft Vista and the implementation language was C#. We used a patient dataset to verify our method. The dataset contains items that are either actions, which include decision, diagnosis and cost, or non-actions, which include lab tests, any symptom of the patient and any criterion of disease. Each instance represents the data of one patient. We have filtered out instances that have noisy or missing values. The data set of interest was collected and preprocessed from different local hospitals of Bangladesh and has 50273 instances and 514 attributes (150 discrete and 364 numerical attributes). All these data are converted into mineable items (integer representation) using the domain dictionary and rule base.
[Figure 3. Time comparison of Apriori and proposed algorithm for the patient dataset. x-axis: number of transactions (10K, 25K, 50K); y-axis: time in seconds (0 to 2000); series: Apriori, Proposed Algorithm.]

[Figure 4. Number of rules based on maximum confidence. x-axis: maximum confidence (2, 5, 10); y-axis: number of rules (0 to 6); series: MAS = .7 with MACS = .1, MAS = .85 with MACS = .5.]

[Figure 5. Time comparison of different maximum antecedent supports. x-axis: minimum antecedent support (90%, 70%, 50%); y-axis: time in seconds (0 to 2000); series: MACS = .1 with MC = .05, MACS = .05 with MC = .05.]

[Figure 6. Time comparison of different maximum antecedent consequent supports. x-axis: MACS (3%, 5%, 10%); y-axis: time in seconds (0 to 2000); series: MAS = .85 with MC = .1, MAS = .70 with MC = .1.]

[Figure 7. Accuracy of the proposed algorithm based on irregular metric. x-axis: minimum antecedent support (50%, 70%, 90%); y-axis: accuracy (0 to 1); series: MACS = .1 with MC = .1, MACS = .1 with MC = .05.]

[Figure 8. Accuracy of the proposed algorithm based on maximum confidence. x-axis: maximum confidence (10%, 5%, 2%); y-axis: accuracy (0 to 1); series: MAS = .85 with MCS = .05, MAS = .7 with MCS = .1.]

Table 1. Test result for patient dataset

Parameter / result                        Setting 1   Setting 2
Minimum antecedent support                70%         85%
Maximum antecedent consequent support     10%         5%
Maximum confidence                        10%         5%
Number of desired itemsets                49          31
Number of desired rules                   5           3
Time (seconds)                            922.2       1634.56

Table 1 shows the test results for the patient dataset after running the program of the proposed algorithm with different parameters. The second column of the table presents the test result where we used a minimum antecedent support of 70%, a maximum antecedent consequent support of 10% and a maximum confidence of 10%. 49 desired itemsets were generated in total. 5 rules were discovered in total. It took about 922.2013 seconds to find these rules. The third column of the table presents the test result where we used a minimum antecedent support of 85%, a maximum antecedent consequent support of 5% and a maximum confidence of 5%. 31 desired itemsets were generated in total. 3 rules were discovered in total. It took about 1634.5634 seconds to find these rules.

Figure 3 shows that Apriori has taken significantly more time than the proposed algorithm. This is because pruning in the proposed algorithm is based on minimum antecedent support, maximum antecedent consequent support and checking the subsets of non-action items. Figure 4 shows that if maximum confidence (MC) increases, the number of valid rules increases. Figure 5 shows how time varies with different minimum antecedent support (MAS) values for the irregular rule finding algorithm. Here we measured the performance of the irregular rule finding algorithm in terms of MAS, keeping MACS, MC, the number of action items and the number of non-action items constant. Time does not vary significantly because MAS does not reduce disk access, as the patient data set has candidates of all sizes for these MAS values. MAS only affects the number of valid candidates generated, which can save some CPU time. Because it affects CPU time, the three different cases take slightly different times.

Figure 6 shows how time varies with different MACS values, keeping MAS, MC, the number of action items and the number of non-action items constant. Time does not vary significantly because MACS does not reduce disk access, as the patient data set has candidates of all sizes for these MACS values. MACS only affects the number of valid candidates generated, which can save some CPU time. Because it affects CPU time, the three different cases take slightly different times. As maximum consequent support
decreases, the number of valid candidates generated decreases. For this
reason, the case with 5% MACS takes more time than the case with 3%
MACS, and the case with 10% MACS takes more time than the case with 5%
MACS. Figure 7 illustrates accuracy results for our proposed algorithm
based on minimum antecedent support. The value of minimum antecedent
support for each presented result is also indicated. The figure shows
that MAS has no effect on accuracy, as it is not used as a parameter in
selecting valid candidates and rules. Figure 8 illustrates accuracy
results for our proposed algorithm based on maximum confidence. The
figure shows that maximum confidence affects accuracy, as it is used as
a parameter in selecting valid rules. As maximum confidence decreases,
accuracy increases and the number of discovered rules decreases. This
is because a lower confidence indicates that the antecedent and
consequent rarely occur together in the dataset.

7. Conclusion

    Irregular patterns represent wrong decisions, illegal practices and
variability in decisions. In this paper, we propose a level-wise search
algorithm that works on action and non-action type data to find
irregular association rules. The proposed algorithm has been applied to
a real-world patient dataset. We have shown significant accuracy in the
output of the proposed algorithm. Although we have used level-wise
search for finding irregular patterns, each step of our algorithm is
different from any other level-wise search algorithm. Rule generation
from desired itemsets is also different from conventional association
mining algorithms.

8. References

[1] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules
     between Sets of Items in Very Large Databases," in Proceedings of
     the 1993 ACM SIGMOD international conference on Management of
     data, Washington, D.C., 1993, pp. 207-216.
[2] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association
     Rules in Large Databases," in Proceedings of the 20th
     International Conference on Very Large Data Bases, San Francisco,
     CA, USA, 1994, pp. 487-499.
[3] S. Brin, R. Motwani, J. D. Ullman, and Shalom Tsur, "Dynamic
     Itemset Counting and Implication Rules for Market Basket Data," in
     Proceedings of the 1997 ACM SIGMOD international conference on
     Management of data, Tucson, Arizona, United States, 1997,
     pp. 255-264.
[4] H. Mannila, H. Toivonen, and A. I. Verkamo, "Efficient Algorithms
     for Discovering Association Rules," in AAAI Workshop on Knowledge
     Discovery in Databases, 1994, pp. 181-192.
[5] J. S. Park, M. S. Chen, and P. S. Yu, "An Effective Hash based
     Algorithm for mining association rules," in Proc. ACM SIGMOD Conf
     Management of Data, New York, NY, USA, 1995, pp. 175-186.
[6] A. Savasere, E. Omiecinski, and S. B. Navathe, "An Efficient
     Algorithm for Mining Association Rules in Large Databases," in
     Proceedings of the 21st International Conference on Very Large
     Data Bases, 1995, pp. 432-444.
[7] B. Liu, W. Hsu, and Y. Ma, "Mining Association Rules with Multiple
     Minimum Supports," in SIGKDD Explorations, 1999, pp. 337-341.
[8] H. Yun, D. Ha, B. Hwang, and K. H. Ryu, "Mining association rules
     on significant rare data using relative support," Journal of
     Systems and Software, vol. 67, no. 3, pp. 181-191, 2003.
[9] M. Hahsler, "A Model-Based Frequency Constraint for Mining
     Associations from Transaction Data," Data Mining and Knowledge
     Discovery, vol. 13, no. 2, pp. 137-166, 2006.
[10] L. Zhou and S. Yau, "Association rule and quantitative association
     rule mining among infrequent items," in International Conference
     on Knowledge Discovery and Data Mining, San Jose, California,
     2007, pp. 156-167.
[11] R. U. Kiran and P. K. Reddy, "An improved multiple minimum support
     based approach to mine rare association rules," in The IEEE
     Symposium on Computational Intelligence and Data Mining,
     Nashville, TN, USA, 2009, pp. 340-347.
[12] S. Brin, R. Motwani, and C. Silverstein, "Beyond Market Baskets:
     Generalizing Association Rules to Correlations," in Proceedings of
     SIGMOD, AZ, USA, 1997, pp. 265-276.
[13] X. Wu, C. Zhang, and S. Zhang, "Efficient Mining of Both Positive
     and Negative Association Rules," ACM Transactions on Information
     Systems, vol. 22, no. 3, pp. 381-405, 2004.
[14] M. L. Antonie and O. R. Zaïane, "Mining positive and negative
     association rules: an approach for confined rules," in Proceedings
     of the 8th European Conference on Principles and Practice of
     Knowledge Discovery in Databases, Pisa, Italy, 2004, pp. 27-38.
[15] P. A. Ortega, C. J. Figueroa, and G. A. Ruz, "A Medical Claim
     Fraud/Abuse Detection System based on Data Mining: A Case Study in
     Chile," in DMIN, 2006, pp. 224-231.
[16] W. S. Yang and S. Y. Hwang, "A Process-Mining Framework for the
     Detection of Healthcare Fraud and Abuse," Expert Systems with
     Applications, vol. 31, no. 1, pp. 56-68, July 2006.

Mining Irregular Association Rules based on Action & Non-action Type Data

  • 1. Mining Irregular Association Rules based on Action & Non-action Type Data Razan Paul, Abu Sayed Md. Latiful Hoque Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh razanpaul@yahoo.com, asmlatifulhoque@cse.buet.ac.bd Abstract rules from desire itemsets compared to any level- wise search algorithm. The algorithm treats all the Conventional positive association rules are the items as being either actions that include decision, patterns that occur frequently together. These action and output or non-actions that include facts, patterns represent what decisions are routinely made statements and any criterion. In our problem, non- based on a set of facts. Irregular association rules action items appear very frequently in the data, while are the patterns that represent what decisions are action items rarely appear with the high frequent rarely made based on the same set of facts. Many non-action items. Negative association rules find domains like Healthcare, Banking etc need the patterns where items are conflict with each other and irregular rule to improve their system. In this paper, do not find decision, which are made rarely. Rare we propose a level wise search algorithm that works association rules are the patterns containing rare based on action and non-action type data to find items which are less frequent items and do not find irregular association rules. We have observed that decisions which are made rarely. Irregular pattern irregular association rules can be discovered represents wrong decision, illegal practice and efficiently based on action type and non-action type variability in decision. data from large database. To the best of our knowledge, there is no algorithm that can determine 2. Related Work such type of associations. Its effectiveness has been demonstrated by testing it for a real world patient A rare association rule forms with rare data items data set. 
or between frequent and rare data items. However, this rule does not map items that are used to make 1. Introduction decision/action to antecedent and items that represent actions or decision to consequent. Moreover it does Data mining is applied to discover new not expect antecedent to be high frequent because it information, which is hidden in the existing find rule with high confidence. The algorithms to information. One of the techniques in data mining is find these rules only assign different support based finding association rule. The first pioneering work to on items frequency. A number of approaches has mine conventional positive association rules was been proposed to mine rare associations [7-11]. In explained in [1]. In this work, they showed finding [7], minimum support is computed for each item association rule problem can be separated into two based on the frequency of the item. In [8], A fixed sub problems. After that many algorithms [1-6] have minimum support is applied to extract desired been proposed to mine positive association rules itemsets, which consist of only frequent items and efficiently. These algorithms find rules that represent relative support is applied to extract desired itemsets, decisions that occur frequently based on a set of which consist of rare items. In [9], Negative- facts. In other words, rules discovered by current Binomial distribution is applied to extract Negative- association mining algorithms [1-6] are patterns that Binomial desired itemsets. In [10], the association represent what decisions are usually made. In this rules are found by taking into consideration only problem, we need patterns that are rarely made. We infrequent items. The approach suggested in [11] have proposed a novel mining algorithm that can finds the association rules by computing item-wise efficiently discover the association rules from the minimum support. existing data that are strong candidates of variability. 
A negative association rule presents a relationship The algorithm uses a different candidate itemset among itemsets and states the presence of some selection process, a modified candidate generation itemsets in the absence of others. Every positive process, and a different mechanism of generating association rule P Q has three corresponding 978-1-4244-7571-1/10/$26.00 ©2010 IEEE 63
negative association rules, $P \rightarrow \neg Q$, $\neg P \rightarrow Q$ and $\neg P \rightarrow \neg Q$. To extract negative association rules, most papers employ different correlation measures between attributes [12-14]. In [13], the authors proposed a level-wise search algorithm for mining both positive and negative association rules that employs rule dependency measures. In [14], the authors proposed another level-wise search algorithm for simultaneously extracting positive and negative association rules using the Pearson correlation coefficient. In [15], the authors proposed a detection model using multilayer perceptron (MLP) neural networks to detect fraud and abuse based on medical claims. It was proposed to detect new, unusual and known fraudulent or abusive behaviors. It relies on a detection model that is very slow and needs a large amount of memory to analyze an existing large database. In [16], the authors used positive association rules to build clinical pathways, which can detect fraud and abuse in new data. However, this model cannot detect fraud and abuse in existing large healthcare data. Our proposed approach detects fraud and abuse in the existing large data.

3. Irregular Association Rules

Let $D = \{t_1, t_2, \ldots, t_n\}$ be a database of $n$ transactions with a set of items $I = \{i_1, i_2, \ldots, i_m\}$. Let the set of action items of $I$ be $AI = \{ai_1, ai_2, \ldots, ai_k\}$, where $k$ is the number of action items. Let the set of non-action items of $I$ be $NAI = \{nai_1, nai_2, \ldots, nai_{m-k}\}$, where $m-k$ is the number of non-action items. For an itemset $P \subseteq I$ and a transaction $t$ in $D$, we say that $t$ supports $P$ if $t$ has values for all the attributes in $P$; for conciseness, we also write $P \subseteq t$. By $D_P$ we denote the transactions that contain all attributes in $P$. The support of $P$ is computed as $P(P) = |D_P| / |D|$, i.e. the fraction of transactions containing $P$. An irregular rule is of the form $P \rightarrow Q$, with $P \subseteq NAI$, $Q \subseteq AI$ and $P \cap Q = \emptyset$. For the rule to hold, the following conditions must be met:

$P(P)$, or support($P$), $\geq$ minimum antecedent support,
$P(P, Q)$, or support($P, Q$), $\leq$ maximum antecedent consequent support, and
$P(P, Q) / P(P) \leq$ maximum confidence,

where $P(x)$ is the probability of $x$.

Actual data:
Patient ID   Age   Smoke   Diagnosis
1020D        33    Yes     Headache
1021D        63    No      Fever

Dictionaries generated for each categorical attribute (original value -> mapped value):
Diagnosis attribute: Headache -> 1, Fever -> 2
Smoke attribute: Yes -> 1, No -> 2

Rule base (medical domain knowledge):
If age <= 12 then 1
If 13 <= age <= 60 then 2
If 60 <= age then 3
If smoke = y then 1
If smoke = n then 2
If Sex = M then 1
If Sex = F then 2

Data suitable for knowledge discovery (mapped to integer items using the rule base and dictionaries):
Patient ID   Age   Smoke   Diagnosis
1020D        2     1       1
1021D        3     2       2

Figure 1. Data transformation of medical data

4. Mapping complex medical data to mineable items

For knowledge discovery, the medical data have to be transformed into a suitable transaction format. We have addressed the problem of mapping complex medical data to items using a domain dictionary and a rule base, as shown in Figure 1. Medical data include categorical, continuous numerical, Boolean, interval, percentage, fraction and ratio types. Medical domain experts know how to map ranges of numerical data for each attribute to a series of items. For example, there are certain conventions for considering a person young, adult, or elderly with respect to age. A set of rules is created for each continuous numerical attribute using the knowledge of medical domain experts. A rule engine is used to map continuous numerical data to items using these rules.

We have used the domain dictionary approach to transform data for which medical domain expert knowledge is not applicable into numerical form. As the cardinality of attributes other than continuous numeric data is not high in the medical domain, these attribute values are mapped to integer values using medical domain dictionaries. The mapping process is therefore divided into two phases. Phase 1: a rule base is constructed based on the knowledge of medical domain experts, and dictionaries are constructed for attributes where domain expert knowledge is not applicable. Phase 2: attribute values are mapped to integer values using the corresponding rule base and dictionaries.

5. The proposed algorithm
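The irregular-rule conditions of Section 3, which the algorithm in this section enforces, can be illustrated on a small transaction set. The following is a minimal Python sketch under stated assumptions, not the authors' C# implementation: the function names `support` and `is_irregular_rule`, the item strings and the toy data are all hypothetical, and the caller is assumed to pass only non-action items as the antecedent and only action items as the consequent.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if items <= t) / len(transactions)


def is_irregular_rule(transactions, antecedent, consequent, mas, macs, max_conf) -> bool:
    """Check the three Section 3 conditions for the rule antecedent -> consequent.

    mas      : minimum antecedent support
    macs     : maximum antecedent consequent support
    max_conf : maximum confidence
    """
    p, q = frozenset(antecedent), frozenset(consequent)
    if not p or not q or (p & q):
        return False  # both sides must be non-empty and disjoint
    sup_p = support(transactions, p)
    sup_pq = support(transactions, p | q)
    if sup_p == 0.0 or sup_p < mas:
        return False  # the antecedent must be frequent
    if sup_pq > macs:
        return False  # antecedent and consequent must rarely occur together
    return sup_pq / sup_p <= max_conf


# Toy data (assumption): 100 patients share the same lab result;
# 99 receive diagnosis x and 1 receives diagnosis y.
transactions = [frozenset({"lab=high", "dx=x"})] * 99 + [frozenset({"lab=high", "dx=y"})]

rare = is_irregular_rule(transactions, {"lab=high"}, {"dx=y"}, mas=0.7, macs=0.1, max_conf=0.05)
routine = is_irregular_rule(transactions, {"lab=high"}, {"dx=x"}, mas=0.7, macs=0.1, max_conf=0.05)
print(rare, routine)  # True False
```

With these thresholds the rare diagnosis is flagged as an irregular rule, while the routine diagnosis is rejected because its joint support exceeds the maximum antecedent consequent support.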
The general intuition of this algorithm is as follows: based on a set of lab tests with the same results, if 99% of doctors diagnose patients with disease x and 1% of doctors diagnose them with other diseases, then there is a strong possibility that this 1% of doctors are engaged in illegal practice. In other words, if consequent C occurs infrequently with a frequent antecedent A, then the rule $A \rightarrow C$ is a strong candidate of variability. In every domain there is a set of facts, and decisions and actions are taken based on these facts. In a rule $S \rightarrow T$, if S contains a set of facts and T contains a decision or action, then the rule represents the decision T together with its corresponding facts S. If $S \rightarrow T$ has sufficient support and confidence, then decision or action T is taken routinely based on facts S. However, if S is highly frequent and the rule $S \rightarrow T$ has very low confidence, then a decision other than T is usually taken based on facts S, and decision T is taken only exceptionally based on these facts. The main features of the proposed algorithm are as follows:

- If only minimum support is used, as in conventional association mining algorithms, the desired itemsets that involve rarely appearing action items together with highly frequent non-action items will not be found. To find rules that involve both a frequent antecedent part and rare consequent items, we have used two support metrics: minimum antecedent support and maximum antecedent consequent support.
- The proposed algorithm uses a maximum confidence constraint instead of the widely used minimum confidence constraint to form the rules. Moreover, it partitions itemsets into action items and non-action items instead of using subset generation to form rules.
- Rules have non-action items in the antecedent and action items in the consequent.
- In candidate generation, it does not check the subset-frequency property for an itemset that contains one or more action items, so that such an itemset is kept.

Let MAS be the minimum antecedent support, MACS the maximum antecedent consequent support, Ij the itemsets of size j, Sm the desired itemsets of size m, and Ck the set of candidates of size k. Figure 2 shows the association mining algorithm for finding irregular rules. Like Apriori, our algorithm is based on level-wise search. Each item consists of an attribute name and its value. When retrieving information about a 1-itemset, we create a new 1-itemset if it has not been created already; otherwise we update its support. A non-action 1-itemset is selected if its support is greater than or equal to the minimum antecedent support. An action 1-itemset is selected whatever support it has. In this way, the 1-itemsets explored have high support for antecedent items and arbitrary support for consequent items.

5.1. Candidate Generation

Candidate generation in all level-wise algorithms like Apriori is based on the following simple fact: every subset of a frequent itemset is frequent. This reduces the number of itemsets that have to be checked. However, our proposed algorithm checks this fact in the candidate generation phase only if the itemset contains only non-action items. This allows itemsets to consist of both rare action items and high-frequency non-action items. If the new candidate contains one or more action items, it is selected as a valid candidate. If the new candidate contains only non-action items, it is selected as a valid candidate only if every subset of the new candidate is frequent. This way the algorithm keeps the new candidates that have one or more action items.

5.2. Candidate Selection

We have used two separate support metrics to filter out candidates. An itemset with only non-action items is compared with the minimum antecedent support metric, as non-action items can only take part in the antecedent of an irregular rule, which needs to be highly frequent. An itemset with one or more action items is compared with the maximum antecedent consequent support metric, to keep rare action items that occur with highly frequent non-action items. An itemset with only non-action items is selected if its support is greater than or equal to the minimum antecedent support. An itemset with one or more action items is selected if its support is smaller than or equal to the maximum antecedent consequent support. In this way, the itemsets explored have high support for their non-action items and low support for action items combined with high-support non-action items. Here pruning is based mostly on minimum antecedent support, maximum antecedent consequent support and checking the non-action items.

5.3. Generating Association Rule

This problem needs association rules that represent irregular relationships between action and non-action items that rarely occur together. For this reason, the proposed algorithm uses a maximum confidence constraint to form rules, as it needs rules that have high support in the antecedent and very low support in the itemset from which the rule is generated. It selects a rule if its confidence is less than or equal to the maximum confidence constraint. Moreover, it does not apply subset generation to the itemsets to form rules. Instead, an itemset is partitioned into action items and non-action items. Action items form the consequent part and non-action items form the
antecedent part. Here each itemset is mapped to only one rule.

Algorithm: Find itemsets which consist of non-action items with high support and action items with low support, based on candidate generation.
Input: Database, minimum antecedent support (MAS), maximum antecedent consequent support (MACS)
Output: Itemsets which are strong candidates of variability.
1. k = 1, S = Ø;
2. Read the metadata about which attributes are action type and which are not.
3. Ik = Select 1-itemsets which either consist of a non-action item and have support greater than or equal to minimum antecedent support, or consist of an action item.
4. While (Ik != Ø) {
   4.1 k++;
   4.2 Ck = Candidate_generation(Ik-1)
   4.3 CalculateCandidatesSupport(Ck)
   4.4 Ik = SelectDesiredItemSetFromCandidates(Ck, Sk, MAS, MACS);
   4.5 S = S U Sk
   }
5. return S

procedure SelectDesiredItemSetFromCandidates(Ck, Sk, MAS, MACS)
1. For each itemset c in Ck
   1.1 If c contains only non-action items
       1.1.1 If c.support >= MAS
       1.1.2 Add it to I
   1.2 else if c contains one or more action items with non-action items
       1.2.1 If c.support <= MACS
       1.2.2 Add it to I and Sk
   1.3 If c contains only action items
       1.3.1 Add it to I
2. return I

procedure CalculateCandidatesSupport(Ck)
1. For each transaction t of Database
   1.1 CalculateSupportFromOneTransactionForCandidates(Ck, t);

procedure CalculateSupportFromOneTransactionForCandidates(Ck, t)
1. Ct = Find the subsets of Ck which are candidates contained in t
2. For each candidate c in Ct
   2.1 c.count++

procedure Candidate_generation(Ik-1)
1. For each itemset i1 in Ik-1
   1.1 For each itemset i2 in Ik-1
       1.1.1 New candidate, NC = Union(i1, i2);
       1.1.2 If size of NC is k
             1.1.2.1 If NC contains one or more action items
                     1.1.2.1.1 Add it to Ck if every subset of non-action items is frequent.
             1.1.2.2 else
                     1.1.2.2.1 If every subset of NC is frequent
                               1.1.2.2.1.1 Add it to Ck, otherwise remove it.
2. return Ck;

Algorithm: Find association rules for variability finding
Input: I (variability itemsets), maximumConfidence
Output: R (set of rules)
1. R = Ø
2. For each X in I
   2.1 Antecedent set AS = (as1, as2, ..., asn), where each asi is in X and is a non-action item
   2.2 Consequent set CS = (cs1, cs2, ..., csn), where each csi is in X and is an action item
   2.3 if (support(AS U CS) / support(AS)) <= maximum confidence
       2.3.1 AS -> CS is a valid rule.
       2.3.2 R = R U (AS -> CS)

Figure 2. Association mining algorithm for finding irregular rule

5.3.1. Lemma 1. The number of rules is equal to the number of desired itemsets, and the number of discarded rules is $m^p - S$, where $S$ is the number of desired itemsets.

Proof: A single desired itemset consists of action type items and non-action type items. Action items and non-action items are mapped to the consequent and antecedent parts respectively. Let $I = \{i_1, i_2, \ldots, i_n\}$ be the set of items to be mined, where items can be either action type or non-action type. Let $AI = \{ai_1, ai_2, \ldots, ai_u\}$ be the set of action items to be mined. Let $NAI = \{nai_1, nai_2, \ldots, nai_v\}$ be the set of non-action items to be mined. Each $nai$ must have support greater than or equal to the minimum antecedent support to be included as a 1-itemset, and all $ai$ are included as 1-itemsets. Let $C = \{c_1, c_2, c_3, \ldots, c_n\}$ be the set of candidate itemsets. A new candidate NC is added to C if the non-action part of NC, named NCNA, holds the following property: support(each subset of NCNA) >= minimum antecedent support. A candidate c is selected for rule generation if and only if its confidence does not exceed the maximum confidence. The non-action items and action items of a desired itemset are mapped to the antecedent items and consequent items of a rule, respectively. So every desired itemset is mapped to a single valid rule. Total rules = number of desired itemsets = $S$. Let $m$ be the average number of distinct values each multidimensional attribute holds, and let $p$ be the number of attributes to be mined. Number of possible different rules = $m^p$. Number of discarded rules = $m^p - S$, where $S$ is the number of desired itemsets.

6. Results and discussion

The experiments were done on a PC with a Core 2 Duo processor with a clock rate of 1.8 GHz and 3 GB of main memory. The operating system was Microsoft Vista and the implementation language was C#. We used a patient dataset to verify our method. The dataset contains items which are either actions, which include decision, diagnosis and cost, or non-actions, which include lab tests, any symptom of the patient
and any criterion of disease. Each instance represents the data of one patient. We have filtered out instances which have noisy or missing values. The data set of interest was collected and preprocessed from different local hospitals of Bangladesh, and has 50273 instances and 514 attributes (150 discrete and 364 numerical attributes). All these data are converted into mineable items (integer representation) using the domain dictionary and rule base.

Figure 3. Time comparison of Apriori and proposed algorithm for the patient dataset (time in seconds against number of transactions: 10K, 25K, 50K)

Figure 4. Number of rules based on maximum confidence

Figure 5. Time comparison of different minimum antecedent supports (time in seconds against minimum antecedent support: 90%, 70%, 50%)

Figure 6. Time comparison of different maximum antecedent consequent supports (time in seconds against MACS: 3%, 5%, 10%)

Figure 7. Accuracy of the proposed algorithm based on irregular metric

Figure 8. Accuracy of the proposed algorithm based on maximum confidence

Table 1. Test result for patient dataset
Minimum antecedent support               70%      85%
Maximum antecedent consequent support    10%      5%
Maximum confidence                       10%      5%
Number of desired itemsets               49       31
Number of desired rules                  5        3
Time (seconds)                           922.2    1634.56

Table 1 shows the test results for the patient dataset after running the program of the proposed algorithm with different parameters. The second column of the table presents the test result where we used a minimum antecedent support of 70%, a maximum antecedent consequent support of 10% and a maximum confidence of 10%. 49 desired itemsets were generated in total. 3 rules were discovered in total. It took about 922.2013 seconds to find these rules. The third column of the table presents the test result where we used a minimum antecedent support of 85%, a maximum antecedent consequent support of 5% and a maximum confidence of 5%. 31 desired itemsets were generated in total. 5 rules were discovered in total. It took about 1634.5634 seconds to find these rules.

Figure 3 shows that Apriori takes significantly more time than the proposed algorithm. This is because pruning in the proposed algorithm is based on minimum antecedent support, maximum antecedent consequent support and checking the non-action items. Figure 4 shows that as maximum confidence (MC) increases, the number of valid rules increases. Figure 5 shows how time varies with different minimum antecedent support (MAS) values for the irregular rule finding algorithm. Here we measured the performance of the irregular rule finding algorithm in terms of MAS, keeping MACS, MC, the number of action items and the number of non-action items constant. Time does not vary significantly because MAS does not reduce disk access, as the patient data set has candidates of all sizes for these MAS values. It only affects the number of valid candidates generated, which can save some CPU time. Because it affects CPU time, the three cases take slightly different times.

Figure 6 shows how time varies with different MACS values, keeping MAS, MC, the number of action items and the number of non-action items constant. Time does not vary significantly because MACS does not reduce disk access, as the patient data set has candidates of all sizes for these MACS values. It only affects the number of valid candidates generated, which can save some CPU time. Because it affects CPU time, the three cases take slightly different times. As maximum consequent support
decreases, the number of valid candidates generated decreases. For this reason, the case with 5% MACS takes more time than the case with 3% MACS, and the case with 10% MACS takes more time than the case with 5% MACS. Figure 7 illustrates accuracy results for our proposed algorithm based on minimum antecedent support. The value of minimum antecedent support for each presented result is also indicated. The figure shows that MAS has no effect on accuracy, as it is not used as a parameter in selecting valid candidates and rules. Figure 8 illustrates accuracy results for our proposed algorithm based on maximum confidence. The figure shows that maximum confidence affects accuracy, as it is used as a parameter in selecting valid rules. As maximum confidence decreases, accuracy increases and the number of discovered rules decreases. This is because lower confidence indicates that the antecedent and consequent rarely occur together in the dataset.

7. Conclusion

Irregular patterns represent wrong decisions, illegal practice and variability in decisions. In this paper, we propose a level-wise search algorithm that works on action and non-action type data to find irregular association rules. The proposed algorithm has been applied to a real-world patient data set. We have shown significant accuracy in the output of the proposed algorithm. Although we have used level-wise search for finding irregular patterns, each step of our algorithm is different from any other level-wise search algorithm. Rule generation from desired itemsets is also different from conventional association mining algorithms.

8. References

[1] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets of Items in Very Large Databases," in Proceedings of the 1993 ACM SIGMOD international conference on Management of data, Washington, D.C., 1993, pp. 207-216.
[2] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules in Large Databases," in Proceedings of the 20th International Conference on Very Large Data Bases, San Francisco, CA, USA, 1994, pp. 487-499.
[3] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, "Dynamic Itemset Counting and Implication Rules for Market Basket Data," in Proceedings of the 1997 ACM SIGMOD international conference on Management of data, Tucson, Arizona, United States, 1997, pp. 255-264.
[4] H. Mannila, H. Toivonen, and A. I. Verkamo, "Efficient Algorithms for Discovering Association Rules," in AAAI Workshop on Knowledge Discovery in Databases, 1994, pp. 181-192.
[5] J. S. Park, M. S. Chen, and P. S. Yu, "An Effective Hash based Algorithm for mining association rules," in Proc. ACM SIGMOD Conf Management of Data, New York, NY, USA, 1995, pp. 175-186.
[6] A. Savasere, E. Omiecinski, and S. B. Navathe, "An Efficient Algorithm for Mining Association Rules in Large Databases," in Proceedings of the 21st International Conference on Very Large Data Bases, 1995, pp. 432-444.
[7] B. Liu, W. Hsu, and Y. Ma, "Mining Association Rules with Multiple Minimum Supports," in SIGKDD Explorations, 1999, pp. 337-341.
[8] H. Yun, D. Ha, B. Hwang, and K. H. Ryu, "Mining association rules on significant rare data using relative support," Journal of Systems and Software, vol. 67, no. 3, pp. 181-191, 2003.
[9] M. Hahsler, "A Model-Based Frequency Constraint for Mining Associations from Transaction Data," Data Mining and Knowledge Discovery, vol. 13, no. 2, pp. 137-166, 2006.
[10] L. Zhou and S. Yau, "Association rule and quantitative association rule mining among infrequent items," in International Conference on Knowledge Discovery and Data Mining, San Jose, California, 2007, pp. 156-167.
[11] R. U. Kiran and P. K. Reddy, "An improved multiple minimum support based approach to mine rare association rules," in The IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, 2009, pp. 340-347.
[12] S. Brin, R. Motwani, and C. Silverstein, "Beyond Market Baskets: Generalizing Association Rules to Correlations," in Proceedings of SIGMOD, AZ, USA, 1997, pp. 265-276.
[13] X. Wu, C. Zhang, and S. Zhang, "Efficient Mining of Both Positive and Negative Association Rules," ACM Transactions on Information Systems, vol. 22, no. 3, pp. 381-405, 2004.
[14] M. L. Antonie and O. R. Zaïane, "Mining positive and negative association rules: an approach for confined rules," in Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, 2004, pp. 27-38.
[15] P. A. Ortega, C. J. Figueroa, and G. A. Ruz, "A Medical Claim Fraud/Abuse Detection System based on Data Mining: A Case Study in Chile," in DMIN, 2006, pp. 224-231.
[16] W. S. Yang and S. Y. Hwang, "A Process-Mining Framework for the Detection of Healthcare Fraud and Abuse," Expert Systems with Applications, vol. 31, no. 1, pp. 56-68, July 2006.