SlideShare uma empresa Scribd logo
1 de 34
Data Mining:
   Concepts and Techniques
                        — Chapter 8 —
       8.3 Mining sequence patterns in transactional
                       databases

                 Jiawei Han and Micheline Kamber
                 Department of Computer Science
           University of Illinois at Urbana-Champaign
                      www.cs.uiuc.edu/~hanj
      ©2006 Jiawei Han and Micheline Kamber. All rights reserved.
April 18, 2013            Data Mining: Concepts and Techniques
                                                       1
April 18, 2013   Data Mining: Concepts and Techniques
                                              2
Chapter 8. Mining Stream, Time-
       Series, and Sequence Data

      Mining data streams

      Mining time-series data

      Mining sequence patterns in
      transactional databases
      Mining sequence patterns in biological
      data
April 18, 2013     Data Mining: Concepts and Techniques
                                                3
Sequence Databases & Sequential
                  Patterns

     Transaction databases, time-series databases vs. sequence
      databases
     Frequent patterns vs. (frequent) sequential patterns
     Applications of sequential pattern mining
         Customer shopping sequences:
           
               First buy computer, then CD-ROM, and then digital
               camera, within 3 months.
         Medical treatments, natural disasters (e.g., earthquakes),
          science & eng. processes, stocks and markets, etc.
         Telephone calling patterns, Weblog click streams
         DNA sequences and gene structures
April 18, 2013              Data Mining: Concepts and Techniques
                                                         4
What Is Sequential Pattern
                      Mining?

     Given a set of sequences, find the complete set of
      frequent subsequences

                                A sequence : < (ef) (ab) (df) c b >
   A sequence database
      SID       sequence           An element may contain a set of items.
      10    <a(abc)(ac)d(cf)>      Items within an element are unordered
                                   and we list them alphabetically.
      20     <(ad)c(bc)(ae)>
      30     <(ef)(ab)(df)cb>       <a(bc)dc> is a subsequence
      40      <eg(af)cbc>           of <a(abc)(ac)d(cf)>

       Given support threshold min_sup =2, <(ab)c> is a
       sequential pattern
April 18, 2013            Data Mining: Concepts and Techniques
                                                       5
Challenges on Sequential Pattern Mining

      A huge number of possible sequential patterns are
       hidden in databases
      A mining algorithm should
          find the complete set of patterns, when possible,
           satisfying the minimum support (frequency) threshold
          be highly efficient, scalable, involving only a small
           number of database scans
          be able to incorporate various kinds of user-specific
           constraints


April 18, 2013             Data Mining: Concepts and Techniques
                                                        6
Sequential Pattern Mining Algorithms

      Concept introduction and an initial Apriori-like algorithm
           Agrawal & Srikant. Mining sequential patterns, ICDE’95
      Apriori-based method: GSP (Generalized Sequential Patterns:
       Srikant & Agrawal @ EDBT’96)
      Pattern-growth methods: FreeSpan & PrefixSpan (Han et
       al.@KDD’00; Pei, et al.@ICDE’01)
      Vertical format-based mining: SPADE (Zaki@Machine Leanining’00)
      Constraint-based sequential pattern mining (SPIRIT: Garofalakis,
       Rastogi, Shim@VLDB’99; Pei, Han, Wang @ CIKM’02)
      Mining closed sequential patterns: CloSpan (Yan, Han & Afshar
       @SDM’03)

April 18, 2013                Data Mining: Concepts and Techniques
                                                           7
The Apriori Property of Sequential
                    Patterns

     A basic property: Apriori (Agrawal & Sirkant’94)
          If a sequence S is not frequent
          Then none of the super-sequences of S is frequent
          E.g, <hb> is infrequent  so do <hab> and <(ah)b>

      Seq. ID       Sequence
                                    Given support threshold
        10         <(bd)cb(ac)>
                                    min_sup =2
        20        <(bf)(ce)b(fg)>
        30         <(ah)(bf)abf>
        40         <(be)(ce)d>
        50       <a(bd)bcb(ade)>


April 18, 2013               Data Mining: Concepts and Techniques
                                                          8
GSP—Generalized Sequential Pattern Mining

     GSP (Generalized Sequential Pattern) mining algorithm
        proposed by Agrawal and Srikant, EDBT’96

   Outline of the method

        Initially, every item in DB is a candidate of length-1

        for each level (i.e., sequences of length-k) do

            scan database to collect support count for each

             candidate sequence
           
             generate candidate length-(k+1) sequences from
             length-k frequent sequences using Apriori
        repeat until no frequent sequence or no candidate can

         be found
   Major strength: Candidate pruning by Apriori

April 18, 2013              Data Mining: Concepts and Techniques
                                                          9
Finding Length-1 Sequential Patterns

    Examine GSP using an example
    Initial candidates: all singleton sequences
       <a>, <b>, <c>, <d>, <e>, <f>, <g>, <h>
                                                   Cand    Sup
                                                   <a>      3
    Scan database once, count support for
     candidates                                    <b>      5
                                                    <c>     4
                       min_sup =2                  <d>      3
                       Seq. ID      Sequence
                         10       <(bd)cb(ac)>
                                                   <e>      3
                         20      <(bf)(ce)b(fg)>    <f>     2
                         30       <(ah)(bf)abf>     <g>     1
                         40        <(be)(ce)d>
                                                    <h>     1
                         50      <a(bd)bcb(ade)>


April 18, 2013           Data Mining: Concepts and Techniques
                                                      10
GSP: Generating Length-2 Candidates


                                               <a>     <b>     <c>    <d>    <e>    <f>
                                       <a>    <aa>    <ab>     <ac>   <ad>   <ae>   <af>
       51 length-2                     <b>    <ba>    <bb>     <bc>   <bd>   <be>   <bf>

       Candidates                      <c>    <ca>    <cb>     <cc>   <cd>   <ce>   <cf>
                                       <d>    <da>    <db>     <dc>   <dd>   <de>   <df>
                                       <e>    <ea>    <eb>     <ec>   <ed>   <ee>   <ef>
                                       <f>     <fa>   <fb>     <fc>   <fd>   <fe>   <ff>

        <a>    <b>      <c>      <d>          <e>      <f>
                                                                Without Apriori
 <a>          <(ab)>   <(ac)>   <(ad)>       <(ae)>   <(af)>
                                                                property,
 <b>                   <(bc)>   <(bd)>       <(be)>   <(bf)>
                                                                8*8+8*7/2=92
 <c>                            <(cd)>       <(ce)>   <(cf)>
 <d>                                         <(de)>   <(df)>
                                                                candidates
 <e>                                                  Apriori prunes
                                                      <(ef)>
 <f>
April 18, 2013                                     44.57% candidates
                                Data Mining: Concepts and Techniques
                                                             11
The GSP Mining Process


 5th scan: 1 cand. 1 length-5 seq.   <(bd)cba>         Cand. cannot
 pat.                                                  pass sup.
                                                       threshold
 4th scan: 8 cand. 6 length-4 seq.   <abba> <(bd)bc> …           Cand. not in DB at all
 pat.
 3rd scan: 47 cand. 19 length-3 seq. <abb> <aab> <aba> <baa> <bab> …
 pat. 20 cand. not in DB at all
 2nd scan: 51 cand. 19 length-2 seq.
 pat. 10 cand. not in DB at all       <aa> <ab> … <af> <ba> <bb> … <ff> <(ab)> … <(ef)>
 1st scan: 8 cand. 6 length-1 seq.
 pat.                                  <a> <b> <c> <d> <e> <f> <g> <h>
                                                 Seq. ID      Sequence
                                                   10       <(bd)cb(ac)>
                               min_sup =2
                                                   20       <(bf)(ce)b(fg)>
                                                   30       <(ah)(bf)abf>
                                                   40        <(be)(ce)d>
                                                   50      <a(bd)bcb(ade)>
April 18, 2013                       Data Mining: Concepts and Techniques
                                                                  12
Candidate Generate-and-test:
                    Drawbacks

     A huge set of candidate sequences generated.
         Especially 2-item candidate sequence.
     Multiple Scans of database needed.
         The length of each candidate grows by one at each
          database scan.
     Inefficient for mining long sequential patterns.
         A long pattern grow up from short patterns
         The number of short patterns is exponential to the
          length of mined patterns.
April 18, 2013             Data Mining: Concepts and Techniques
                                                        13
The SPADE Algorithm

     SPADE (Sequential PAttern Discovery using Equivalent
      Class) developed by Zaki 2001
     A vertical format sequential pattern mining method
     A sequence database is mapped to a large set of
         Item: <SID, EID>
     Sequential pattern mining is performed by
         growing the subsequences (patterns) one item at a time
          by Apriori candidate generation


April 18, 2013           Data Mining: Concepts and Techniques
                                                      14
The SPADE Algorithm




April 18, 2013       Data Mining: Concepts and Techniques
                                                  15
Bottlenecks of GSP and SPADE

     A huge set of candidates could be generated
         1,000 frequent length-1 sequences generate s huge number of
                                               1000 × 999
          length-2 candidates!    1000 ×1000 +            = 1,499,500
                                                   2
     Multiple scans of database in mining
     Breadth-first search
     Mining long sequential patterns
         Needs an exponential number of short candidates
                                                          100
                                                                100  100
          A length-100 sequential pattern needs 10
                                                          ∑ i 
                                                     30
      
                                                                 = 2 − 1 ≈ 1030
                                                          i =1      
                  candidate sequences!
April 18, 2013              Data Mining: Concepts and Techniques
                                                         16
Prefix and Suffix (Projection)


     <a>, <aa>, <a(ab)> and <a(abc)> are prefixes of
      sequence <a(abc)(ac)d(cf)>
     Given sequence <a(abc)(ac)d(cf)>


                 Prefix       Suffix (Prefix-Based Projection)
                 <a>                <(abc)(ac)d(cf)>
                 <aa>               <(_bc)(ac)d(cf)>
                 <ab>               <(_c)(ac)d(cf)>

April 18, 2013            Data Mining: Concepts and Techniques
                                                       17
Mining Sequential Patterns by Prefix
                 Projections

     Step 1: find length-1 sequential patterns
        <a>, <b>, <c>, <d>, <e>, <f>

     Step 2: divide search space. The complete set of seq. pat.
      can be partitioned into 6 subsets:
        The ones having prefix <a>;

        The ones having prefix <b>;
                                              SID     sequence
        …
                                              10  <a(abc)(ac)d(cf)>
        The ones having prefix <f>
                                              20   <(ad)c(bc)(ae)>
                                               30    <(ef)(ab)(df)cb>
                                               40      <eg(af)cbc>


April 18, 2013            Data Mining: Concepts and Techniques
                                                       18
Finding Seq. Patterns with Prefix
                     <a>

     Only need to consider projections w.r.t. <a>
         <a>-projected database: <(abc)(ac)d(cf)>, <(_d)c(bc)
          (ae)>, <(_b)(df)cb>, <(_f)cbc>

     Find all the length-2 seq. pat. Having prefix <a>: <aa>,
      <ab>, <(ab)>, <ac>, <ad>, <af>
         Further partition into 6 subsets       SID       sequence
              Having prefix <aa>;               10    <a(abc)(ac)d(cf)>
              …                                 20     <(ad)c(bc)(ae)>
                                                 30     <(ef)(ab)(df)cb>
           
               Having prefix <af>
                                                 40      <eg(af)cbc>

April 18, 2013              Data Mining: Concepts and Techniques
                                                         19
Completeness of PrefixSpan
                                       SDB
                                 SID          sequence
                                                                  Length-1 sequential patterns
                                  10   <a(abc)(ac)d(cf)>
                                                                  <a>, <b>, <c>, <d>, <e>, <f>
                                  20       <(ad)c(bc)(ae)>

                                  30       <(ef)(ab)(df)cb>

                                  40        <eg(af)cbc>

      Having prefix <a>                                              Having prefix <c>, …, <f>
                                  Having prefix <b>
  <a>-projected database
  <(abc)(ac)d(cf)>              Length-2 sequential
                                                              <b>-projected database       …
  <(_d)c(bc)(ae)>               patterns
  <(_b)(df)cb>                  <aa>, <ab>, <(ab)>,
  <(_f)cbc>                     <ac>, <ad>, <af>
                                                                    ……
 Having prefix <aa>   Having prefix <af>

  <aa>-proj. db   …    <af>-proj. db


April 18, 2013                    Data Mining: Concepts and Techniques
                                                               20
Efficiency of PrefixSpan

     No candidate sequence needs to be generated
     Projected databases keep shrinking
     Major cost of PrefixSpan: constructing projected
      databases
         Can be improved by pseudo-projections




April 18, 2013          Data Mining: Concepts and Techniques
                                                     21
Speed-up by Pseudo-projection

     Major cost of PrefixSpan: projection
         Postfixes of sequences often appear
          repeatedly in recursive projected databases
     When (projected) database can be held in main
      memory, use pointers to form projections
         Pointer to the sequence                   s=<a(abc)(ac)d(cf)>
         Offset of the postfix                              <a>
                                    s|<a>: ( , 2)   <(abc)(ac)d(cf)>
                                                             <ab>
                                  s|<ab>: ( , 4) <(_c)(ac)d(cf)>
April 18, 2013             Data Mining: Concepts and Techniques
                                                        22
Pseudo-Projection vs. Physical Projection

     Pseudo-projection avoids physically copying postfixes
         Efficient in running time and space when database
          can be held in main memory
     However, it is not efficient when database cannot fit in
      main memory
         Disk-based random accessing is very costly
     Suggested Approach:
         Integration of physical and pseudo-projection
         Swapping to pseudo-projection when the data set
          fits in memory

April 18, 2013           Data Mining: Concepts and Techniques
                                                      23
Performance on Data Set
                        C10T8S8I8




April 18, 2013         Data Mining: Concepts and Techniques
                                                    24
Performance on Data Set Gazelle




April 18, 2013   Data Mining: Concepts and Techniques
                                              25
Effect of Pseudo-Projection




April 18, 2013     Data Mining: Concepts and Techniques
                                                26
CloSpan: Mining Closed Sequential
                   Patterns
    A closed sequential pattern s:
     there exists no superpattern
     s’ such that s’ ‫ כ‬s, and s’ and
     s have the same support
    Motivation: reduces the
     number of (redundant)
     patterns but attains the same
     expressive power
  Using Backward Subpattern
   and Backward Superpattern
   pruning to prune redundant
   search space
April 18, 2013         Data Mining: Concepts and Techniques
                                                    27
CloSpan: Performance Comparison with
                  PrefixSpan




April 18, 2013   Data Mining: Concepts and Techniques
                                              28
Constraint-Based Seq.-Pattern Mining
     Constraint-based sequential pattern mining
        Constraints: User-specified, for focused mining of desired

         patterns
        How to explore efficient mining with constraints? —

         Optimization
     Classification of constraints
        Anti-monotone: E.g., value_sum(S) < 150, min(S) > 10

        Monotone: E.g., count (S) > 5, S ⊇ {PC, digital_camera}

        Succinct: E.g., length(S) ≥ 10, S ∈ {Pentium, MS/Office,

         MS/Money}
        Convertible: E.g., value_avg(S) < 25, profit_sum (S) >

         160, max(S)/avg(S) < 2, median(S) – min(S) > 5
        Inconvertible: E.g., avg(S) – median(S) = 0


April 18, 2013           Data Mining: Concepts and Techniques
                                                      29
From Sequential Patterns to Structured Patterns

     Sets, sequences, trees, graphs, and other structures
        Transaction DB: Sets of items

           {{i , i , …, i }, …}
               1 2        m
         Seq. DB: Sequences of sets:
            {<{i , i }, …, {i , i , i }>, …}
                 1 2          m n k
         Sets of Sequences:
            {{<i , i >, …, <i , i , i >}, …}
                 1 2          m n k

         Sets of trees: {t1, t2, …, tn}
         Sets of graphs (mining for frequent subgraphs):
            {g , g , …, g }
               1   2      n
     Mining structured patterns in XML documents, bio-
April 18, 2013 structures, etc.
      chemical           Data Mining: Concepts and Techniques
                                                      30
Episodes and Episode Pattern Mining

     Other methods for specifying the kinds of patterns
         Serial episodes: A → B
         Parallel episodes: A & B
         Regular expressions: (A | B)C*(D → E)
     Methods for episode pattern mining
         Variations of Apriori-like algorithms, e.g., GSP
         Database projection-based pattern growth
              Similar to the frequent pattern growth without
               candidate generation
April 18, 2013              Data Mining: Concepts and Techniques
                                                         31
Periodicity Analysis
     Periodicity is everywhere: tides, seasons, daily power
      consumption, etc.
     Full periodicity
        Every point in time contributes (precisely or

         approximately) to the periodicity
     Partial periodicit: A more general notion
        Only some segments contribute to the periodicity

            Jim reads NY Times 7:00-7:30 am every week day

     Cyclic association rules
        Associations which form cycles

     Methods
        Full periodicity: FFT, other statistical analysis methods

        Partial and cyclic periodicity: Variations of Apriori-like

         mining methods

April 18, 2013            Data Mining: Concepts and Techniques
                                                       32
Ref: Mining Sequential
                        Patterns
     R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance
      improvements. EDBT’96.
     H. Mannila, H Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event
      sequences. DAMI:97.
     M. Zaki. SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine
      Learning, 2001.
     J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining Sequential
      Patterns Efficiently by Prefix-Projected Pattern Growth. ICDE'01 (TKDE’04).
     J. Pei, J. Han and W. Wang, Constraint-Based Sequential Pattern Mining in Large
      Databases, CIKM'02.
     X. Yan, J. Han, and R. Afshar. CloSpan: Mining Closed Sequential Patterns in Large
      Datasets. SDM'03.
     J. Wang and J. Han, BIDE: Efficient Mining of Frequent Closed Sequences, ICDE'04.
     H. Cheng, X. Yan, and J. Han, IncSpan: Incremental Mining of Sequential Patterns in
      Large Database, KDD'04.
     J. Han, G. Dong and Y. Yin, Efficient Mining of Partial Periodic Patterns in Time Series
      Database, ICDE'99.
     J. Yang, W. Wang, and P. S. Yu, Mining asynchronous periodic patterns in time series
      data, KDD'00.
April 18, 2013                     Data Mining: Concepts and Techniques
                                                                33
April 18, 2013   Data Mining: Concepts and Techniques
                                              34

Mais conteúdo relacionado

Mais procurados

Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsJustin Cletus
 
Data Mining Primitives, Languages & Systems
Data Mining Primitives, Languages & SystemsData Mining Primitives, Languages & Systems
Data Mining Primitives, Languages & SystemsNiloy Sikder
 
5.3 mining sequential patterns
5.3 mining sequential patterns5.3 mining sequential patterns
5.3 mining sequential patternsKrish_ver2
 
Classification and prediction in data mining
Classification and prediction in data miningClassification and prediction in data mining
Classification and prediction in data miningEr. Nawaraj Bhandari
 
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsData Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsSalah Amean
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataSalah Amean
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysisDataminingTools Inc
 
Data Mining Techniques
Data Mining TechniquesData Mining Techniques
Data Mining TechniquesHouw Liong The
 
Association rule mining
Association rule miningAssociation rule mining
Association rule miningAcad
 
Recommendation system
Recommendation systemRecommendation system
Recommendation systemDing Li
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1malathieswaran29
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 5
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter 5 Data Mining:  Concepts and Techniques (3rd ed.)— Chapter 5
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 5 Salah Amean
 
The comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmThe comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmdeepti92pawar
 
Data Mining: Concepts and Techniques — Chapter 2 —
Data Mining:  Concepts and Techniques — Chapter 2 —Data Mining:  Concepts and Techniques — Chapter 2 —
Data Mining: Concepts and Techniques — Chapter 2 —Salah Amean
 

Mais procurados (20)

12 outlier
12 outlier12 outlier
12 outlier
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithm
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
 
Data Mining Primitives, Languages & Systems
Data Mining Primitives, Languages & SystemsData Mining Primitives, Languages & Systems
Data Mining Primitives, Languages & Systems
 
5.3 mining sequential patterns
5.3 mining sequential patterns5.3 mining sequential patterns
5.3 mining sequential patterns
 
Kdd process
Kdd processKdd process
Kdd process
 
Classification and prediction in data mining
Classification and prediction in data miningClassification and prediction in data mining
Classification and prediction in data mining
 
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsData Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, data
 
Sequential Pattern Mining and GSP
Sequential Pattern Mining and GSPSequential Pattern Mining and GSP
Sequential Pattern Mining and GSP
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysis
 
Data Mining Techniques
Data Mining TechniquesData Mining Techniques
Data Mining Techniques
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 5
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter 5 Data Mining:  Concepts and Techniques (3rd ed.)— Chapter 5
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 5
 
The comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmThe comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithm
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Data Mining: Concepts and Techniques — Chapter 2 —
Data Mining:  Concepts and Techniques — Chapter 2 —Data Mining:  Concepts and Techniques — Chapter 2 —
Data Mining: Concepts and Techniques — Chapter 2 —
 

Destaque

Introduction to data mining technique
Introduction to data mining techniqueIntroduction to data mining technique
Introduction to data mining techniquePawneshwar Datt Rai
 
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Salah Amean
 
Data mining process powerpoint ppt slides.
Data mining process powerpoint ppt slides.Data mining process powerpoint ppt slides.
Data mining process powerpoint ppt slides.SlideTeam.net
 
Chapter - 8.1 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.1 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 8.1 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.1 Data Mining Concepts and Techniques 2nd Ed slides Han & Kambererror007
 
Data mining slides
Data mining slidesData mining slides
Data mining slidessmj
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 

Destaque (7)

Introduction to data mining technique
Introduction to data mining techniqueIntroduction to data mining technique
Introduction to data mining technique
 
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
 
Data mining process powerpoint ppt slides.
Data mining process powerpoint ppt slides.Data mining process powerpoint ppt slides.
Data mining process powerpoint ppt slides.
 
Temporal data mining
Temporal data miningTemporal data mining
Temporal data mining
 
Chapter - 8.1 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.1 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 8.1 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.1 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
Data mining slides
Data mining slidesData mining slides
Data mining slides
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 

Semelhante a Chapter - 8.3 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber

Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Salah Amean
 
Sequential Pattern Mining Methods: A Snap Shot
Sequential Pattern Mining Methods: A Snap ShotSequential Pattern Mining Methods: A Snap Shot
Sequential Pattern Mining Methods: A Snap ShotIOSR Journals
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Subrata Kumer Paul
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalAlexander Decker
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalAlexander Decker
 
Sequential pattern mining
Sequential pattern miningSequential pattern mining
Sequential pattern miningkiran said
 
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth AlgorithmFrequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth AlgorithmShivarkarSandip
 
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...AshishDPatel1
 
Cvpr2010 open source vision software, intro and training part v open cv and r...
Cvpr2010 open source vision software, intro and training part v open cv and r...Cvpr2010 open source vision software, intro and training part v open cv and r...
Cvpr2010 open source vision software, intro and training part v open cv and r...zukun
 
Grill at bigdata-cloud conf
Grill at bigdata-cloud confGrill at bigdata-cloud conf
Grill at bigdata-cloud confamarsri
 
Is424 g1 t9_proposal_slides
Is424 g1 t9_proposal_slidesIs424 g1 t9_proposal_slides
Is424 g1 t9_proposal_slidesJing WANG
 
The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...Thanh Hieu
 

Semelhante a Chapter - 8.3 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber (20)

Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
 
Sequential Pattern Mining Methods: A Snap Shot
Sequential Pattern Mining Methods: A Snap ShotSequential Pattern Mining Methods: A Snap Shot
Sequential Pattern Mining Methods: A Snap Shot
 
My6asso
My6assoMy6asso
My6asso
 
06FPBasic02.pdf
06FPBasic02.pdf06FPBasic02.pdf
06FPBasic02.pdf
 
C034011016
C034011016C034011016
C034011016
 
data_mining.pptx
data_mining.pptxdata_mining.pptx
data_mining.pptx
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
 
05
0505
05
 
Graph mining ppt
Graph mining pptGraph mining ppt
Graph mining ppt
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incremental
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incremental
 
3. mining frequent patterns
3. mining frequent patterns3. mining frequent patterns
3. mining frequent patterns
 
Sequential pattern mining
Sequential pattern miningSequential pattern mining
Sequential pattern mining
 
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth AlgorithmFrequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
 
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
 
Cvpr2010 open source vision software, intro and training part v open cv and r...
Cvpr2010 open source vision software, intro and training part v open cv and r...Cvpr2010 open source vision software, intro and training part v open cv and r...
Cvpr2010 open source vision software, intro and training part v open cv and r...
 
Grill at bigdata-cloud conf
Grill at bigdata-cloud confGrill at bigdata-cloud conf
Grill at bigdata-cloud conf
 
Is424 g1 t9_proposal_slides
Is424 g1 t9_proposal_slidesIs424 g1 t9_proposal_slides
Is424 g1 t9_proposal_slides
 
The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...
 
Unit 3.pptx
Unit 3.pptxUnit 3.pptx
Unit 3.pptx
 

Mais de error007

Chapter - 8.4 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.4 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 8.4 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.4 Data Mining Concepts and Techniques 2nd Ed slides Han & Kambererror007
 
Chapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & Kambererror007
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kambererror007
 
Chapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kambererror007
 
Chapter - 4 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 4 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 4 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 4 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kambererror007
 
Chapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kambererror007
 

Mais de error007 (6)

Chapter - 8.4 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.4 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 8.4 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.4 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
Chapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.2 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
Chapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
 
Chapter - 4 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 4 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 4 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 4 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
 
Chapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
 

Último

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 

Último (20)

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 

Chapter - 8.3 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber

  • 1. Data Mining: Concepts and Techniques — Chapter 8 — 8.3 Mining sequence patterns in transactional databases Jiawei Han and Micheline Kamber Department of Computer Science University of Illinois at Urbana-Champaign www.cs.uiuc.edu/~hanj ©2006 Jiawei Han and Micheline Kamber. All rights reserved. April 18, 2013 Data Mining: Concepts and Techniques 1
  • 2. April 18, 2013 Data Mining: Concepts and Techniques 2
  • 3. Chapter 8. Mining Stream, Time- Series, and Sequence Data Mining data streams Mining time-series data Mining sequence patterns in transactional databases Mining sequence patterns in biological data April 18, 2013 Data Mining: Concepts and Techniques 3
  • 4. Sequence Databases & Sequential Patterns  Transaction databases, time-series databases vs. sequence databases  Frequent patterns vs. (frequent) sequential patterns  Applications of sequential pattern mining  Customer shopping sequences:  First buy computer, then CD-ROM, and then digital camera, within 3 months.  Medical treatments, natural disasters (e.g., earthquakes), science & eng. processes, stocks and markets, etc.  Telephone calling patterns, Weblog click streams  DNA sequences and gene structures April 18, 2013 Data Mining: Concepts and Techniques 4
  • 5. What Is Sequential Pattern Mining?  Given a set of sequences, find the complete set of frequent subsequences A sequence : < (ef) (ab) (df) c b > A sequence database SID sequence An element may contain a set of items. 10 <a(abc)(ac)d(cf)> Items within an element are unordered and we list them alphabetically. 20 <(ad)c(bc)(ae)> 30 <(ef)(ab)(df)cb> <a(bc)dc> is a subsequence 40 <eg(af)cbc> of <a(abc)(ac)d(cf)> Given support threshold min_sup =2, <(ab)c> is a sequential pattern April 18, 2013 Data Mining: Concepts and Techniques 5
  • 6. Challenges on Sequential Pattern Mining  A huge number of possible sequential patterns are hidden in databases  A mining algorithm should  find the complete set of patterns, when possible, satisfying the minimum support (frequency) threshold  be highly efficient, scalable, involving only a small number of database scans  be able to incorporate various kinds of user-specific constraints April 18, 2013 Data Mining: Concepts and Techniques 6
  • 7. Sequential Pattern Mining Algorithms  Concept introduction and an initial Apriori-like algorithm  Agrawal & Srikant. Mining sequential patterns, ICDE’95  Apriori-based method: GSP (Generalized Sequential Patterns: Srikant & Agrawal @ EDBT’96)  Pattern-growth methods: FreeSpan & PrefixSpan (Han et al.@KDD’00; Pei, et al.@ICDE’01)  Vertical format-based mining: SPADE (Zaki@Machine Leanining’00)  Constraint-based sequential pattern mining (SPIRIT: Garofalakis, Rastogi, Shim@VLDB’99; Pei, Han, Wang @ CIKM’02)  Mining closed sequential patterns: CloSpan (Yan, Han & Afshar @SDM’03) April 18, 2013 Data Mining: Concepts and Techniques 7
  • 8. The Apriori Property of Sequential Patterns  A basic property: Apriori (Agrawal & Sirkant’94)  If a sequence S is not frequent  Then none of the super-sequences of S is frequent  E.g, <hb> is infrequent  so do <hab> and <(ah)b> Seq. ID Sequence Given support threshold 10 <(bd)cb(ac)> min_sup =2 20 <(bf)(ce)b(fg)> 30 <(ah)(bf)abf> 40 <(be)(ce)d> 50 <a(bd)bcb(ade)> April 18, 2013 Data Mining: Concepts and Techniques 8
  • 9. GSP—Generalized Sequential Pattern Mining  GSP (Generalized Sequential Pattern) mining algorithm  proposed by Agrawal and Srikant, EDBT’96  Outline of the method  Initially, every item in DB is a candidate of length-1  for each level (i.e., sequences of length-k) do  scan database to collect support count for each candidate sequence  generate candidate length-(k+1) sequences from length-k frequent sequences using Apriori  repeat until no frequent sequence or no candidate can be found  Major strength: Candidate pruning by Apriori April 18, 2013 Data Mining: Concepts and Techniques 9
  • 10. Finding Length-1 Sequential Patterns  Examine GSP using an example  Initial candidates: all singleton sequences  <a>, <b>, <c>, <d>, <e>, <f>, <g>, <h> Cand Sup <a> 3  Scan database once, count support for candidates <b> 5 <c> 4 min_sup =2 <d> 3 Seq. ID Sequence 10 <(bd)cb(ac)> <e> 3 20 <(bf)(ce)b(fg)> <f> 2 30 <(ah)(bf)abf> <g> 1 40 <(be)(ce)d> <h> 1 50 <a(bd)bcb(ade)> April 18, 2013 Data Mining: Concepts and Techniques 10
  • 11. GSP: Generating Length-2 Candidates <a> <b> <c> <d> <e> <f> <a> <aa> <ab> <ac> <ad> <ae> <af> 51 length-2 <b> <ba> <bb> <bc> <bd> <be> <bf> Candidates <c> <ca> <cb> <cc> <cd> <ce> <cf> <d> <da> <db> <dc> <dd> <de> <df> <e> <ea> <eb> <ec> <ed> <ee> <ef> <f> <fa> <fb> <fc> <fd> <fe> <ff> <a> <b> <c> <d> <e> <f> Without Apriori <a> <(ab)> <(ac)> <(ad)> <(ae)> <(af)> property, <b> <(bc)> <(bd)> <(be)> <(bf)> 8*8+8*7/2=92 <c> <(cd)> <(ce)> <(cf)> <d> <(de)> <(df)> candidates <e> Apriori prunes <(ef)> <f> April 18, 2013 44.57% candidates Data Mining: Concepts and Techniques 11
  • 12. The GSP Mining Process 5th scan: 1 cand. 1 length-5 seq. <(bd)cba> Cand. cannot pat. pass sup. threshold 4th scan: 8 cand. 6 length-4 seq. <abba> <(bd)bc> … Cand. not in DB at all pat. 3rd scan: 47 cand. 19 length-3 seq. <abb> <aab> <aba> <baa> <bab> … pat. 20 cand. not in DB at all 2nd scan: 51 cand. 19 length-2 seq. pat. 10 cand. not in DB at all <aa> <ab> … <af> <ba> <bb> … <ff> <(ab)> … <(ef)> 1st scan: 8 cand. 6 length-1 seq. pat. <a> <b> <c> <d> <e> <f> <g> <h> Seq. ID Sequence 10 <(bd)cb(ac)> min_sup =2 20 <(bf)(ce)b(fg)> 30 <(ah)(bf)abf> 40 <(be)(ce)d> 50 <a(bd)bcb(ade)> April 18, 2013 Data Mining: Concepts and Techniques 12
  • 13. Candidate Generate-and-test: Drawbacks  A huge set of candidate sequences generated.  Especially 2-item candidate sequence.  Multiple Scans of database needed.  The length of each candidate grows by one at each database scan.  Inefficient for mining long sequential patterns.  A long pattern grow up from short patterns  The number of short patterns is exponential to the length of mined patterns. April 18, 2013 Data Mining: Concepts and Techniques 13
  • 14. The SPADE Algorithm  SPADE (Sequential PAttern Discovery using Equivalent Class) developed by Zaki 2001  A vertical format sequential pattern mining method  A sequence database is mapped to a large set of  Item: <SID, EID>  Sequential pattern mining is performed by  growing the subsequences (patterns) one item at a time by Apriori candidate generation April 18, 2013 Data Mining: Concepts and Techniques 14
  • 15. The SPADE Algorithm April 18, 2013 Data Mining: Concepts and Techniques 15
  • 16. Bottlenecks of GSP and SPADE  A huge set of candidates could be generated  1,000 frequent length-1 sequences generate s huge number of 1000 × 999 length-2 candidates! 1000 ×1000 + = 1,499,500 2  Multiple scans of database in mining  Breadth-first search  Mining long sequential patterns  Needs an exponential number of short candidates 100  100  100 A length-100 sequential pattern needs 10 ∑ i  30    = 2 − 1 ≈ 1030 i =1   candidate sequences! April 18, 2013 Data Mining: Concepts and Techniques 16
  • 17. Prefix and Suffix (Projection)  <a>, <aa>, <a(ab)> and <a(abc)> are prefixes of sequence <a(abc)(ac)d(cf)>  Given sequence <a(abc)(ac)d(cf)> Prefix Suffix (Prefix-Based Projection) <a> <(abc)(ac)d(cf)> <aa> <(_bc)(ac)d(cf)> <ab> <(_c)(ac)d(cf)> April 18, 2013 Data Mining: Concepts and Techniques 17
  • 18. Mining Sequential Patterns by Prefix Projections  Step 1: find length-1 sequential patterns  <a>, <b>, <c>, <d>, <e>, <f>  Step 2: divide search space. The complete set of seq. pat. can be partitioned into 6 subsets:  The ones having prefix <a>;  The ones having prefix <b>; SID sequence  … 10 <a(abc)(ac)d(cf)>  The ones having prefix <f> 20 <(ad)c(bc)(ae)> 30 <(ef)(ab)(df)cb> 40 <eg(af)cbc> April 18, 2013 Data Mining: Concepts and Techniques 18
  • 19. Finding Seq. Patterns with Prefix <a>  Only need to consider projections w.r.t. <a>  <a>-projected database: <(abc)(ac)d(cf)>, <(_d)c(bc) (ae)>, <(_b)(df)cb>, <(_f)cbc>  Find all the length-2 seq. pat. Having prefix <a>: <aa>, <ab>, <(ab)>, <ac>, <ad>, <af>  Further partition into 6 subsets SID sequence  Having prefix <aa>; 10 <a(abc)(ac)d(cf)>  … 20 <(ad)c(bc)(ae)> 30 <(ef)(ab)(df)cb>  Having prefix <af> 40 <eg(af)cbc> April 18, 2013 Data Mining: Concepts and Techniques 19
  • 20. Completeness of PrefixSpan SDB SID sequence Length-1 sequential patterns 10 <a(abc)(ac)d(cf)> <a>, <b>, <c>, <d>, <e>, <f> 20 <(ad)c(bc)(ae)> 30 <(ef)(ab)(df)cb> 40 <eg(af)cbc> Having prefix <a> Having prefix <c>, …, <f> Having prefix <b> <a>-projected database <(abc)(ac)d(cf)> Length-2 sequential <b>-projected database … <(_d)c(bc)(ae)> patterns <(_b)(df)cb> <aa>, <ab>, <(ab)>, <(_f)cbc> <ac>, <ad>, <af> …… Having prefix <aa> Having prefix <af> <aa>-proj. db … <af>-proj. db April 18, 2013 Data Mining: Concepts and Techniques 20
  • 21. Efficiency of PrefixSpan  No candidate sequence needs to be generated  Projected databases keep shrinking  Major cost of PrefixSpan: constructing projected databases  Can be improved by pseudo-projections April 18, 2013 Data Mining: Concepts and Techniques 21
  • 22. Speed-up by Pseudo-projection  Major cost of PrefixSpan: projection  Postfixes of sequences often appear repeatedly in recursive projected databases  When (projected) database can be held in main memory, use pointers to form projections  Pointer to the sequence s=<a(abc)(ac)d(cf)>  Offset of the postfix <a> s|<a>: ( , 2) <(abc)(ac)d(cf)> <ab> s|<ab>: ( , 4) <(_c)(ac)d(cf)> April 18, 2013 Data Mining: Concepts and Techniques 22
  • 23. Pseudo-Projection vs. Physical Projection  Pseudo-projection avoids physically copying postfixes  Efficient in running time and space when database can be held in main memory  However, it is not efficient when database cannot fit in main memory  Disk-based random accessing is very costly  Suggested Approach:  Integration of physical and pseudo-projection  Swapping to pseudo-projection when the data set fits in memory April 18, 2013 Data Mining: Concepts and Techniques 23
  • 24. Performance on Data Set C10T8S8I8 April 18, 2013 Data Mining: Concepts and Techniques 24
  • 25. Performance on Data Set Gazelle April 18, 2013 Data Mining: Concepts and Techniques 25
  • 26. Effect of Pseudo-Projection April 18, 2013 Data Mining: Concepts and Techniques 26
  • 27. CloSpan: Mining Closed Sequential Patterns  A closed sequential pattern s: there exists no superpattern s’ such that s’ ‫ כ‬s, and s’ and s have the same support  Motivation: reduces the number of (redundant) patterns but attains the same expressive power  Using Backward Subpattern and Backward Superpattern pruning to prune redundant search space April 18, 2013 Data Mining: Concepts and Techniques 27
  • 28. CloSpan: Performance Comparison with PrefixSpan April 18, 2013 Data Mining: Concepts and Techniques 28
  • 29. Constraint-Based Seq.-Pattern Mining  Constraint-based sequential pattern mining  Constraints: User-specified, for focused mining of desired patterns  How to explore efficient mining with constraints? — Optimization  Classification of constraints  Anti-monotone: E.g., value_sum(S) < 150, min(S) > 10  Monotone: E.g., count (S) > 5, S ⊇ {PC, digital_camera}  Succinct: E.g., length(S) ≥ 10, S ∈ {Pentium, MS/Office, MS/Money}  Convertible: E.g., value_avg(S) < 25, profit_sum (S) > 160, max(S)/avg(S) < 2, median(S) – min(S) > 5  Inconvertible: E.g., avg(S) – median(S) = 0 April 18, 2013 Data Mining: Concepts and Techniques 29
  • 30. From Sequential Patterns to Structured Patterns  Sets, sequences, trees, graphs, and other structures  Transaction DB: Sets of items  {{i , i , …, i }, …} 1 2 m  Seq. DB: Sequences of sets:  {<{i , i }, …, {i , i , i }>, …} 1 2 m n k  Sets of Sequences:  {{<i , i >, …, <i , i , i >}, …} 1 2 m n k  Sets of trees: {t1, t2, …, tn}  Sets of graphs (mining for frequent subgraphs):  {g , g , …, g } 1 2 n  Mining structured patterns in XML documents, bio- April 18, 2013 structures, etc. chemical Data Mining: Concepts and Techniques 30
  • 31. Episodes and Episode Pattern Mining  Other methods for specifying the kinds of patterns  Serial episodes: A → B  Parallel episodes: A & B  Regular expressions: (A | B)C*(D → E)  Methods for episode pattern mining  Variations of Apriori-like algorithms, e.g., GSP  Database projection-based pattern growth  Similar to the frequent pattern growth without candidate generation April 18, 2013 Data Mining: Concepts and Techniques 31
  • 32. Periodicity Analysis  Periodicity is everywhere: tides, seasons, daily power consumption, etc.  Full periodicity  Every point in time contributes (precisely or approximately) to the periodicity  Partial periodicit: A more general notion  Only some segments contribute to the periodicity  Jim reads NY Times 7:00-7:30 am every week day  Cyclic association rules  Associations which form cycles  Methods  Full periodicity: FFT, other statistical analysis methods  Partial and cyclic periodicity: Variations of Apriori-like mining methods April 18, 2013 Data Mining: Concepts and Techniques 32
  • 33. Ref: Mining Sequential Patterns  R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. EDBT’96.  H. Mannila, H Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. DAMI:97.  M. Zaki. SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning, 2001.  J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. ICDE'01 (TKDE’04).  J. Pei, J. Han and W. Wang, Constraint-Based Sequential Pattern Mining in Large Databases, CIKM'02.  X. Yan, J. Han, and R. Afshar. CloSpan: Mining Closed Sequential Patterns in Large Datasets. SDM'03.  J. Wang and J. Han, BIDE: Efficient Mining of Frequent Closed Sequences, ICDE'04.  H. Cheng, X. Yan, and J. Han, IncSpan: Incremental Mining of Sequential Patterns in Large Database, KDD'04.  J. Han, G. Dong and Y. Yin, Efficient Mining of Partial Periodic Patterns in Time Series Database, ICDE'99.  J. Yang, W. Wang, and P. S. Yu, Mining asynchronous periodic patterns in time series data, KDD'00. April 18, 2013 Data Mining: Concepts and Techniques 33
  • 34. April 18, 2013 Data Mining: Concepts and Techniques 34