SlideShare uma empresa Scribd logo
1 de 22
Sequence mining algorithm



           Monica Dăgădiţă
                        ISI
 Introduction
             to sequence mining
 Why sequence mining?
 Sequence mining algorithms
 SPADE
    Motivation
    Definitions and examples
    Algorithm
    Implementation




                     Data Mining   11/8/2011   2
 Aim - finding statistically relevant patterns
 between data examples where the values are
 delivered in a sequence

 Originallyintroduced for market basket
 analysis - customer behaviour predictions

2    types of sequence mining:
     string mining – biology (gene/protein sequences)
     itemset mining - marketing and CRM applications

                       Data Mining   11/8/2011   3
 Discovering   patterns:
    Bookstore: 70% of the people who buy Jane
     Austen’s “Pride and Prejudice” also buy “Emma”
     within a month
    Website: finding sequences of most frequently
     accessed pages

 Usage:
    Promotions
    Shelf placement
    Restructure the website
    Recommender systems

                     Data Mining   11/8/2011   4
 Apriori
 GSP  (Generalized Sequential Pattern)
 FreeSpan (Frequent pattern-projected
  Sequential pattern mining)
 PrefixSpan (Prefix-projected Sequential
  pattern mining)
 SPADE (Sequential PAttern Discovery using
  Equivalence classes)




                  Data Mining   11/8/2011   5
 Problems   of existing solutions
    Repeated database scans
    Complex internal data structures


 Key   features of SPADE:
    Fixed number of database scans
    Vertical id-list database format
    Decomposition of search space into smaller
     pieces – processed independently




                     Data Mining   11/8/2011      6
 Itemset:    set of m distinct items
   I = {i1, i2, …, im }
 Event: non-empty collection of items
   (i1,i2 … ik)
 Sequence : ordered list of events
  < e1 -> e2 -> … -> en >
 K-sequence : sequence with k items
  (B->AC) – 3-sequence



                  Data Mining   11/8/2011   7
 Subsequence:   given two sequences α=<a1 a2 … an>
 and β=<b1 b2 … bm>, α is called a subsequence of
 β, denoted as α⊆ β, if there exist integers 1≤ j1 < j2
 <…< jn ≤m such that a1 ⊆ bj1, a2 ⊆ bj2,…, an ⊆ bjn

  Examples:
  1. (B->AC) is a subsequence of (AB->E->ACD)
  2. (AB->E) is not a subsequence of (ABE)




                    Data Mining   11/8/2011     8
Data Mining   11/8/2011   9
Id-lists of the most frequent items (1-sequences)




                   Data Mining   11/8/2011   10
 D->BF->A
    Step 1: D->B




    Step 2: D->BF




                     Data Mining   11/8/2011   11
 D->BF->A
    Step 3 : D->BF->A




 Not   space-efficient
    Solution: 2 columns - (sid,eid) for each sequence
    Eid – id of the sequence’s last item


                      Data Mining   11/8/2011   12
 D->BF->A   (space-efficient id-list joins)
                                                               D->B

                                                       SID       EID
                                                       1         15
                                                       1         20
                                                       4         20




                   D->BF->A                                  D->BF

             SID       EID                         SID          EID
             1         25                          1            20
             4         25                          4            20


                         Data Mining   11/8/2011                      13
 Complete   latice representation




                   Data Mining   11/8/2011   14
Data Mining   11/8/2011   15
 Decomposing  the latice => smaller pieces
 that can be solved independently

 Equivalence   classes
 2 sequences are in the same class (Θk) if they
  share a common k length prefix
 Example
   k=1 : Θ1 -> {[A],[B],[D],[F]}




                    Data Mining   11/8/2011   16
Data Mining   11/8/2011   17
Data Mining   11/8/2011   18
 SPADE(min_sup,D)
  //min_sup – minimum_support
 //D –initial dataset
 F1<- {frequent items or 1-sequences}
 F2<- {frequent 2-sequences}
 Ε <- {equivalence classes [X] Θ1 }
 for all [X] in E
   enumerate_frequent_seq([X],min_sup)




                  Data Mining   11/8/2011   19
   Enumerate_frequent_seq(S,min_sup)
      for all Ai in S
          Ti <- {}
          for all Aj in S, with j≥i
              R<- Ai v Aj (join)
              if R satisfies min_sup
                   Ti <- Ti U {R}
          end
          Enumerate_frequent_seq(Ti , min_sup) //DFS
    end
    For all non-empty Ti
      Enumerate_frequent_seq(Ti , min_sup) //BFS


                       Data Mining   11/8/2011   20
 The   R Project for Statistical Computing
    developed at Bell Laboratories (formerly
     AT&T, now Lucent Technologies) by John
     Chambers and colleagues

    Different implementation of S language

    arulesSequences package




                      Data Mining   11/8/2011   21
Data Mining   11/8/2011   22

Mais conteúdo relacionado

Mais procurados

Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kambererror007
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithmPradip Kumar
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Salah Amean
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and predictionDataminingTools Inc
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioMarina Santini
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining Sulman Ahmed
 
Data Mining: an Introduction
Data Mining: an IntroductionData Mining: an Introduction
Data Mining: an IntroductionAli Abbasi
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methodsKrish_ver2
 
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceMaryamRehman6
 
Forward and Backward chaining in AI
Forward and Backward chaining in AIForward and Backward chaining in AI
Forward and Backward chaining in AIMegha Sharma
 
Data Integration and Transformation in Data mining
Data Integration and Transformation in Data miningData Integration and Transformation in Data mining
Data Integration and Transformation in Data miningkavitha muneeshwaran
 
Knowledge Discovery and Data Mining
Knowledge Discovery and Data MiningKnowledge Discovery and Data Mining
Knowledge Discovery and Data MiningAmritanshu Mehra
 

Mais procurados (20)

Text mining
Text miningText mining
Text mining
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithm
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 
3. mining frequent patterns
3. mining frequent patterns3. mining frequent patterns
3. mining frequent patterns
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
 
Assosiate rule mining
Assosiate rule miningAssosiate rule mining
Assosiate rule mining
 
Data mining tasks
Data mining tasksData mining tasks
Data mining tasks
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining
 
Data Mining: an Introduction
Data Mining: an IntroductionData Mining: an Introduction
Data Mining: an Introduction
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
Dynamic Itemset Counting
Dynamic Itemset CountingDynamic Itemset Counting
Dynamic Itemset Counting
 
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data science
 
Forward and Backward chaining in AI
Forward and Backward chaining in AIForward and Backward chaining in AI
Forward and Backward chaining in AI
 
Data Integration and Transformation in Data mining
Data Integration and Transformation in Data miningData Integration and Transformation in Data mining
Data Integration and Transformation in Data mining
 
Artificial Intelligence - Reasoning in Uncertain Situations
Artificial Intelligence - Reasoning in Uncertain SituationsArtificial Intelligence - Reasoning in Uncertain Situations
Artificial Intelligence - Reasoning in Uncertain Situations
 
Knowledge Discovery and Data Mining
Knowledge Discovery and Data MiningKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining
 

Semelhante a SPADE -

OSDC 2011 | NeDi - Network Discovery im RZ by Remo Rickli
OSDC 2011 | NeDi - Network Discovery im RZ by Remo RickliOSDC 2011 | NeDi - Network Discovery im RZ by Remo Rickli
OSDC 2011 | NeDi - Network Discovery im RZ by Remo RickliNETWAYS
 
Reverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading SkillsReverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading SkillsAsuka Nakajima
 
Interval intersection
Interval intersectionInterval intersection
Interval intersectionAabida Noman
 
Formats for Exchanging Archival Data: An Introduction to EAD, EAC-CPF, and Ar...
Formats for Exchanging Archival Data: An Introduction to EAD, EAC-CPF, and Ar...Formats for Exchanging Archival Data: An Introduction to EAD, EAC-CPF, and Ar...
Formats for Exchanging Archival Data: An Introduction to EAD, EAC-CPF, and Ar...Michael Rush
 
eBay EDW元数据管理及应用
eBay EDW元数据管理及应用eBay EDW元数据管理及应用
eBay EDW元数据管理及应用mysqlops
 
Sequential pattern mining
Sequential pattern miningSequential pattern mining
Sequential pattern miningkiran said
 
Cs501 mining frequentpatterns
Cs501 mining frequentpatternsCs501 mining frequentpatterns
Cs501 mining frequentpatternsKamal Singh Lodhi
 
Xldb2011 tue 1055_tom_fastner
Xldb2011 tue 1055_tom_fastnerXldb2011 tue 1055_tom_fastner
Xldb2011 tue 1055_tom_fastnerliqiang xu
 
IBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesIBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesKeshav Murthy
 
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...shravanthium111
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSCobus Bernard
 
Citation data flow 2012 nat latipat
Citation data flow 2012 nat latipatCitation data flow 2012 nat latipat
Citation data flow 2012 nat latipatLATIPAT
 
Datamining at SemWebPro 2012
Datamining at SemWebPro 2012Datamining at SemWebPro 2012
Datamining at SemWebPro 2012Vincent Michel
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionKazuki Fujikawa
 
Rattle Graphical Interface for R Language
Rattle Graphical Interface for R LanguageRattle Graphical Interface for R Language
Rattle Graphical Interface for R LanguageMajid Abdollahi
 
ScilabTEC 2015 - KIT
ScilabTEC 2015 - KITScilabTEC 2015 - KIT
ScilabTEC 2015 - KITScilab
 
Split Miner: Discovering Accurate and Simple Business Process Models from Eve...
Split Miner: Discovering Accurate and Simple Business Process Models from Eve...Split Miner: Discovering Accurate and Simple Business Process Models from Eve...
Split Miner: Discovering Accurate and Simple Business Process Models from Eve...Marlon Dumas
 

Semelhante a SPADE - (20)

OSDC 2011 | NeDi - Network Discovery im RZ by Remo Rickli
OSDC 2011 | NeDi - Network Discovery im RZ by Remo RickliOSDC 2011 | NeDi - Network Discovery im RZ by Remo Rickli
OSDC 2011 | NeDi - Network Discovery im RZ by Remo Rickli
 
FP-growth.pptx
FP-growth.pptxFP-growth.pptx
FP-growth.pptx
 
Cdi implementation
Cdi implementationCdi implementation
Cdi implementation
 
Reverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading SkillsReverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading Skills
 
Interval intersection
Interval intersectionInterval intersection
Interval intersection
 
Formats for Exchanging Archival Data: An Introduction to EAD, EAC-CPF, and Ar...
Formats for Exchanging Archival Data: An Introduction to EAD, EAC-CPF, and Ar...Formats for Exchanging Archival Data: An Introduction to EAD, EAC-CPF, and Ar...
Formats for Exchanging Archival Data: An Introduction to EAD, EAC-CPF, and Ar...
 
eBay EDW元数据管理及应用
eBay EDW元数据管理及应用eBay EDW元数据管理及应用
eBay EDW元数据管理及应用
 
Sequential pattern mining
Sequential pattern miningSequential pattern mining
Sequential pattern mining
 
Cs501 mining frequentpatterns
Cs501 mining frequentpatternsCs501 mining frequentpatterns
Cs501 mining frequentpatterns
 
Xldb2011 tue 1055_tom_fastner
Xldb2011 tue 1055_tom_fastnerXldb2011 tue 1055_tom_fastner
Xldb2011 tue 1055_tom_fastner
 
IBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesIBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql Features
 
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
 
Citation data flow 2012 nat latipat
Citation data flow 2012 nat latipatCitation data flow 2012 nat latipat
Citation data flow 2012 nat latipat
 
Datamining at SemWebPro 2012
Datamining at SemWebPro 2012Datamining at SemWebPro 2012
Datamining at SemWebPro 2012
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph Convolution
 
Rattle Graphical Interface for R Language
Rattle Graphical Interface for R LanguageRattle Graphical Interface for R Language
Rattle Graphical Interface for R Language
 
ScilabTEC 2015 - KIT
ScilabTEC 2015 - KITScilabTEC 2015 - KIT
ScilabTEC 2015 - KIT
 
SMDMS'13
SMDMS'13SMDMS'13
SMDMS'13
 
Split Miner: Discovering Accurate and Simple Business Process Models from Eve...
Split Miner: Discovering Accurate and Simple Business Process Models from Eve...Split Miner: Discovering Accurate and Simple Business Process Models from Eve...
Split Miner: Discovering Accurate and Simple Business Process Models from Eve...
 

Último

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Último (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

SPADE -

  • 1. Sequence mining algorithm Monica Dăgădiţă ISI
  • 2.  Introduction to sequence mining  Why sequence mining?  Sequence mining algorithms  SPADE  Motivation  Definitions and examples  Algorithm  Implementation Data Mining 11/8/2011 2
  • 3.  Aim - finding statistically relevant patterns between data examples where the values are delivered in a sequence  Originallyintroduced for market basket analysis - customer behaviour predictions 2 types of sequence mining:  string mining – biology (gene/protein sequences)  itemset mining - marketing and CRM applications Data Mining 11/8/2011 3
  • 4.  Discovering patterns:  Bookstore: 70% of the people who buy Jane Austen’s “Pride and Prejudice” also buy “Emma” within a month  Website: finding sequences of most frequently accessed pages  Usage:  Promotions  Shelf placement  Restructure the website  Recommender systems Data Mining 11/8/2011 4
  • 5.  Apriori  GSP (Generalized Sequential Pattern)  FreeSpan (Frequent pattern-projected Sequential pattern mining)  PrefixSpan (Prefix-projected Sequential pattern mining)  SPADE (Sequential PAttern Discovery using Equivalence classes) Data Mining 11/8/2011 5
  • 6.  Problems of existing solutions  Repeated database scans  Complex internal data structures  Key features of SPADE:  Fixed number of database scans  Vertical id-list database format  Decomposition of search space into smaller pieces – processed independently Data Mining 11/8/2011 6
  • 7.  Itemset: set of m distinct items I = {i1, i2, …, im }  Event: non-empty collection of items (i1,i2 … ik)  Sequence : ordered list of events < e1 -> e2 -> … -> en >  K-sequence : sequence with k items (B->AC) – 3-sequence Data Mining 11/8/2011 7
  • 8.  Subsequence: given two sequences α=<a1 a2 … an> and β=<b1 b2 … bm>, α is called a subsequence of β, denoted as α⊆ β, if there exist integers 1≤ j1 < j2 <…< jn ≤m such that a1 ⊆ bj1, a2 ⊆ bj2,…, an ⊆ bjn  Examples: 1. (B->AC) is a subsequence of (AB->E->ACD) 2. (AB->E) is not a subsequence of (ABE) Data Mining 11/8/2011 8
  • 9. Data Mining 11/8/2011 9
  • 10. Id-lists of the most frequent items (1-sequences) Data Mining 11/8/2011 10
  • 11.  D->BF->A  Step 1: D->B  Step 2: D->BF Data Mining 11/8/2011 11
  • 12.  D->BF->A  Step 3 : D->BF->A  Not space-efficient  Solution: 2 columns - (sid,eid) for each sequence  Eid – id of the sequence’s last item Data Mining 11/8/2011 12
  • 13.  D->BF->A (space-efficient id-list joins) D->B SID EID 1 15 1 20 4 20 D->BF->A D->BF SID EID SID EID 1 25 1 20 4 25 4 20 Data Mining 11/8/2011 13
  • 14.  Complete latice representation Data Mining 11/8/2011 14
  • 15. Data Mining 11/8/2011 15
  • 16.  Decomposing the latice => smaller pieces that can be solved independently  Equivalence classes 2 sequences are in the same class (Θk) if they share a common k length prefix Example k=1 : Θ1 -> {[A],[B],[D],[F]} Data Mining 11/8/2011 16
  • 17. Data Mining 11/8/2011 17
  • 18. Data Mining 11/8/2011 18
  • 19.  SPADE(min_sup,D) //min_sup – minimum_support //D –initial dataset F1<- {frequent items or 1-sequences} F2<- {frequent 2-sequences} Ε <- {equivalence classes [X] Θ1 } for all [X] in E enumerate_frequent_seq([X],min_sup) Data Mining 11/8/2011 19
  • 20. Enumerate_frequent_seq(S,min_sup) for all Ai in S Ti <- {} for all Aj in S, with j≥i R<- Ai v Aj (join) if R satisfies min_sup Ti <- Ti U {R} end Enumerate_frequent_seq(Ti , min_sup) //DFS end For all non-empty Ti Enumerate_frequent_seq(Ti , min_sup) //BFS Data Mining 11/8/2011 20
  • 21.  The R Project for Statistical Computing  developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues  Different implementation of S language  arulesSequences package Data Mining 11/8/2011 21
  • 22. Data Mining 11/8/2011 22