Sequence Mining Automata: a New Technique for Mining Frequent Sequences Under Regular Expressions
Roberto Trasarti, Francesco Bonchi, Bart Goethals
Problem Definition (1): Given a database of sequences D, the support of a sequence S ∈ Σ∗ is the number of sequences in D that are supersequences of S: sup(S) = | {T ∈ D | S ⊑ T} |. Given a regular expression R, a sequence S is valid if it can be generated by R. The problem is to find the valid sequences whose support is at least a given minimum support.
[Figure: example sequence database with RE A*BC* and minimum support 3, showing one subsequence with support 3 and another with support 2.]
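The two definitions above can be checked directly on small inputs. The following is a minimal sketch, not the authors' code: the helper names (`is_subsequence`, `support`, `is_valid`) and the toy database are assumptions made for illustration, and sequences are represented as plain strings with one character per item.

```python
import re

def is_subsequence(s, t):
    """Return True if s is a (not necessarily contiguous) subsequence of t."""
    remaining = iter(t)
    # 'ch in remaining' advances the iterator, so matches must occur in order.
    return all(ch in remaining for ch in s)

def support(s, database):
    """sup(S): number of sequences T in the database with S ⊑ T."""
    return sum(1 for t in database if is_subsequence(s, t))

def is_valid(s, regex):
    """A sequence is valid if the whole of it is generated by the RE."""
    return re.fullmatch(regex, s) is not None

# Hypothetical toy database (not the one shown on the slide).
database = ["AABC", "ABCC", "CBA"]
regex = "A*BC*"

print(support("B", database), is_valid("B", regex))
print(support("AB", database), is_valid("AB", regex))
print(support("CB", database), is_valid("CB", regex))
```

This brute-force check only tests a candidate that is already known; it does not enumerate candidates, which is the part the SMA addresses.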
Previous approaches and our contribution: Previous approaches [1,2,3] focus on the search space of the problem, exploiting in different ways the pruning power of the regular expression R over unpromising patterns. Our solution focuses instead on the input dataset and the given regular expression: while reading the input database, we produce, for each sequence in it, all and only the valid patterns that it contains.
[1] H. Albert-Lorincz and J.-F. Boulicaut. Mining frequent sequential patterns under regular expressions: A highly adaptive strategy for pushing constraints. In Proc. of SDM’03.
[2] M. N. Garofalakis, R. Rastogi, and K. Shim. Spirit: Sequential pattern mining with regular expression constraints. In Proc. of VLDB’99.
[3] J. Pei, J. Han, and W. Wang. Mining sequential patterns with constraints in large databases. In Proc. of CIKM’02.
Sequence Mining Automata (1): Our Sequence Mining Automaton (SMA) is a specialized kind of Petri net. It can be constructed from a DFA by transforming each edge of the DFA into a transition with two arcs: one from its input place and one to its output place. It also has the following peculiarities:
• Transitions do not consume tokens
• Parallel execution
• External signal
The initial marking consists only of the token representing the empty sequence ε in the starting place.
Example RE: A*B(B|C)D*E
Sequence Mining Automata (2): Each transition applies a process that is activated only if the external signal is equal to the label of the corresponding edge. This process produces a new set of tokens in the destination place. Example RE: A*B(B|C)D*E
Sequence Mining Automata (3, Example): Given R ≡ A∗B(B|C)D∗E and S ≡ ACDBFAEBCFDE.
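The example can be traced with a small simulation. This is a sketch under stated assumptions, not the authors' implementation: the DFA for A*B(B|C)D*E is hand-built here (the slide's figure is not available), the place names `p0`..`p3` and the function `run_sma` are hypothetical, and tokens are stored as strings. Each input symbol acts as the external signal; all transitions carrying that label fire in parallel on the marking as it was before the signal, and tokens stay in their source place.

```python
from collections import defaultdict

# Assumed DFA for A*B(B|C)D*E; each edge (source, label, destination)
# becomes one SMA transition between two places.
TRANSITIONS = [
    ("p0", "A", "p0"),
    ("p0", "B", "p1"),
    ("p1", "B", "p2"),
    ("p1", "C", "p2"),
    ("p2", "D", "p2"),
    ("p2", "E", "p3"),
]
START, ACCEPTING = "p0", {"p3"}

def run_sma(sequence, transitions=TRANSITIONS, start=START, accepting=ACCEPTING):
    """Return the tokens found in the accepting places after reading sequence."""
    places = defaultdict(set)
    places[start].add("")  # initial marking: the empty sequence ε
    for signal in sequence:
        produced = defaultdict(set)
        for src, label, dst in transitions:
            if label == signal:
                # Tokens are copied and extended, never consumed.
                produced[dst].update(tok + signal for tok in places[src])
        # Merge only after every transition has fired (parallel execution).
        for dst, toks in produced.items():
            places[dst] |= toks
    return set().union(*(places[p] for p in accepting))

valid_patterns = run_sma("ACDBFAEBCFDE")
print(sorted(valid_patterns))
```

Collecting the produced tokens before merging them is what keeps a self-loop such as the one labeled A from extending a token more than once per signal.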
One-Pass Solution (SMA-1P) and Full-Cut (SMA-FC):
SMA-1P: Simply run the SMA on each transaction and, at the end, compute the support of each extracted sequence, filtering by the support threshold. The support threshold is not used during generation: we compute all the sequences in the dataset that are valid w.r.t. the RE.
SMA-FC: Given an SMA, a valid set of cuts is a partition p1, . . . , pn of the places of the SMA such that there is no path from a place in pj to a place in pi if j > i. For each cut we apply SMA-1P on the whole DB. At the end of the i-th scan we obtain intermediate information about frequent patterns, which can be used in subsequent scans to remove the infrequent tokens.
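A minimal sketch of the one-pass idea, plus a check of the valid-cut condition, might look as follows. Everything named here is an assumption made for illustration: the hand-built DFA for A*BC*, the toy database, and the functions `run_sma`, `sma_one_pass` and `is_valid_cut_set`. The full SMA-FC procedure, with pruning of infrequent tokens between scans, is not reproduced.

```python
from collections import Counter, defaultdict

# Assumed DFA for the RE A*BC* (hand-built for illustration).
TRANSITIONS = [("p0", "A", "p0"), ("p0", "B", "p1"), ("p1", "C", "p1")]
START, ACCEPTING = "p0", {"p1"}

def run_sma(sequence):
    """Tokens in the accepting places after feeding sequence as signals."""
    places = defaultdict(set)
    places[START].add("")  # empty sequence ε
    for signal in sequence:
        produced = defaultdict(set)
        for src, label, dst in TRANSITIONS:
            if label == signal:
                produced[dst].update(tok + signal for tok in places[src])
        for dst, toks in produced.items():
            places[dst] |= toks
    return set().union(*(places[p] for p in ACCEPTING))

def sma_one_pass(database, min_support):
    """Generate all valid patterns first, apply the threshold only at the end."""
    counts = Counter()
    for sequence in database:
        # run_sma returns a set, so a pattern counts once per sequence.
        counts.update(run_sma(sequence))
    return {p: c for p, c in counts.items() if c >= min_support}

def is_valid_cut_set(partition, transitions=TRANSITIONS):
    """True if no edge, and hence no path, leads from a later block to an earlier one."""
    block_of = {place: i for i, block in enumerate(partition) for place in block}
    return all(block_of[src] <= block_of[dst] for src, _, dst in transitions)

database = ["AABC", "ABCC", "CBA"]  # hypothetical toy data
frequent = sma_one_pass(database, min_support=2)
print(frequent)
print(is_valid_cut_set([{"p0"}, {"p1"}]), is_valid_cut_set([{"p1"}, {"p0"}]))
```

Checking edges is enough for the cut condition because a path that goes back to an earlier block must contain at least one edge that does.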
Experiments (Synthetic Data): (D=dataset size, N=number of items, C=average length)
Experiments (Mobility data): From San Jose to San Francisco and back – via CA-101 (west-bound of the bay), i.e., passing through San Mateo (cell H9 of our map); or via I-880 (east-bound of the bay), i.e., passing through Hayward (cell J8 of our map).
Conclusions:  We have introduced “Sequence Mining Automata”, a new mechanism for mining frequent sequences under regular expressions.   Around this basic mechanism we built a family of algorithms embedding different techniques.   The efficiency of our proposal has been thoroughly proven empirically.   The SMA is a very simple and fundamental mechanism opening the door to many possible extensions.
