INDIAN INSTITUTE OF TECHNOLOGY ROORKEE
A Study of Scalable Pattern Mining Algorithms
on Large Scale Interval Data
Under Supervision Of:
Dr. Dhaval Patel
CSE Department
Presented By:
Prakhar Dhama
15535029
Outline
• What is Pattern Mining?
• Need for Scalable Pattern Mining
• Interval-based Events
• Serial Frequent Itemset Mining
– Apriori, Eclat and FP-growth
• Parallel Itemset Mining
– FP-growth based PFP
– Ultrametric tree based FiDoop
• Pattern Mining on Interval Data
– Interval Sequences
– Temporal Relations
– Hierarchical Representation
• Conclusion and Research Gap
What is Pattern Mining?
• A pattern can be a set of items, an ordered subsequence, a
subgraph, etc.
• Different kinds of pattern mining are:
– Frequent itemset mining: finding sets of items that frequently appear
together in a transactional database, such as milk and bread.
– Sequential pattern mining: finding frequently occurring
subsequences in a sequence database, such as a customer buying
pattern: first a digital camera, followed by a memory card.
– Structured pattern mining: finding frequent substructures (graphs,
trees, or lattices) in a spatial database.
– Temporal pattern mining: finding relations among events in a
temporal database, such as the time for which an iron is on and the
time for which its steel base is hot.
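As a minimal illustration of frequent itemset mining, the sketch below counts the support of an itemset, i.e. the number of transactions that contain all of its items. The toy transactions and the helper name `support` are hypothetical; they only mirror the milk-and-bread example.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t)


# Hypothetical toy database (not taken from the slides).
transactions = [
    frozenset({"milk", "bread", "butter"}),
    frozenset({"milk", "bread"}),
    frozenset({"bread", "jam"}),
    frozenset({"milk", "eggs"}),
]

milk_bread = support({"milk", "bread"}, transactions)
```

An itemset is called frequent when its support reaches a user-chosen minimum support threshold.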
Need for Scalable Pattern Mining
• The NSA's Utah Data Center stores data on the order of exabytes.
In 2014, the NSA processed 29 petabytes of data in a single day.
• With such a huge increase in data size, pattern mining on a single
machine is infeasible.
• Solution: modify existing pattern mining algorithms and
design scalable versions that can run on distributed
systems.
• Parallel programming models include MapReduce, Bulk
Synchronous Parallel, etc.
• Popular big data tools for implementing parallel
algorithms include Apache Spark, Hadoop, and NoSQL databases
such as Cassandra and MongoDB.
Interval-based Events Data
• Real-world events are often not instantaneous; they persist
for some duration and are called interval events.
• Data that includes a time-related attribute is stored in a temporal
database.
• The relations among interval events are intrinsically
complex, so point-based algorithms are not applicable.
• Applications
– A power meter that logs household appliance electricity usage
can be used to identify when each appliance is turned on or off.
– In diabetic patients, it has been observed that the presence of
hyperglycemia overlaps with the absence of glycosuria.
– Domains such as medicine, multimedia, meteorology, and finance,
where event durations can play an important role.
Large Scale Interval Data
• Querying vs. mining: the purpose of mining is to discover
knowledge, while database querying simply retrieves data.
• The only work that deals with large-scale interval data covers
querying and quantitative analysis [1].
• All current efforts on mining temporal relationships rely
on sequential algorithms; the problem of scalable mining on
large-scale interval data has not yet been addressed.
• Solution: design a novel strategy to mine temporal patterns
on large-scale interval data by combining
– existing parallel mining algorithms for point-based events, and
– sequential pattern mining algorithms for interval data.
Serial Frequent Itemset Mining Methods
• Mining frequent itemsets is the first step; it is followed by
a second step that generates inter-transaction association rules.
• Apriori: uses a breadth-first strategy to count the support of
itemsets, with a candidate generation function that
exploits the downward closure property of support.
• Eclat: Equivalence Class Transformation is a depth-first
algorithm. It converts the transactional database to its
vertical format, i.e. a transaction-id list for each item, and then
uses set intersection to compute support.
• FP-growth: involves no candidate generation; instead it
uses a prefix-tree structure, the FP-tree. It makes two passes over
the data set and recursively traverses the FP-tree for each
frequent item.
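The contrast between Apriori and Eclat can be sketched in a few lines. This is a simplified, in-memory illustration under stated assumptions, not an optimized or reference implementation: the function names, the toy transactions, and the minimum support value are all hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Breadth-first (level-wise) search with candidate generation."""
    frequent: Dict[FrozenSet[str], int] = {}
    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    k = 1
    while level:
        # Count the support of every size-k candidate in one pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        # Join frequent k-itemsets into (k+1)-candidates, then prune with
        # downward closure: every k-subset of a candidate must be frequent.
        k += 1
        joined = {a | b for a in current for b in current if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Depth-first search over the vertical (item -> transaction-id set) format."""
    vertical: Dict[str, Set[int]] = {}
    for tid, t in enumerate(transactions):
        for item in t:
            vertical.setdefault(item, set()).add(tid)
    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], candidates: List[Tuple[str, Set[int]]]) -> None:
        for idx, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)  # support = size of the tid-set
            # Intersect tid-sets with the remaining items to grow the itemset.
            nxt = [(other, tids & other_tids)
                   for other, other_tids in candidates[idx + 1:]
                   if len(tids & other_tids) >= min_support]
            extend(itemset, nxt)

    start = sorted(((i, t) for i, t in vertical.items() if len(t) >= min_support),
                   key=lambda pair: pair[0])
    extend(frozenset(), start)
    return frequent


# Hypothetical toy database.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(toy, min_support=3)
```

Both functions return the same mapping from frequent itemsets to supports; they differ only in traversal order and in how support is obtained (rescanning transactions versus intersecting transaction-id sets).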
Parallel Itemset Mining
• Apriori-like parallel FIM algorithms include FDM, DDM,
FPM, and the MapReduce-based DPC [2].
• Apriori-like solutions can suffer from high I/O,
communication, and synchronization overhead, which makes
these parallel algorithms difficult to scale up.
• The most recent Eclat-like parallel algorithms include Dist-Eclat
and BigFIM [3].
• FP-growth-like parallel FIM algorithms include shared-memory,
cache-conscious FP-growth and the most
popular, MapReduce-based PFP [4].
• Ultrametric-tree-based algorithms: FIUT [5] and FiDoop [6].
• Others include the recent lexicographical-tree-based
Sequence-Growth [7].
PFP algorithm
• Popular parallel FP-growth algorithm based on MapReduce.
• Includes three MapReduce phases:
Sharding and Parallel Counting → Group-dependent Shard FP-growth → Aggregation
• Phase 1. Sharding divides the database into
consecutive parts and stores them on different
machines. Parallel Counting runs a MapReduce job
that counts the support of the items. Each mapper
works on a single shard.
• Phase 2. The frequent items are divided into groups.
For each transaction, the mapper outputs, with the group id as key,
the group-dependent transactions. The reducer then builds an FP-tree
for each group.
• Phase 3. For each item, the corresponding frequent
patterns are collected, and the required number of
most-supported patterns is reported.
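The first two phases can be simulated on a single machine to make the data flow concrete. This is a minimal sketch, not the PFP implementation: the function names, the toy database, the round-robin assignment of items to groups, and the mapper logic in `group_dependent` (my reading of the PFP paper [4]) are assumptions.

```python
from collections import Counter
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def shard(db: List[List[str]], n_shards: int) -> List[List[List[str]]]:
    """Split the database into consecutive parts, one per simulated machine."""
    size = -(-len(db) // n_shards)  # ceiling division
    return [db[i:i + size] for i in range(0, len(db), size)]


def count_mapper(part: List[List[str]]) -> Iterator[Tuple[str, int]]:
    """Phase 1 mapper: emit (item, 1) for every item of every transaction."""
    for transaction in part:
        for item in set(transaction):
            yield item, 1


def count_reducer(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Phase 1 reducer: sum the emitted counts per item."""
    totals: Counter = Counter()
    for item, one in pairs:
        totals[item] += one
    return dict(totals)


def group_dependent(transaction, rank, group_of) -> Iterator[Tuple[int, List[str]]]:
    """Phase 2 mapper (assumed logic): emit one prefix per group id."""
    # Keep only frequent items, ordered by global frequency rank.
    items = sorted((i for i in set(transaction) if i in rank), key=rank.get)
    seen = set()
    for j in range(len(items) - 1, -1, -1):
        gid = group_of[items[j]]
        if gid not in seen:
            seen.add(gid)
            yield gid, items[:j + 1]


# Hypothetical toy database and parameters.
db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"], ["a"]]
parts = shard(db, 2)
counts = count_reducer(chain.from_iterable(count_mapper(p) for p in parts))
min_support = 2
frequent = sorted((i for i, n in counts.items() if n >= min_support),
                  key=lambda i: (-counts[i], i))
rank = {item: r for r, item in enumerate(frequent)}
group_of = {item: r % 2 for item, r in rank.items()}  # two groups, round-robin
emitted = list(group_dependent(db[0], rank, group_of))
```

In a real deployment each reducer would receive all prefixes for one group id and build a local FP-tree from them; that step is omitted here.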
FiDoop Algorithm
• One of the recent parallel FIM algorithms; it outperforms Apriori-like
solutions as well as the FP-growth-based PFP.
• Based on the ultrametric tree, extending FIUT.
• A k-FIU-tree is built by inserting each k-itemset (a transaction
pruned to its k frequent items) as a single path from the root to the
last item of the itemset. Hence, all the leaves are at the same height k.
• Example.
Itemsets with counts: abc:1, abd:2, acde:3
3-FIU-tree (holds the itemsets of length 3, abc and abd):
root
  a
    b
      c:1   d:2
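The example can be reproduced with a nested dictionary standing in for the tree. This is only a sketch: the name `build_fiu_tree` and the dictionary representation are assumptions, and the items of each itemset are assumed to be listed in one fixed global order.

```python
from typing import Dict, Tuple


def build_fiu_tree(itemsets: Dict[Tuple[str, ...], int], k: int) -> dict:
    """Insert every itemset of length k as one root-to-leaf path."""
    root: dict = {}
    for items, count in itemsets.items():
        if len(items) != k:
            continue  # itemsets of other lengths belong to other FIU-trees
        node = root
        for item in items[:-1]:
            node = node.setdefault(item, {})
        # The leaf stores the count of the itemset ending at this item.
        node[items[-1]] = node.get(items[-1], 0) + count
    return root


# Itemsets and counts from the example above.
itemsets = {("a", "b", "c"): 1, ("a", "b", "d"): 2, ("a", "c", "d", "e"): 3}
tree3 = build_fiu_tree(itemsets, 3)
tree4 = build_fiu_tree(itemsets, 4)
```

Because every inserted path has exactly k items, all leaves of the resulting tree sit at depth k.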
FiDoop Design
• Uses three MapReduce jobs, like PFP.
• First MapReduce job: discovers all frequent items, i.e.
frequent one-itemsets.
• Second MapReduce job: scans the database and generates k-itemsets
by removing the infrequent items from each transaction.
• Third MapReduce job: constructs the decomposed h-FIU-trees,
2≤h≤k-1, and mines all frequent h-itemsets.
Key/value types in the data-flow diagram of the three MapReduce jobs:
– Input transaction: <LongWritable offset, Text record>
– Global one-itemset: <Text item, LongWritable count>
– Pruned transaction of k-itemset: <ArrayWritable k-item, LongWritable 1>
– <IntWritable id, MapWritable<ArrayWritable k-item, LongWritable SUM>> (shown twice in the diagram, unlabeled)
– Frequent h-itemset from h-FIU-tree
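The first two jobs can be sketched as plain functions to show what each one produces. This is a single-machine illustration under stated assumptions, not FiDoop's Hadoop code: the function names and the toy database are hypothetical, and the third job (decomposition and h-FIU-tree mining) is left out.

```python
from collections import Counter
from typing import Dict, List, Tuple


def frequent_items(db: List[List[str]], min_support: int) -> Dict[str, int]:
    """Job 1: global support of single items, keeping only the frequent ones."""
    counts = Counter(item for t in db for item in set(t))
    return {i: n for i, n in counts.items() if n >= min_support}


def pruned_k_itemsets(db: List[List[str]],
                      frequent: Dict[str, int]) -> Dict[int, Counter]:
    """Job 2: drop infrequent items from each transaction and count the
    resulting k-itemsets, grouped by their length k."""
    by_length: Dict[int, Counter] = {}
    for t in db:
        kept: Tuple[str, ...] = tuple(sorted(i for i in set(t) if i in frequent))
        if kept:
            by_length.setdefault(len(kept), Counter())[kept] += 1
    return by_length


# Hypothetical toy database; "z" is infrequent and gets pruned.
db = [list("abc"), list("abd"), list("abd"),
      list("acde"), list("acde"), list("acde"), list("abz")]
frequent = frequent_items(db, min_support=2)
by_length = pruned_k_itemsets(db, frequent)
```

With this toy input, the length-3 and length-4 groups contain the same itemsets and counts as the FIU-tree example (abc:1, abd:2, acde:3).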
Pattern Mining on Interval Data
• Various algorithms have been proposed to discover temporal
patterns on interval data.
• Apriori-like.
– H-DFS [8]: transforms event sequences into id-lists
and merges the id-lists iteratively.
– IEMiner [9]: reduces the search space and removes
non-promising candidates.
• Pattern-growth.
– TPrefixSpan [10]: generates all possible
candidates, then scans the projected database recursively to
discover temporal patterns.
– TPMiner [11]: based on projected-database
techniques, with several pruning
techniques to reduce the search space.
Interval Sequences
• A temporal database handles time-stamped data. It stores all
the interval sequences.
• An interval sequence is a collection of intervals, each
having a start time and an end time.
• Example (the slide's figure is not reproduced in this transcript):
Db contains 4 interval sequences.
Let minimum support = 3.
The temporal pattern (C=D) is frequent,
with support 4.
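Since the figure is missing, the sketch below uses a made-up database to show how such a support could be counted. Everything here is hypothetical: the `Interval` type, the helper names, and the four sequences are stand-ins, not the slide's data. Support is counted as the number of sequences in which some C interval and some D interval have identical start and end times.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    label: str
    start: int
    end: int


def has_equal_pair(seq: List[Interval], a: str, b: str) -> bool:
    """True if some `a` interval and some `b` interval share start and end."""
    return any(x.label == a and y.label == b
               and x.start == y.start and x.end == y.end
               for x in seq for y in seq)


def support_equal(db: List[List[Interval]], a: str, b: str) -> int:
    """Number of interval sequences that contain the pattern (a = b)."""
    return sum(1 for seq in db if has_equal_pair(seq, a, b))


# Hypothetical database of 4 interval sequences.
db = [
    [Interval("A", 0, 4), Interval("C", 5, 9), Interval("D", 5, 9)],
    [Interval("C", 1, 3), Interval("D", 1, 3), Interval("B", 2, 8)],
    [Interval("C", 0, 6), Interval("D", 0, 6)],
    [Interval("B", 0, 2), Interval("C", 4, 7), Interval("D", 4, 7), Interval("A", 6, 9)],
]
min_support = 3
sup = support_equal(db, "C", "D")
is_frequent = sup >= min_support
```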
Temporal Relations
• Most pattern mining on interval data is based on the 13
relations among temporal events proposed by Allen.
• The slide's figure (not reproduced in this transcript) shows
these relations between two interval events X and Y.
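Allen's 13 relations consist of seven base relations (before, meets, overlaps, starts, during, finishes, equals) and the inverses of the first six. A compact way to see them is a function that classifies a pair of intervals. The function name and the tuple representation are assumptions of this sketch; each interval is assumed to satisfy start < end.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end), with start < end


def allen_relation(x: Interval, y: Interval) -> str:
    """Return which of Allen's 13 relations holds between x and y."""
    xs, xe = x
    ys, ye = y
    if xe < ys:
        return "before"
    if xe == ys:
        return "meets"
    if ye < xs:
        return "after"
    if ye == xs:
        return "met-by"
    if xs == ys and xe == ye:
        return "equals"
    if xs == ys:
        return "starts" if xe < ye else "started-by"
    if xe == ye:
        return "finishes" if xs > ys else "finished-by"
    if ys < xs and xe < ye:
        return "during"
    if xs < ys and ye < xe:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if xs < ys else "overlapped-by"


pairs = [
    ((0, 2), (3, 5)), ((0, 3), (3, 5)), ((0, 4), (2, 6)), ((0, 2), (0, 5)),
    ((1, 3), (0, 5)), ((3, 5), (0, 5)), ((0, 5), (0, 5)), ((3, 5), (0, 2)),
    ((3, 5), (0, 3)), ((2, 6), (0, 4)), ((0, 5), (0, 2)), ((0, 5), (1, 3)),
    ((0, 5), (3, 5)),
]
names = [allen_relation(x, y) for x, y in pairs]
```

Exactly one relation holds for any pair of proper intervals, which is why point-based orderings alone cannot describe interval data.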
Hierarchical Representation
• The representation should be lossless, i.e. the arrangement of the events
can be recovered from it; otherwise spurious frequent patterns may be
discovered.
• Lossless Hierarchical Representation
Various interpretations of the temporal pattern (P o Q) o R
(interval diagrams of P, Q and R not reproduced):
a. Overlap count wrt R=1, meet count wrt R=0
b. Overlap count wrt R=2, meet count wrt R=0
c. Overlap count wrt R=1, meet count wrt R=1
• IEMiner uses 5 variables to distinguish the above interpretations: contain count,
finish count, meet count, overlap count, and start count, in that order.
a. (P o[0,0,0,1,0] Q) o[0,0,0,1,0] R
b. (P o[0,0,0,1,0] Q) o[0,0,0,2,0] R
c. (P o[0,0,0,1,0] Q) o[0,0,1,1,0] R
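The three count vectors can be reproduced with a small sketch. The counting rule used here (tally, for each event already in the composite pattern, its relation to the newly added event) and the interval endpoints are assumptions inferred from the captions above, not taken from the IEMiner paper [9]; the relation definitions in `relation_to` are likewise simplified assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end)
ORDER = ("contain", "finish", "meet", "overlap", "start")


def relation_to(x: Interval, r: Interval) -> str:
    """Relation of an earlier event x to a later event r (assumed definitions)."""
    xs, xe = x
    rs, r_end = r
    if xs < rs and r_end < xe:
        return "contain"
    if xs < rs and r_end == xe:
        return "finish"
    if xe == rs:
        return "meet"
    if xs < rs < xe < r_end:
        return "overlap"
    if xs == rs and xe != r_end:
        return "start"
    return "other"  # e.g. before or equal; not counted


def counts_wrt(composite: List[Interval], r: Interval) -> List[int]:
    """[contain, finish, meet, overlap, start] counts of `composite` w.r.t. r."""
    tally = {name: 0 for name in ORDER}
    for x in composite:
        rel = relation_to(x, r)
        if rel in tally:
            tally[rel] += 1
    return [tally[name] for name in ORDER]


# Hypothetical endpoints for P, Q, R matching interpretations a, b and c.
cases = {
    "a": ((0, 3), (2, 6), (5, 9)),
    "b": ((0, 5), (2, 7), (4, 9)),
    "c": ((0, 4), (2, 6), (4, 8)),
}
vectors = {name: (counts_wrt([p], q), counts_wrt([p, q], r))
           for name, (p, q, r) in cases.items()}
```

In all three cases P overlaps Q, so the inner vector is identical; only the outer vector, computed against R, tells the arrangements apart.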
Conclusion
• The classic mining algorithms have been modified to run in
a distributed manner on a cluster. Although much work is still
ongoing in the field of pattern mining on interval data, to the best
of my knowledge one issue has not been addressed anywhere.
• All current pattern mining algorithms for interval-based
events are sequential in nature. They cannot scale to large
data sets that do not fit in the memory of a single machine. The
various parallel techniques for mining frequent patterns in
instantaneous events, together with current sequential techniques for
interval data, can help in addressing this issue.
References
[1] Ruan, Guangchen, et al. 2014. Parallel and quantitative sequential pattern
mining for large-scale interval-based temporal data. IEEE International
Conference on Big Data.
[2] Lin, Hsueh, et al. 2012. Apriori-based frequent itemset mining algorithms on
MapReduce. In Proceedings of the 6th International Conference on
Ubiquitous Information Management and Communication.
[3] Moens, Aksehirli, et al. 2013. Frequent itemset mining for big data. IEEE
International Conference on Big Data.
[4] Li, Haoyuan, et al. 2008. Pfp: parallel fp-growth for query recommendation.
Proceedings of the ACM conference on Recommender systems.
[5] Tsay, Yuh-Jiuan, et al. 2009. FIUT: A new method for mining frequent
itemsets. Information Sciences.
[6] Xun, Yaling, et al. 2015. FiDoop: Parallel Mining of Frequent Itemsets Using
MapReduce. IEEE Transactions on Systems, Man, and Cybernetics.
[7] Liang, Yen-Hui, et al. 2015. Sequence-Growth: A Scalable and Effective
Frequent Itemset Mining Algorithm for Big Data Based on MapReduce
Framework. IEEE International Conference on Big Data.
[8] Papapetrou, Panagiotis, et al. 2005. Discovering frequent arrangements of
temporal intervals. Proceedings of Fifth IEEE International Conference on
Data Mining.
[9] Patel, et al. 2008. Mining relationships among interval events for
classification. Proceedings of the ACM SIGMOD international conference
on Management of data.
[10] Wu, Chen, et al. 2007. Mining nonambiguous temporal patterns for
interval-based events. IEEE Transactions on Knowledge and Data
Engineering.
[11] Chen, Yi-Cheng, et al. 2015. Mining Temporal Patterns in Time Interval-
based Data. IEEE Transactions on Knowledge and Data Engineering.
Thank You!

Mais conteúdo relacionado

Mais procurados

Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams   Esteban DonatoEvaluating Classification Algorithms Applied To Data Streams   Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
Esteban Donato
 
Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...
Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...
Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...
idescitation
 

Mais procurados (20)

International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams   Esteban DonatoEvaluating Classification Algorithms Applied To Data Streams   Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
 
InternReport
InternReportInternReport
InternReport
 
18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
 
Data Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisData Streaming in Big Data Analysis
Data Streaming in Big Data Analysis
 
Research Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories MetadataResearch Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories Metadata
 
Data Stream Management
Data Stream ManagementData Stream Management
Data Stream Management
 
Efficient Query Processing in Web Search Engines
Efficient Query Processing in Web Search EnginesEfficient Query Processing in Web Search Engines
Efficient Query Processing in Web Search Engines
 
Pycon 2016-open-space
Pycon 2016-open-spacePycon 2016-open-space
Pycon 2016-open-space
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Building a PII scrubbing layer
Building a PII scrubbing layerBuilding a PII scrubbing layer
Building a PII scrubbing layer
 
The design and implementation of modern column oriented databases
The design and implementation of modern column oriented databasesThe design and implementation of modern column oriented databases
The design and implementation of modern column oriented databases
 
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
 
Distributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsDistributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data Streams
 
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
 
Introduction to Big Data processing (FGRE2016)
Introduction to Big Data processing (FGRE2016)Introduction to Big Data processing (FGRE2016)
Introduction to Big Data processing (FGRE2016)
 
Probabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profitProbabilistic algorithms for fun and pseudorandom profit
Probabilistic algorithms for fun and pseudorandom profit
 
Write intensive workloads and lsm trees
Write intensive workloads and lsm treesWrite intensive workloads and lsm trees
Write intensive workloads and lsm trees
 
Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...
Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...
Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...
 
The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...
 

Semelhante a Temporal Pattern Mining

Ijsrdv1 i2039
Ijsrdv1 i2039Ijsrdv1 i2039
Ijsrdv1 i2039
ijsrd.com
 
FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce
FiDoop: Parallel Mining of Frequent Itemsets Using MapReduceFiDoop: Parallel Mining of Frequent Itemsets Using MapReduce
FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce
IJCSIS Research Publications
 
Building your big data solution
Building your big data solution Building your big data solution
Building your big data solution
WSO2
 

Semelhante a Temporal Pattern Mining (20)

Ijariie1129
Ijariie1129Ijariie1129
Ijariie1129
 
Ijsrdv1 i2039
Ijsrdv1 i2039Ijsrdv1 i2039
Ijsrdv1 i2039
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
REVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining TechniquesREVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining Techniques
 
An incremental mining algorithm for maintaining sequential patterns using pre...
An incremental mining algorithm for maintaining sequential patterns using pre...An incremental mining algorithm for maintaining sequential patterns using pre...
An incremental mining algorithm for maintaining sequential patterns using pre...
 
A Survey of Sequential Rule Mining Techniques
A Survey of Sequential Rule Mining TechniquesA Survey of Sequential Rule Mining Techniques
A Survey of Sequential Rule Mining Techniques
 
Ijcatr04051004
Ijcatr04051004Ijcatr04051004
Ijcatr04051004
 
Mining frequent itemsets (mfi) over
Mining frequent itemsets (mfi) overMining frequent itemsets (mfi) over
Mining frequent itemsets (mfi) over
 
J017114852
J017114852J017114852
J017114852
 
A classification of methods for frequent pattern mining
A classification of methods for frequent pattern miningA classification of methods for frequent pattern mining
A classification of methods for frequent pattern mining
 
Review Over Sequential Rule Mining
Review Over Sequential Rule MiningReview Over Sequential Rule Mining
Review Over Sequential Rule Mining
 
Unit 3 part i Data mining
Unit 3 part i Data miningUnit 3 part i Data mining
Unit 3 part i Data mining
 
AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...
AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...
AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...
 
A comprehensive study of major techniques of multi level frequent pattern min...
A comprehensive study of major techniques of multi level frequent pattern min...A comprehensive study of major techniques of multi level frequent pattern min...
A comprehensive study of major techniques of multi level frequent pattern min...
 
A comprehensive study of major techniques of multi level frequent pattern min...
A comprehensive study of major techniques of multi level frequent pattern min...A comprehensive study of major techniques of multi level frequent pattern min...
A comprehensive study of major techniques of multi level frequent pattern min...
 
B017550814
B017550814B017550814
B017550814
 
FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce
FiDoop: Parallel Mining of Frequent Itemsets Using MapReduceFiDoop: Parallel Mining of Frequent Itemsets Using MapReduce
FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce
 
Building your big data solution
Building your big data solution Building your big data solution
Building your big data solution
 
UNIT - 5: Data Warehousing and Data Mining
UNIT - 5: Data Warehousing and Data MiningUNIT - 5: Data Warehousing and Data Mining
UNIT - 5: Data Warehousing and Data Mining
 
Fp growth tree improve its efficiency and scalability
Fp growth tree improve its efficiency and scalabilityFp growth tree improve its efficiency and scalability
Fp growth tree improve its efficiency and scalability
 

Último

Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
MsecMca
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
dharasingh5698
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 

Último (20)

Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdf
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 

Temporal Pattern Mining

  • 1. INDIAN INSTITUTE OF TECHNOLOGY ROORKEE A Study of Scalable Pattern Mining Algorithms on Large Scale Interval Data Under Supervision Of: Dr. Dhaval Patel CSE Department Presented By: Prakhar Dhama 15535029
  • 2. 2 Outline • What is Pattern Mining? • Need for Scalable Pattern Mining • Interval-based Events • Serial Frequent Itemset Mining – Apriori, Eclat and FP-growth • Parallel Itemset Mining – FP-growth based PFP – Ultrametric tree based FiDoop • Pattern Mining on Interval Data – Interval Sequences – Temporal Relations – Heirarchical Representation • Conclusion and Research Gap
  • 3. 3 What is Pattern Mining? • A pattern can be a set of items, ordered subsequences, subgraphs, etc. • Different kinds of pattern mining are – Frequent itemset mining. finding set of items that frequently appear together in a transactional database, such as milk and bread. – Sequential pattern mining. finding frequently occurring subsequence in a sequence database, such as customer buying pattern, first a digital camera, followed by a memory card. – Structured pattern mining. finding frequent substructures in a spatial database such as graphs, trees, or lattices. – Temporal pattern mining. finding relations among events in a temporal database such as time for which iron is on and time for which its steel base is hot.
  • 4. 4 Need for Scalable Pattern Mining • The NSA Utah Data Center store data in order of exabytes. In 2014, NSA processed 29 petabytes of data in a single day. • With huge increase in data size, pattern mining on single machine is infeasible. • Solution. Modify existing pattern mining algorithms and design scalable versions which can run on distributed means. • The parallel programming models are MapReduce, Bulk Synchronous Parallel, etc. • Some of the popular big data tools to implement parallel algorithms are Apache Spark, Hadoop, NoSQL databases like Cassandra, MongoDB, etc.
  • 5. 5 Interval-based Events Data • In real world events, instead of being instantaneous, persist for some duration and called interval events. • The data including time related attribute is stored in temporal database. • The relation among these interval events is intrinsically complex and point-based algorithms are not applicable. • Applications – power meter in house that logs household appliance electricity usage, can be used to identify times each appliance is turned on or off. – it has been observed that in diabetic patients, the presence of hyperglycemia overlaps with the absence of glycosuria. – domains such as medical, multimedia, meteorology and finance where the events durations could play an important role.
  • 6. 6 Large Scale Interval Data • Querying vs Mining. The purpose of mining is to discover knowledge while database querying simply retrieves data. • The only work that deals with large scale interval data is querying quantitative analysis[1]. • All the current efforts on mining temporal relationships rely on sequential algorithms and problem of scalable mining on large scale interval data is not yet addressed. • Solution. Design novel strategy to mine temporal patterns on large scale interval data by augmenting – Existing parallel mining algorithms for point-based events. – Sequential pattern mining algorithms on interval data.
  • 7. 7 Serial Frequent Itemset Mining Methods • Mining frequent itemset is the first step, it is followed by another step to generate inter transaction association rules. • Apriori. It uses bread first strategy to count support of itemset and uses candidate generation function which exploits downward closure property of support. • Eclat. Equivalent Class Transformation is depth first algorithm. It converts the transactional database to its vertical format i.e. transaction list for each item and then uses set intersection. • FP-growth. It doesn’t include candidate generation, instead use a prefix tree structure FP-tree. It uses two passes over data set and does recursive traversal of FP-tree for each item in itemset.
  • 8. 8 Parallel Itemset Mining • Apriori-like parallel FIM algorithms such as FDM, DDM, FPM, and MapReduce based DPC[2]. • Apriori-like solutions suffer potential problems of high I/O, communication, and synchronization overhead, which make it strenuous to scale up these parallel algorithms. • Eclat-like most recent parallel algorithms include Dist-Eclat and BigFIM[3]. • FP-growth-like parallel FIM algorithms such as and shared memory based cache conscious FP-growth and most popular MapReduce based PFP[4]. • Utrametric-tree based FIUT[5] and FiDoop[6]. • Others include recent lexicographical tree based Sequence Growth[7].
  • 9. 9 PFP algorithm • Popular parallel FP-growth MapReduce based algorithm. • Includes three MapReduce phases. Sharding and Parallel Counting Group-dependent Shard FP-growth Aggregation • Phase 1. Sharding divides the database in consecutive parts and stores them in different machines. Parallel Counting does a MapReduce task for counting the support of the items. Each mapper works on single shard. • Phase 2. The frequent items are dividing in groups. The mapper for each group id as key outputs the list of transaction ids. The reducer then creates FP-tree for each group. • Phase 3. For all the items the corresponding frequent patterns are listed out of which required number of mostly supported patterns are reported.
  • 10. 10 FiDoop Algorithm • One of the recent parallel FIM algorithm outperforms Apriori- like solution as well as FP-growth based PFP. • Based on ultrametric tree extending FIUT. • k-FIU-tree is built by placing all frequent itemsets of length k starting from root to last item in itemset in a single path. Hence, all the leaves are at same height k. • Example. abc 1 abd 2 acde 3 3-FIU-tree root a b c:1 d:2 itemsets
  • 11. 11 FiDoop Design • Uses three MapReduce phases like PFP. • First MapReduce Job. discovers all frequent items or frequent one-itemsets. • Second MapReduce Job. scans the database to generate k- itemsets by removing infrequent items in each transaction. • Third MapReduce Job. constructs decomposed h-FIU-tree, 2≤h≤k-1, and mines all frequent h-itemsets Input transaction <LongWritable offset, Text record> Global one-itemset <Text item, LongWritable count> Pruned transaction of k-itemset <ArrayWritable k-item, LongWritable 1> <IntWritable id, MapWritable<ArrayWritable k- item, LongWritable SUM>> <IntWritable id, MapWritable<ArrayWritable k-item, LongWritable SUM>> Frequent h-itemset from h- FIU-tree MapReduce MapReduce MapReduce
  • 12. 12 Pattern Mining on Interval Data • Various algorithms have been proposed to discover temporal patterns on interval data. • Apriori-like. HDFS[8]: transforms event sequence into id- lists and merges the id-lists iteratively, IEMiner[9]: reduce search space and remove non promising candidates • Pattern-growth. TPrefixSpan[10]: generates all possible candidiates then scan the projected database recursively to discover temporal patterns, TPMiner[11]: based on projection database techniques and including several pruning techniques to reduce search space.
  • 13. Interval Sequences • A temporal database handles time-stamped data and stores all the interval sequences. • An interval sequence is a collection of intervals, each with a start time and an end time. • Example. The database Db contains 4 interval sequences. Let the minimum support be 3. The temporal pattern (C = D) is frequent with support 4.
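Support counting for a pattern such as (C = D) can be sketched as follows. The four interval sequences below are invented so that the pattern has support 4; they are not the sequences from the slide's figure, and the tuple layout `(label, start, end)` is an assumption.

```python
def contains_equal(sequence, x, y):
    """True if some interval labelled x has the same start and end as one labelled y."""
    spans_x = {(start, end) for label, start, end in sequence if label == x}
    return any(label == y and (start, end) in spans_x for label, start, end in sequence)


def support(db, x, y):
    """Number of interval sequences in db that contain the pattern (x = y)."""
    return sum(contains_equal(sequence, x, y) for sequence in db)


# Hypothetical temporal database with 4 interval sequences.
db = [
    [("A", 0, 3), ("C", 4, 8), ("D", 4, 8)],
    [("C", 1, 5), ("D", 1, 5), ("B", 6, 9)],
    [("C", 2, 6), ("D", 2, 6)],
    [("B", 0, 2), ("C", 3, 7), ("D", 3, 7), ("A", 8, 9)],
]
min_support = 3
is_frequent = support(db, "C", "D") >= min_support
```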
  • 14. Temporal Relations • Most pattern mining on interval data is based on the 13 relations between temporal events proposed by Allen. • The relations between two interval events X and Y are before, meets, overlaps, starts, during, finishes, and equals, plus the inverses of the first six.
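A compact way to see all 13 relations is a function that classifies a pair of intervals. This is a minimal sketch assuming each interval is a `(start, end)` tuple with start < end; the relation names for the inverses (for example `met-by`) are one common naming choice.

```python
def allen_relation(x, y):
    """Return the Allen relation of interval x to interval y."""
    xs, xe = x
    ys, ye = y
    if xe < ys:
        return "before"
    if ye < xs:
        return "after"
    if xe == ys:
        return "meets"
    if ye == xs:
        return "met-by"
    if xs == ys and xe == ye:
        return "equals"
    if xs == ys:
        return "starts" if xe < ye else "started-by"
    if xe == ye:
        return "finishes" if xs > ys else "finished-by"
    if ys < xs and xe < ye:
        return "during"
    if xs < ys and ye < xe:
        return "contains"
    # Only partial overlaps remain at this point.
    return "overlaps" if xs < ys else "overlapped-by"


relation_xy = allen_relation((1, 5), (3, 8))
```

Swapping the arguments always yields the inverse relation, which is why only seven of the thirteen need to be handled when events are processed in a fixed order.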
  • 15. Hierarchical Representation • The representation should be lossless, i.e. the arrangement of the events can be recovered unambiguously from it; otherwise spurious frequent patterns may be discovered. • Lossless hierarchical representation. The temporal pattern (P o Q) o R has several interpretations: a. overlap count w.r.t. R = 1, meet count w.r.t. R = 0; b. overlap count w.r.t. R = 2, meet count w.r.t. R = 0; c. overlap count w.r.t. R = 1, meet count w.r.t. R = 1. • IEMiner uses five counts, in the order contain, finish, meet, overlap, and start, to distinguish the above interpretations: a. (P o[0,0,0,1,0] Q) o[0,0,0,1,0] R; b. (P o[0,0,0,1,0] Q) o[0,0,0,2,0] R; c. (P o[0,0,0,1,0] Q) o[0,0,1,1,0] R.
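The three count vectors attached to R can be reproduced from concrete intervals. The sketch below assumes that the vector tallies how each event already in the composite pattern relates to the newly appended event, and that events that lie before or equal the new event are not counted; the interval values and function names are made up for illustration and this is not IEMiner's code.

```python
from collections import Counter

ORDER = ("contain", "finish", "meet", "overlap", "start")


def relation(a, b):
    """Relation of interval a to interval b, assuming a precedes b in (start, end) order."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start or a == b:
        return None  # "before" and "equal" are not among the five counts
    if a_end == b_start:
        return "meet"
    if a_start == b_start:
        return "start"
    if a_end == b_end:
        return "finish"
    if a_end > b_end:
        return "contain"
    return "overlap"


def count_vector(earlier_events, new_event):
    """Counts [contain, finish, meet, overlap, start] of earlier events w.r.t. new_event."""
    tally = Counter(relation(e, new_event) for e in earlier_events)
    return [tally[name] for name in ORDER]


# Hypothetical intervals: Q and R are fixed, P varies across the three cases.
Q, R = (2, 7), (5, 9)
case_a = count_vector([(0, 4), Q], R)  # P ends before R starts
case_b = count_vector([(0, 6), Q], R)  # P and Q both overlap R
case_c = count_vector([(0, 5), Q], R)  # P meets R, Q overlaps R
```

In all three cases P overlaps Q, so a representation that records only "(P o Q) o R" cannot tell them apart, whereas the count vectors differ.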
  • 16. Conclusion • The classic mining algorithms have been modified to run in a distributed manner on a cluster. Although much work is still going on in pattern mining on interval data, to the best of my knowledge one issue has not been addressed anywhere. • All current pattern mining algorithms for interval-based events are sequential in nature. They cannot scale to large data sets that do not fit in the memory of a single machine. The various parallel techniques for mining frequent patterns in instantaneous events, together with the current sequential techniques for interval data, can help in addressing this issue.
  • 17. References [1] Ruan, Guangchen, et al. 2014. Parallel and quantitative sequential pattern mining for large-scale interval-based temporal data. IEEE International Conference on Big Data. [2] Lin, Hsueh, et al. 2012. Apriori-based frequent itemset mining algorithms on MapReduce. Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication. [3] Moens, Aksehirli, et al. 2013. Frequent itemset mining for big data. IEEE International Conference on Big Data. [4] Li, Haoyuan, et al. 2008. PFP: Parallel FP-growth for query recommendation. Proceedings of the ACM Conference on Recommender Systems. [5] Tsay, Yuh-Jiuan, et al. 2009. FIUT: A new method for mining frequent itemsets. Information Sciences. [6] Xun, Yaling, et al. 2015. FiDoop: Parallel mining of frequent itemsets using MapReduce. IEEE Transactions on Systems, Man, and Cybernetics. [7] Liang, Yen-Hui, et al. 2015. Sequence-Growth: A scalable and effective frequent itemset mining algorithm for big data based on MapReduce framework. IEEE International Conference on Big Data.
  • 18. References (continued) [8] Papapetrou, Panagiotis, et al. 2005. Discovering frequent arrangements of temporal intervals. Proceedings of the Fifth IEEE International Conference on Data Mining. [9] Patel, et al. 2008. Mining relationships among interval events for classification. Proceedings of the ACM SIGMOD International Conference on Management of Data. [10] Wu, Chen, et al. 2007. Mining nonambiguous temporal patterns for interval-based events. IEEE Transactions on Knowledge and Data Engineering. [11] Chen, Yi-Cheng, et al. 2015. Mining temporal patterns in time interval-based data. IEEE Transactions on Knowledge and Data Engineering.