Figure: number of combinations vs. number of items in set
8 items: 256 combinations
20 items: 1,048,576 combinations
140,000 items: ???
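The first two values in the chart equal 2^n, the number of subsets of an n-item set. A minimal sketch of that arithmetic follows; the helper name `combinations` is only illustrative and does not come from the slides.

```scala
// Number of possible item combinations (subsets) of an n-item set: 2^n.
def combinations(n: Int): BigInt = BigInt(2).pow(n)

val small = combinations(8)    // 2^8
val medium = combinations(20)  // 2^20

// For 140,000 items the value itself is impractical to read,
// so we look at how many decimal digits it has instead.
val hugeDigits = combinations(140000).toString.length
```

For 140,000 items the count has more than 42,000 decimal digits, which is why enumerating every combination is not an option.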
Theory Meets Reality
Large Scale Frequent Pattern Mining with Apache Spark in the Real World
Kexin Xie, Architect of Marketing Cloud Einstein
kexin.xie@salesforce.com, @realstraw
Wanderley Liu, Senior Data Science Engineer
wanderley.liu@salesforce.com
Marketing Cloud Einstein Journey Insights
Track the entire consumer journey
Gather online and offline interactions to stitch together a
complete view of the consumer
Discover the optimal path to conversion
Use AI to analyze all journey permutations and
automatically recommend the best channels, offers and
sequences that lead to conversion
Learn how customers are actually interacting with your brand
GA
What is Frequent Pattern Mining
Mine Shaft Mural Painting by Frank Wilson
a b c d e
User Items
u-1 a, b
u-2 b, c, d
u-3 a, c, d, e
u-4 a, d, e
u-5 a, b, c
u-6 a, b, c, d
u-7 a
u-8 a, b, c
u-9 a, b, d
u-10 b, c, e
item support
a 8
b 7
c 6
d 5
e 3
item support
a, b 5
a, c 4
a, d 4
a, e 2
... ...
Min Support = 4
L1 Patterns (single items with support ≥ 4): a, b, c, d
L2 Patterns (pairs with support ≥ 4): a, b; a, c; a, d; ...
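The support tables above can be reproduced by brute force on the ten example rows. The sketch below uses plain Scala collections rather than Spark, and the names `support` and `frequentPatterns` are illustrative, not taken from the talk.

```scala
// Toy rows from the slides (u-1 .. u-10), one item set per user.
val rows: Seq[Set[String]] = Seq(
  Set("a", "b"),            // u-1
  Set("b", "c", "d"),       // u-2
  Set("a", "c", "d", "e"),  // u-3
  Set("a", "d", "e"),       // u-4
  Set("a", "b", "c"),       // u-5
  Set("a", "b", "c", "d"),  // u-6
  Set("a"),                 // u-7
  Set("a", "b", "c"),       // u-8
  Set("a", "b", "d"),       // u-9
  Set("b", "c", "e")        // u-10
)

val minSupport = 4

// Support of a pattern: number of rows that contain all of its items.
def support(pattern: Set[String]): Int =
  rows.count(row => pattern.subsetOf(row))

val items: Set[String] = rows.flatten.toSet

// Frequent patterns of a given size, found by enumerating every candidate.
def frequentPatterns(size: Int): Map[Set[String], Int] =
  items.subsets(size)
    .map(p => p -> support(p))
    .filter { case (_, s) => s >= minSupport }
    .toMap

val l1Patterns = frequentPatterns(1)
val l2Patterns = frequentPatterns(2)
```

Enumerating every candidate like this only works for a handful of items; it is the baseline that the A-priori principle and FP-Growth improve on.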
A-priori Principle
A Priori in Berkeley, CA
“All sub-patterns of a frequent pattern are frequent”
Min Support = 4
item support
a 8
b 7
c 6
d 5
e 3
item support
a, b ?
a, c ?
a, d ?
a, e ?
... ...
Min Support = 6
item support
a 8
b 7
c 6
d 5
e 3
item support
a, b ?
a, c ?
a, d ?
a, e ?
... ...
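One way to read the question marks: by the A-priori principle, a pair only needs to be counted if both of its items are frequent on their own. The sketch below shows that pruning step with the item supports from the slide; `frequentItems` and `candidatePairs` are hypothetical helper names.

```scala
// Single-item supports from the slide.
val itemSupport = Map("a" -> 8, "b" -> 7, "c" -> 6, "d" -> 5, "e" -> 3)

// Items that are frequent on their own for a given minimum support.
def frequentItems(minSupport: Int): Set[String] =
  itemSupport.collect { case (item, s) if s >= minSupport => item }.toSet

// Candidate pairs: only combinations of frequent items need counting,
// because a pair containing an infrequent item cannot be frequent.
def candidatePairs(minSupport: Int): Set[Set[String]] =
  frequentItems(minSupport).subsets(2).toSet
```

With Min Support = 4 the pair (a, e) is never counted because e is infrequent; with Min Support = 6 only the pairs among a, b and c remain, out of the 10 possible pairs.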
FP-Growth
Header Table
item support
a 8
b 7
c 6
d 5
e 3
FP-Tree
root
  a: 8
    b: 5
      c: 3
        d: 1
      d: 1
    c: 1
      d: 1
        e: 1
    d: 1
      e: 1
  b: 2
    c: 2
      d: 1
      e: 1
FP Results
c 6
Header Table
item support
a 8
b 7
c 6
FP-Tree (paths leading to c)
root
  a: 8
    b: 5
      c: 3
    c: 1
  b: 2
    c: 2
Paths ending in c
a, b, c 3
a, c 1
b, c 2
FP-Tree | c
Prefix paths of c
a, b 3
a 1
b 2
Header Table
item support
b 5
a 4
Conditional FP-Tree
root
  b: 5
    a: 3
  a: 1
FP Results
c 6
Patterns found in FP-Tree | c
a 4
b 5
a, b 3
FP Results (each pattern above extended with c)
c 6
a, c 4
b, c 5
a, b, c 3
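The mining step for item c can be checked against the example rows. The sketch below is a simplification, not FP-Growth itself: it builds the prefix paths of an item directly from the rows and then enumerates subsets of the prefix items instead of recursing over conditional FP-trees, which is only viable for tiny examples. The names `conditionalPatternBase` and `patternsEndingIn` are illustrative.

```scala
// Header-table order (descending support) and the toy rows from the slides.
val order = Seq("a", "b", "c", "d", "e")
val rows: Seq[Set[String]] = Seq(
  Set("a", "b"), Set("b", "c", "d"), Set("a", "c", "d", "e"), Set("a", "d", "e"),
  Set("a", "b", "c"), Set("a", "b", "c", "d"), Set("a"), Set("a", "b", "c"),
  Set("a", "b", "d"), Set("b", "c", "e")
)

// Prefix paths of `item`: for each row containing it, the items that come
// before it in header-table order, with the number of rows sharing that prefix.
def conditionalPatternBase(item: String): Map[Seq[String], Int] = {
  val prefixItems = order.takeWhile(_ != item)
  rows
    .filter(row => row.contains(item))
    .map(row => prefixItems.filter(x => row.contains(x)))
    .groupBy(identity)
    .map { case (prefix, occurrences) => prefix -> occurrences.size }
}

// Frequent patterns ending in `item`, by brute-force enumeration over the
// prefix items (a stand-in for recursive conditional FP-tree mining).
def patternsEndingIn(item: String, minSupport: Int): Map[Set[String], Int] = {
  val base = conditionalPatternBase(item).toSeq
  val prefixItems: Set[String] = base.flatMap(_._1).toSet
  prefixItems.subsets()
    .map { sub =>
      // A prefix path supports `sub` if it contains every item of `sub`.
      val s = base.collect {
        case (prefix, n) if sub.forall(x => prefix.contains(x)) => n
      }.sum
      (sub + item) -> s
    }
    .filter { case (_, s) => s >= minSupport }
    .toMap
}
```

Calling `conditionalPatternBase("c")` yields the three prefix paths (a, b), (a) and (b), and `patternsEndingIn("c", 4)` keeps the patterns ending in c that have support of at least 4.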
Scaling Up
https://www.firestock.ru/strela-na-grafike-arrow-on-the-chart/
Figure: number of items vs. number of rows
Header Table
item support
a 8
b 7
c 6
d 5
e 3
Groups: a; [a], b; [a, b], c; [a, b, c], d; [a, b, c, d], e (bracketed items are the possible prefix items of each group)
Each group holds every row that contains the group's item, cut down to that item and the items before it in the header table. Dropped items are shown in parentheses.
[a], b
u-1 a, b
u-2 b, (c, d)
u-5 a, b, (c)
u-6 a, b, (c, d)
u-8 a, b, (c)
u-9 a, b, (d)
u-10 b, (c, e)
[a, b], c
u-2 b, c, (d)
u-3 a, c, (d, e)
u-5 a, b, c
u-6 a, b, c, (d)
u-8 a, b, c
u-10 b, c, (e)
[a, b, c], d
u-2 b, c, d
u-3 a, c, d, (e)
u-4 a, d, (e)
u-6 a, b, c, d
u-9 a, b, d
[a, b, c, d], e
u-3 a, c, d, e
u-4 a, d, e
u-10 b, c, e
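The per-item row groups can be sketched with local collections. This is an assumption about what the grouping step does, written without Spark: `project` is a hypothetical helper that emits, for every item in a row, the row truncated at that item in header-table order.

```scala
// Header-table order (descending support) and the toy rows from the slides.
val order = Seq("a", "b", "c", "d", "e")
val rows: Seq[(String, Set[String])] = Seq(
  "u-1" -> Set("a", "b"),
  "u-2" -> Set("b", "c", "d"),
  "u-3" -> Set("a", "c", "d", "e"),
  "u-4" -> Set("a", "d", "e"),
  "u-5" -> Set("a", "b", "c"),
  "u-6" -> Set("a", "b", "c", "d"),
  "u-7" -> Set("a"),
  "u-8" -> Set("a", "b", "c"),
  "u-9" -> Set("a", "b", "d"),
  "u-10" -> Set("b", "c", "e")
)

// For each item of a row, emit (item, row truncated after that item).
def project(row: Set[String]): Seq[(String, Seq[String])] = {
  val sorted = order.filter(x => row.contains(x))
  sorted.indices.map(i => sorted(i) -> sorted.take(i + 1))
}

// Group the projected rows by item; each group can then be mined independently.
val partitions: Map[String, Seq[(String, Seq[String])]] =
  rows
    .flatMap { case (user, items) =>
      project(items).map { case (item, projected) => (item, user -> projected) }
    }
    .groupBy { case (item, _) => item }
    .map { case (item, grouped) => item -> grouped.map { case (_, row) => row } }
```

Because every group carries all the rows needed for its item, the groups share no state and can be mined on different executors.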
Build FP-tree header table
Distribute rows to executors
Build FP-Trees on each node and mine for patterns
Collect patterns

// Build FP-tree header table: count each item, keep the frequent ones,
// and order them by descending support (assumes (item, support) pairs).
val headerTable = data
  .flatMap(_.items.map(_ -> 1L))
  .reduceByKey(_ + _)
  .filter(isFrequent)
  .collect
  .sortBy { case (_, support) => -support }

// Distribute rows to executors, keyed by the item each filtered row is mined for.
data
  .flatMap(filterDataBasedHeaderTable(headerTable))
  .groupByKey
  // Build an FP-tree per key on each node and mine it for patterns.
  .flatMap { case (k, rows) =>
    mineForPatternsFor(k, rows)
  }
  .collect // Collect patterns, if necessary
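To make the header-table step concrete without a cluster, the sketch below mirrors it on a local `Seq`, with `groupBy` plus a sum standing in for `reduceByKey`. The `Row` case class, the threshold and the tie-breaking by item name are assumptions for illustration.

```scala
// Hypothetical row type; the slides only show that rows expose `items`.
final case class Row(user: String, items: Seq[String])

val data: Seq[Row] = Seq(
  Row("u-1", Seq("a", "b")), Row("u-2", Seq("b", "c", "d")),
  Row("u-3", Seq("a", "c", "d", "e")), Row("u-4", Seq("a", "d", "e")),
  Row("u-5", Seq("a", "b", "c")), Row("u-6", Seq("a", "b", "c", "d")),
  Row("u-7", Seq("a")), Row("u-8", Seq("a", "b", "c")),
  Row("u-9", Seq("a", "b", "d")), Row("u-10", Seq("b", "c", "e"))
)

val minSupport = 4L // assumed threshold, as in the earlier example slides

def isFrequent(entry: (String, Long)): Boolean = entry._2 >= minSupport

// Local equivalent of flatMap + reduceByKey + filter + collect + sort.
val headerTable: Seq[(String, Long)] = data
  .flatMap(_.items.map(_ -> 1L))
  .groupBy(_._1)
  .map { case (item, ones) => item -> ones.map(_._2).sum }
  .filter(isFrequent)
  .toSeq
  .sortBy { case (item, support) => (-support, item) } // descending support
```

Sorting by descending support matters because the header-table order decides which items are kept when rows are truncated for each group.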
Minimum support
https://www.maxpixel.net/static/photo/1x/Cogs-Gears-Technical-Wheel-Cogwheel-Gearwheel-2279289.jpg
Differential Minimum Support (DMS)
Classify Items Into Categories
Compute Min Support Per Category
Run FP with Multiple Min Supports
COMMON ITEMS
RARE ITEMS
Pattern Frequency Test
CONDITION 1: Pattern Support ≥ Pattern Min Support
Pattern min support is defined as the lowest category minsup, given all items in the pattern
CONDITION 2 - Apriori Principle (Recursive)
If a pattern is frequent, all sub-patterns must be frequent
Pattern Frequency Test
Condition 1: Pattern support ≥ lowest minsup given all items in the pattern
Item Cat Minsup
A Common 100k
B Common 100k
C Rare 1k
Pattern Support Minsup Condition 1
A B 80k 100k fail
A C 4k 1k pass
B C 3k 1k pass
A B C 2k 1k pass
Condition 2 - A priori principle
A B C passes Condition 1, but its sub-pattern A B does not, so A B C is not frequent.
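The two conditions can be checked on the example table with a few lines of plain Scala. This is a sketch under one stated assumption: single-item supports are not given on the slide, so only sub-patterns of two or more items are tested in Condition 2. All names are illustrative.

```scala
// Per-item minimum support, taken from each item's category (Common / Rare).
val itemMinsup = Map("A" -> 100000L, "B" -> 100000L, "C" -> 1000L)

// Pattern supports from the slide.
val patternSupport: Map[Set[String], Long] = Map(
  Set("A", "B") -> 80000L,
  Set("A", "C") -> 4000L,
  Set("B", "C") -> 3000L,
  Set("A", "B", "C") -> 2000L
)

// Pattern min support: the lowest category minsup among the pattern's items.
def patternMinsup(pattern: Set[String]): Long = pattern.map(itemMinsup).min

// Condition 1: pattern support >= pattern min support.
def passesCondition1(pattern: Set[String]): Boolean =
  patternSupport.get(pattern).exists(_ >= patternMinsup(pattern))

// Condition 2: every proper sub-pattern must pass as well.
// Assumption: single items are skipped because their supports are not listed.
def passesCondition2(pattern: Set[String]): Boolean =
  pattern.subsets()
    .filter(sub => sub.size >= 2 && sub.size < pattern.size)
    .forall(passesCondition1)

def isFrequent(pattern: Set[String]): Boolean =
  passesCondition1(pattern) && passesCondition2(pattern)
```

Checking Condition 1 on every proper sub-pattern is equivalent to the recursive formulation, since the sub-patterns of a sub-pattern are themselves sub-patterns of the original.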
// CONDITION 1: mine with per-category minimum supports
val catMinsupMap = sc.broadcast(computeCatMinSup(data))
val fpTreeResults = data
  .flatMap(filterDataBasedHeaderTable(headerTable))
  .groupByKey
  .flatMap { case (k, rows) =>
    mineForPatternsFor(k, rows, catMinsupMap.value)
  }

// CONDITION 2: keep a pattern only if all of its sub-patterns were also found
// (assumes each pattern is a Set of items)
val patternsMap = sc.broadcast(fpTreeResults.keys.collect.toSet)
fpTreeResults
  .filter { case (pattern, support) =>
    pattern.subsets.filter(_.nonEmpty).forall(patternsMap.value.contains)
  }
Not the end of the story ...
https://w-dog.net/wallpaper/nature-night-star-tree-trees-stars-background-wallpaper-widescreen-full-screen-hd-wallpapers-fullscreen/id/308950/
Low Level Optimization
• Handled case where array length > Integer.MAX_VALUE
Result Set Compaction
• Remove redundant and noisy result sets
• Very efficient compaction - 95% without loss of information
Result Set Ranking
• Score patterns with multiple criteria
Items with Feature Set
• Not only which combinations work best, but what makes them work best
• Well received feature, direct feedback on strategy
Theory Meets Reality—Large Scale Frequent Pattern Mining with Apache Spark in the Real World with Kexin Xie and Wanderley Liu

  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 10. Numberofcombinations Number of items in set 8 20 256 1,048,576
  • 11. Numberofcombinations Number of items in set 8 20 140,000 256 1,048,576 ???
  • 12.
  • 13. Theory Meets Reality Large Scale Frequent Pattern Mining with Apache Spark in the Real World Kexin Xie, Architect of Marketing Cloud Einstein kexin.xie@salesforce.com, @realstraw Wanderley Liu, Senior Data Science Engineer wanderley.liu@salesforce.com
  • 14. Marketing Cloud Einstein Journey Insights Track the entire consumer journey Gather online and offline interactions to stitch together a complete view of the consumer Discover the optimal path to conversion Use AI to analyze all journey permutations and automatically recommend the best channels, offers and sequences that lead to conversion Learn how customers are actually interacting with your brand GA
  • 15. What is Frequent Pattern Mining Mine Shaft Mural Painting by Frank Wilson
  • 16.
  • 17. a b c d e
  • 18. User Items u-1 a, b u-2 b, c, d u-3 a, c, d, e u-4 a, d, e u-5 a, b, c u-6 a, b, c, d u-7 a u-8 a, b, c u-9 a, b, d u-10 b, c, e
  • 19. item support a 8 b 7 c 6 d 5 e 3 User Items u-1 a, b u-2 b, c, d u-3 a, c, d, e u-4 a, d, e u-5 a, b, c u-6 a, b, c, d u-7 a u-8 a, b, c u-9 a, b, d u-10 b, c, e
  • 20. item support a 8 b 7 c 6 d 5 e 3 item support a, b 5 a, c 4 a, d 4 a, e 2 ... ... User Items u-1 a, b u-2 b, c, d u-3 a, c, d, e u-4 a, d, e u-5 a, b, c u-6 a, b, c, d u-7 a u-8 a, b, c u-9 a, b, d u-10 b, c, e
  • 21. item support a 8 b 7 c 6 d 5 e 3 item support a, b 5 a, c 4 a, d 4 a, e 2 ... ... Min Support = 4 User Items u-1 a, b u-2 b, c, d u-3 a, c, d, e u-4 a, d, e u-5 a, b, c u-6 a, b, c, d u-7 a u-8 a, b, c u-9 a, b, d u-10 b, c, e
  • 22. item support a 8 b 7 c 6 d 5 e 3 item support a, b 5 a, c 4 a, d 4 a, e 2 ... ... Min Support = 4 item support a 8 b 7 c 6 d 5 e 3 item support a, b 5 a, c 4 a, d 4 a, e 2 ... ... User Items u-1 a, b u-2 b, c, d u-3 a, c, d, e u-4 a, d, e u-5 a, b, c u-6 a, b, c, d u-7 a u-8 a, b, c u-9 a, b, d u-10 b, c, e
  • 23. item support a 8 b 7 c 6 d 5 e 3 item support a, b 5 a, c 4 a, d 4 a, e 2 ... ... Min Support = 4 item support a 8 b 7 c 6 d 5 e 3 item support a, b 5 a, c 4 a, d 4 a, e 2 ... ... User Items u-1 a, b u-2 b, c, d u-3 a, c, d, e u-4 a, d, e u-5 a, b, c u-6 a, b, c, d u-7 a u-8 a, b, c u-9 a, b, d u-10 b, c, e L1 Patterns L2 Patterns
  • 24. A-priori Principle A Priori in Berkeley, CA “All sub-patterns of a frequent pattern are frequent”
  • 25. Min Support = 4 item support a 8 b 7 c 6 d 5 e 3 item support a, b ? a, c ? a, d ? a, e ? ... ...
  • 26. Min Support = 4 item support a 8 b 7 c 6 d 5 e 3 item support a, b ? a, c ? a, d ? a, e ? ... ... Min Support = 6 item support a 8 b 7 c 6 d 5 e 3 item support a, b ? a, c ? a, d ? a, e ? ... ...
  • 28. item support a 8 b 7 c 6 d 5 e 3 root a: 8 b: 2 b: 5 c: 3 d: 1 d: 1 c: 1 d: 1 e: 1 c: 2 d: 1 d: 1 e: 1 e: 1 Header Table
  • 29. item support a 8 b 7 c 6 d 5 e 3 root a: 8 b: 2 b: 5 c: 3 d: 1 d: 1 c: 1 d: 1 e: 1 c: 2 d: 1 d: 1 e: 1 e: 1 Header Table
  • 30. item support a 8 b 7 c 6 d 5 e 3 root a: 8 b: 2 b: 5 c: 3 d: 1 d: 1 c: 1 d: 1 e: 1 c: 2 d: 1 d: 1 e: 1 e: 1 Header Table
  • 31. item support a 8 b 7 c 6 d 5 e 3 root a: 8 b: 2 b: 5 c: 3 d: 1 d: 1 c: 1 d: 1 e: 1 c: 2 d: 1 d: 1 e: 1 e: 1 Header Table
  • 32.
  • 33. item support a 8 b 7 c 6 d 5 e 3 root a: 8 b: 2 b: 5 c: 3 d: 1 d: 1 c: 1 d: 1 e: 1 c: 2 d: 1 d: 1 e: 1 e: 1 Header Table
  • 34.
  • 35. item support a 8 b 7 c 6 d 5 e 3 root a: 8 b: 2 b: 5 c: 3 d: 1 d: 1 c: 1 d: 1 e: 1 c: 2 d: 1 d: 1 e: 1 e: 1 Header Table
  • 37. item support a 8 b 7 c 6 root a: 8 b: 2 b: 5 c: 3 c: 1 c: 2 a, b, c 3 a, c 1 b, c 2 c 6 FP Results Header Table
  • 38. item support a 8 b 7 c 6 root a: 8 b: 2 b: 5 c: 3 c: 1 c: 2 a, b, c 3 a, c 1 b, c 2 c 6 FP Results Header Table
  • 39. item support a 8 b 7 c 6 root a: 8 b: 2 b: 5 c: 3 c: 1 c: 2 a, b, c 3 a, c 1 b, c 2 c 6 FP Results Header Table
  • 40. item support a 8 b 7 c 6 root a: 8 b: 2 b: 5 c: 3 c: 1 c: 2 a, b, c 3 a, c 1 b, c 2 c 6 FP Results Header Table
  • 41. FP-Tree | c a, b 3 a 1 b 2 item support b 5 a 4 root b: 5 a: 4 c 6 FP Results Header Table
  • 42. FP-Tree | c a, b 3 a 1 b 2 item support b 5 a 4 root b: 5 a: 4 c 6 FP Results Header Table
  • 43.
  • 44. c 6 item support b 5 a 4 root b: 5 a: 4 a 4 b 5 a, b 4 FP-Tree | c FP Results Header Table
  • 45. c 6 a, c 4 b, c 5 a, b, c 4 item support b 5 a 4 root b: 5 a: 4 a 4 b 5 a, b 4 FP-Tree | c FP Results Header Table
  • 47.
  • 48. item support a 8 b 7 c 6 d 5 e 3 root a: 8 b: 2 b: 5 c: 3 d: 1 d: 1 c: 1 d: 1 e: 1 c: 2 d: 1 d: 1 e: 1 e: 1 User Items u-1 a, b u-2 b, c, d u-3 a, c, d, e u-4 a, d, e u-5 a, b, c u-6 a, b, c, d u-7 a u-8 a, b, c u-9 a, b, d u-10 b, c, e Header Table
  • 51. item support a 8 b 7 c 6 d 5 e 3 root a: 8 b: 2 b: 5 c: 3 d: 1 d: 1 c: 1 d: 1 e: 1 c: 2 d: 1 d: 1 e: 1 e: 1 Header Table
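
The tree on these slides can be reproduced by inserting each row, ordered by the header table, as a path from the root. The sketch below is a plain-Scala illustration under that assumption, not the code used in the talk; `FpTreeSketch`, `Node`, `headerTable`, `build` and `paths` are hypothetical names.

```scala
import scala.collection.mutable

object FpTreeSketch {
  /** A tree node: an item, the number of rows passing through it, and its children by item. */
  final class Node(val item: String) {
    var count: Int = 0
    val children: mutable.LinkedHashMap[String, Node] = mutable.LinkedHashMap.empty
  }

  // The ten example transactions from the slides (u-1 .. u-10).
  val transactions: Seq[Set[String]] = Seq(
    Set("a", "b"), Set("b", "c", "d"), Set("a", "c", "d", "e"), Set("a", "d", "e"),
    Set("a", "b", "c"), Set("a", "b", "c", "d"), Set("a"), Set("a", "b", "c"),
    Set("a", "b", "d"), Set("b", "c", "e")
  )

  /** Header table: items with support >= minSupport, in descending support order. */
  def headerTable(rows: Seq[Set[String]], minSupport: Int): Seq[(String, Int)] =
    rows.flatten
      .groupBy(identity)
      .map { case (item, occurrences) => item -> occurrences.size }
      .filter { case (_, support) => support >= minSupport }
      .toSeq
      .sortBy { case (item, support) => (-support, item) }

  /** Inserts every row, ordered by the header table, as a path starting at the root. */
  def build(rows: Seq[Set[String]], order: Seq[String]): Node = {
    val root = new Node("root")
    for (row <- rows) {
      var node = root
      for (item <- order if row.contains(item)) {
        node = node.children.getOrElseUpdate(item, new Node(item))
        node.count += 1
      }
    }
    root
  }

  /** Lists every node as its path from the root, e.g. "a:8/b:5/c:3". */
  def paths(node: Node, prefix: String = ""): Seq[String] =
    node.children.values.toSeq.flatMap { child =>
      val label = s"$prefix${child.item}:${child.count}"
      label +: paths(child, label + "/")
    }
}
```

With a minimum support of 3 all five items stay in the header table, and the resulting tree has the same 14 nodes as the slide, for example `a:8/b:5/c:3/d:1` and `b:2/c:2/e:1`.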
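  • 59. Header table (item: support): a: 8, b: 7, c: 6, d: 5, e: 3. User items: u-1: a, b; u-2: b, c, d; u-3: a, c, d, e; u-4: a, d, e; u-5: a, b, c; u-6: a, b, c, d; u-7: a; u-8: a, b, c; u-9: a, b, d; u-10: b, c, e.
  • 60. Header table (item: support): a: 8, b: 7, c: 6, d: 5, e: 3. Items with their prefixes in header-table order (prefix in brackets): a; [a], b; [a, b], c; [a, b, c], d; [a, b, c, d], e. User items: u-1: a, b; u-2: b, c, d; u-3: a, c, d, e; u-4: a, d, e; u-5: a, b, c; u-6: a, b, c, d; u-7: a; u-8: a, b, c; u-9: a, b, d; u-10: b, c, e.
  • 61. Header table (item: support): a: 8, b: 7, c: 6, d: 5, e: 3. Items with their prefixes in header-table order (prefix in brackets): a; [a], b; [a, b], c; [a, b, c], d; [a, b, c, d], e. Rows containing c (items after c in parentheses): u-2: b, c, (d); u-3: a, c, (d, e); u-5: a, b, c; u-6: a, b, c, (d); u-8: a, b, c; u-10: b, c, (e). User items: u-1: a, b; u-2: b, c, d; u-3: a, c, d, e; u-4: a, d, e; u-5: a, b, c; u-6: a, b, c, d; u-7: a; u-8: a, b, c; u-9: a, b, d; u-10: b, c, e.
  • 62. Header table (item: support): a: 8, b: 7, c: 6, d: 5, e: 3. Items with their prefixes in header-table order (prefix in brackets): a; [a], b; [a, b], c; [a, b, c], d; [a, b, c, d], e. Rows containing c (items after c in parentheses): u-2: b, c, (d); u-3: a, c, (d, e); u-5: a, b, c; u-6: a, b, c, (d); u-8: a, b, c; u-10: b, c, (e). Rows containing e: u-3: a, c, d, e; u-4: a, d, e; u-10: b, c, e. User items: u-1: a, b; u-2: b, c, d; u-3: a, c, d, e; u-4: a, d, e; u-5: a, b, c; u-6: a, b, c, d; u-7: a; u-8: a, b, c; u-9: a, b, d; u-10: b, c, e.
  • 63. Header table (item: support): a: 8, b: 7, c: 6, d: 5, e: 3. Items with their prefixes in header-table order (prefix in brackets): a; [a], b; [a, b], c; [a, b, c], d; [a, b, c, d], e. Rows containing c (items after c in parentheses): u-2: b, c, (d); u-3: a, c, (d, e); u-5: a, b, c; u-6: a, b, c, (d); u-8: a, b, c; u-10: b, c, (e). Rows containing d (items after d in parentheses): u-2: b, c, d; u-3: a, c, d, (e); u-4: a, d, (e); u-6: a, b, c, d; u-9: a, b, d. Rows containing e: u-3: a, c, d, e; u-4: a, d, e; u-10: b, c, e. User items: u-1: a, b; u-2: b, c, d; u-3: a, c, d, e; u-4: a, d, e; u-5: a, b, c; u-6: a, b, c, d; u-7: a; u-8: a, b, c; u-9: a, b, d; u-10: b, c, e.
  • 64. Header table (item: support): a: 8, b: 7, c: 6, d: 5, e: 3. Items with their prefixes in header-table order (prefix in brackets): a; [a], b; [a, b], c; [a, b, c], d; [a, b, c, d], e. Rows containing b (items after b in parentheses): u-1: a, b; u-2: b, (c, d); u-5: a, b, (c); u-6: a, b, (c, d); u-8: a, b, (c); u-9: a, b, (d); u-10: b, (c, e). Rows containing c (items after c in parentheses): u-2: b, c, (d); u-3: a, c, (d, e); u-5: a, b, c; u-6: a, b, c, (d); u-8: a, b, c; u-10: b, c, (e). Rows containing d (items after d in parentheses): u-2: b, c, d; u-3: a, c, d, (e); u-4: a, d, (e); u-6: a, b, c, d; u-9: a, b, d. Rows containing e: u-3: a, c, d, e; u-4: a, d, e; u-10: b, c, e. User items: u-1: a, b; u-2: b, c, d; u-3: a, c, d, e; u-4: a, d, e; u-5: a, b, c; u-6: a, b, c, d; u-7: a; u-8: a, b, c; u-9: a, b, d; u-10: b, c, e.
  • 65. Chart axes: number of rows, number of items. Rows containing b (items after b in parentheses): u-1: a, b; u-2: b, (c, d); u-5: a, b, (c); u-6: a, b, (c, d); u-8: a, b, (c); u-9: a, b, (d); u-10: b, (c, e). Rows containing c (items after c in parentheses): u-2: b, c, (d); u-3: a, c, (d, e); u-5: a, b, c; u-6: a, b, c, (d); u-8: a, b, c; u-10: b, c, (e). Rows containing d (items after d in parentheses): u-2: b, c, d; u-3: a, c, d, (e); u-4: a, d, (e); u-6: a, b, c, d; u-9: a, b, d. Rows containing e: u-3: a, c, d, e; u-4: a, d, e; u-10: b, c, e.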
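
The per-item row groups on these slides can be generated by emitting, for every item in a row, the row truncated at that item, and then grouping by the item. The following is a sketch on local Scala collections under the assumption that this is the projection intended by the slides; `GroupedRowsSketch`, `project` and `grouped` are hypothetical names, and `groupBy` stands in for Spark's `groupByKey`.

```scala
object GroupedRowsSketch {
  // The ten example rows from the slides, keyed by user.
  val rows: Seq[(String, Set[String])] = Seq(
    "u-1" -> Set("a", "b"),
    "u-2" -> Set("b", "c", "d"),
    "u-3" -> Set("a", "c", "d", "e"),
    "u-4" -> Set("a", "d", "e"),
    "u-5" -> Set("a", "b", "c"),
    "u-6" -> Set("a", "b", "c", "d"),
    "u-7" -> Set("a"),
    "u-8" -> Set("a", "b", "c"),
    "u-9" -> Set("a", "b", "d"),
    "u-10" -> Set("b", "c", "e")
  )

  // Header-table order from the slides (descending support).
  val order: Seq[String] = Seq("a", "b", "c", "d", "e")

  /** For every item in a row, emit (item, the ordered row truncated at that item). */
  def project(items: Set[String]): Seq[(String, Seq[String])] = {
    val sorted = order.filter(items.contains)
    sorted.indices.map(i => sorted(i) -> sorted.take(i + 1))
  }

  /** Local stand-in for groupByKey: all truncated rows that end in each item. */
  def grouped: Map[String, Seq[(String, Seq[String])]] =
    rows
      .flatMap { case (user, items) =>
        project(items).map { case (key, truncated) => key -> (user -> truncated) }
      }
      .groupBy(_._1)
      .map { case (key, pairs) => key -> pairs.map(_._2) }
}
```

Each group holds exactly as many rows as the item's support (8, 7, 6, 5 and 3 for a to e), which is why the groups for frequent items are the expensive ones to mine.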
  • 66. Steps: (1) Build FP-tree header table; (2) Distribute rows to executors; (3) Build FP-trees on each node and mine for patterns; (4) Collect patterns.
  • 67. Steps: (1) Build FP-tree header table; (2) Distribute rows to executors; (3) Build FP-trees on each node and mine for patterns; (4) Collect patterns. Code:
      val headerTable = data
        .flatMap(_.items.map(_ -> 1L))
        .reduceByKey(_ + _)
        .filter(isFrequent)
        .collect
        .sorted
      data
        .flatMap(filterDataBasedHeaderTable(headerTable))
        .groupByKey
        .flatMap { case (k, rows) => mineForPatternsFor(k, rows) }
        .collect // If necessary
  • 74. Differential Minimum Support (DMS): (1) Classify items into categories; (2) Compute min support per category; (3) Run FP with multiple min supports.
  • 80. Pattern Frequency Test. Condition 1: pattern support ≥ pattern min support, where the pattern min support is the lowest category minsup among all items in the pattern. Condition 2, Apriori principle (recursive): if a pattern is frequent, all of its sub-patterns must be frequent.
  • 81. Pattern Frequency Test. Condition 1: pattern support ≥ pattern minimum support. Items (item, category, minsup): A, Common, 100k; B, Common, 100k; C, Rare, 1k. Patterns (pattern, support): A B, 80k; A C, 4k; B C, 3k; A B C, 2k.
  • 82. Pattern Frequency Test. Condition 1: pattern support ≥ pattern minimum support. Items (item, category, minsup): A, Common, 100k; B, Common, 100k; C, Rare, 1k. Patterns (pattern, support, minsup): A B, 80k, 100k; A C, 4k, 1k; B C, 3k, 1k; A B C, 2k, 1k.
  • 83. Pattern Frequency. Condition 1: pattern support ≥ lowest minsup among all items in the pattern. Items (item, category, minsup): A, Common, 100k; B, Common, 100k; C, Rare, 1k. Patterns (pattern, support, minsup): A B, 80k, 100k; A C, 4k, 1k; B C, 3k, 1k; A B C, 2k, 1k.
  • 84. Pattern Frequency Test. Condition 2: Apriori principle. Items (item, category, minsup): A, Common, 100k; B, Common, 100k; C, Rare, 1k. Patterns (pattern, support, minsup): A B, 80k, 100k; A C, 4k, 1k; B C, 3k, 1k; A B C, 2k, 1k.
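
The two conditions can be checked on the A/B/C example with a few lines of Scala. This is a sketch under stated assumptions rather than the presenters' code: `PatternFrequencyTest`, `patternMinsup`, `condition1` and `isFrequent` are hypothetical names, and because the example lists no single-item supports, the recursive check only visits sub-patterns with two or more items.

```scala
object PatternFrequencyTest {
  // Category minsup per item from the example: Common = 100k, Rare = 1k.
  val itemMinsup: Map[String, Long] =
    Map("A" -> 100000L, "B" -> 100000L, "C" -> 1000L)

  // Pattern supports from the example table.
  val support: Map[Set[String], Long] = Map(
    Set("A", "B") -> 80000L,
    Set("A", "C") -> 4000L,
    Set("B", "C") -> 3000L,
    Set("A", "B", "C") -> 2000L
  )

  /** Pattern min support: the lowest category minsup among the pattern's items. */
  def patternMinsup(pattern: Set[String]): Long = pattern.map(itemMinsup).min

  /** Condition 1: pattern support >= pattern min support. */
  def condition1(pattern: Set[String]): Boolean =
    support.get(pattern).exists(_ >= patternMinsup(pattern))

  /**
   * Conditions 1 and 2 together: the pattern passes condition 1 and, recursively,
   * so does every sub-pattern with at least two items (assumption: single items
   * are not tested here because the example gives no supports for them).
   */
  def isFrequent(pattern: Set[String]): Boolean =
    condition1(pattern) &&
      pattern.subsets(pattern.size - 1).filter(_.size >= 2).forall(isFrequent)
}
```

Here A B fails condition 1 (80k < 100k), while A C and B C pass. A B C passes condition 1 (2k ≥ 1k) but is rejected by condition 2 because its sub-pattern A B is not frequent.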
  • 86. Code:
      val fpTreeResults = data
        .flatMap(filterDataBasedHeaderTable(headerTable))
        .groupByKey
        .flatMap { case (k, rows) => mineForPatternsFor(k, rows) }
  • 87. CONDITION 1:
      val catMinsupMap = sc.broadcast(computeCatMinSup(data))
      val fpTreeResults = data
        .flatMap(filterDataBasedHeaderTable(headerTable))
        .groupByKey
        .flatMap { case (k, rows) => mineForPatternsFor(k, rows, catMinsupMap.value) }
  • 88. CONDITION 1:
      val catMinsupMap = sc.broadcast(computeCatMinSup(data))
      val fpTreeResults = data
        .flatMap(filterDataBasedHeaderTable(headerTable))
        .groupByKey
        .flatMap { case (k, rows) => mineForPatternsFor(k, rows, catMinsupMap.value) }
    CONDITION 2:
      val patternsMap = sc.broadcast(fpTreeResults.keys.collect)
      fpTreeResults
        .filter { case (pattern, support) => pattern.subsets.subsetOf(patternsMap.value) }
  • 89. Not the end of the story ... https://w-dog.net/wallpaper/nature-night-star-tree-trees-stars-background-wallpaper-widescreen-full-screen-hd-wallpapers-fullscreen/id/308950/
  • 90. Low Level Optimization
      • Handled case where array length > Integer.MAX_VALUE
    Result Set Compaction
      • Remove redundant and noisy result sets
      • Very efficient compaction - 95% without loss of information
    Result Set Ranking
      • Score patterns with multiple criteria
    Items with Feature Set
      • Not only which combinations work best, but what makes them work best
      • Well received feature, direct feedback on strategy