Advanced Machine Learning Association Rule Mining
1. Introduction to Machine Learning
Lecture 16
Advanced Topics in Association Rules Mining
Albert Orriols i Puig
http://www.albertorriols.net
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
2. Recap of Lecture 13-15
Ideas come from market basket analysis (MBA)
Let’s go shopping!
Customer 1: milk, eggs, sugar, bread
Customer 2: milk, eggs, cereal, bread
Customer 3: eggs, sugar
What do my customers buy? Which products are bought together?
Aim: Find associations and correlations between the different items that customers place in their shopping basket
3. Recap of Lecture 15
Aim: Find associations between items
But wait!
There are many different diapers: Dodot, Huggies, …
There are many different beers: Heineken, Desperados, Kingfisher, … in bottle/can …
Which rule do you prefer?
diapers ⇒ beer
Dodot diapers M ⇒ Dam beer in can
Which will have greater support?
[Figure: item taxonomy with nodes Clothes, Outwear, Shirts, Jackets, Ski Pants]
4. Today’s Agenda
Continuing our journey through some advanced
topics in ARM
Mining frequent patterns without candidate
generation
Multiple Level AR
Sequential Pattern Mining
Quantitative association rules
Mining class association rules
Beyond support & confidence
Applications
5. Introduction to Seq. AR
So far, we have seen
Apriori
FP-growth
Mining multiple level AR
But none of them considers the order of transactions
However, is the sequence important?
Which came first, the hen or the egg?
Sometimes it is really important
Analyze the sequence of items bought by a customer
Web usage mining searches for navigational patterns of
users
6. An Example in Web Usage Mining
Web sequence: < {Homepage} {Electronics} {Computers}
{Laptops} {Sony Vaio} {Order Confirmation} {Return to Shopping} >
7. Definition
Defining the problem:
Let I = {i1, i2, …, im} be a set of items
Sequence: An ordered list of itemsets
Itemset/element: A non-empty set of items X ⊆ I. We denote a sequence s by 〈a1a2…ar〉, where ai is an itemset, which is also called an element of s
An element (or an itemset) of a sequence is denoted by {x1, x2, …, xk}, where xj ∈ I is an item
We assume without loss of generality that items in an element of a sequence are in lexicographic order
8. Definition
Defining the problem:
Size: The size of a sequence is the number of elements (or itemsets) in the sequence
Length: The length of a sequence is the number of items in the sequence
A sequence of length k is called a k-sequence
A sequence s1 = 〈a1a2…ar〉 is a subsequence of another sequence s2 = 〈b1b2…bv〉, or s2 is a supersequence of s1, if there exist integers 1 ≤ j1 < j2 < … < jr−1 < jr ≤ v such that a1 ⊆ bj1, a2 ⊆ bj2, …, ar ⊆ bjr. We also say that s2 contains s1
9. Example
Let I = {1, 2, 3, 4, 5, 6, 7, 8, 9}.
Sequence 〈{3}{4, 5}{8}〉 is contained in (or is a subsequence of) 〈{6}{3, 7}{9}{4, 5, 8}{3, 8}〉 because {3} ⊆ {3, 7}, {4, 5} ⊆ {4, 5, 8}, and {8} ⊆ {3, 8}.
However, 〈{3}{8}〉 is not contained in 〈{3, 8}〉 or vice versa.
The size of the sequence 〈{3}{4, 5}{8}〉 is 3, and the length of
the sequence is 4
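The definitions of size, length, and containment can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the names `make_sequence`, `size`, `length`, and `contains` are chosen here for illustration, and a sequence is represented as a tuple of frozensets. Containment is tested greedily, matching each element of the candidate subsequence to the earliest possible element of the other sequence.

```python
from typing import FrozenSet, Iterable, Tuple

# Assumed representation: a sequence is an ordered tuple of itemsets.
Sequence = Tuple[FrozenSet[int], ...]


def make_sequence(*elements: Iterable[int]) -> Sequence:
    """Build a sequence from its elements (itemsets), keeping their order."""
    return tuple(frozenset(e) for e in elements)


def size(s: Sequence) -> int:
    """Size: number of elements (itemsets) in the sequence."""
    return len(s)


def length(s: Sequence) -> int:
    """Length: total number of items over all elements."""
    return sum(len(element) for element in s)


def contains(s2: Sequence, s1: Sequence) -> bool:
    """Return True if s1 is a subsequence of s2 (s2 contains s1)."""
    pos = 0
    for element in s1:
        # Advance to the first element of s2 that is a superset of `element`.
        while pos < len(s2) and not element <= s2[pos]:
            pos += 1
        if pos == len(s2):
            return False
        pos += 1  # the next element of s1 must match strictly later in s2
    return True


small = make_sequence({3}, {4, 5}, {8})
large = make_sequence({6}, {3, 7}, {9}, {4, 5, 8}, {3, 8})
print(contains(large, small), size(small), length(small))
```

The greedy scan is sufficient because matching an element as early as possible never rules out a later match.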
10. Objective
Objective of sequential pattern mining (SPM)
Input: A set S of input data sequences (or sequence database)
Goal: The problem of mining sequential patterns is to find all the sequences that have a user-specified minimum support
Each such sequence is called a frequent sequence, or a sequential pattern
The support for a sequence is the fraction of total data sequences in S that contain this sequence
11. Example
Transaction database:
Customer ID   Transaction time   Transaction (items bought)
1             July 20, 2005      30
1             July 25, 2005      90
2             July 9, 2005       10, 20
2             July 14, 2005      30
2             July 20, 2005      40, 60, 70
3             July 25, 2005      30, 50, 70
4             July 25, 2005      30
4             July 29, 2005      40, 70
4             August 2, 2005     90
5             July 12, 2005      90
Customer sequences:
Customer ID   Customer sequence
1             <(30) (90)>
2             <(10 20) (30) (40 60 70)>
3             <(30 50 70)>
4             <(30) (40 70) (90)>
5             <(90)>
Sequential patterns with support > 25%:
1-sequences: <(30)> <(40)> <(70)> <(90)>
2-sequences: <(30) (40)> <(30) (70)> <(30) (90)> <(40 70)>
3-sequences: <(30) (40 70)>
Example borrowed from Bing Liu
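As a check on the example, support can be computed directly from the five customer sequences. This is a small Python sketch with illustrative names (`seq`, `contains`, `support`, `CUSTOMER_SEQUENCES`) that do not come from the slides; with five customers, a support above 25% means the pattern occurs in at least two customer sequences.

```python
from typing import FrozenSet, Iterable, List, Tuple

Sequence = Tuple[FrozenSet[int], ...]


def seq(*elements: Iterable[int]) -> Sequence:
    """Build a sequence (tuple of itemsets) from its elements."""
    return tuple(frozenset(e) for e in elements)


def contains(data: Sequence, pattern: Sequence) -> bool:
    """True if `pattern` is a subsequence of `data`."""
    pos = 0
    for element in pattern:
        while pos < len(data) and not element <= data[pos]:
            pos += 1
        if pos == len(data):
            return False
        pos += 1
    return True


def support(pattern: Sequence, database: List[Sequence]) -> float:
    """Fraction of data sequences in the database that contain `pattern`."""
    return sum(contains(d, pattern) for d in database) / len(database)


# Customer sequences copied from the example table.
CUSTOMER_SEQUENCES = [
    seq({30}, {90}),
    seq({10, 20}, {30}, {40, 60, 70}),
    seq({30, 50, 70}),
    seq({30}, {40, 70}, {90}),
    seq({90}),
]

print(support(seq({30}, {40, 70}), CUSTOMER_SEQUENCES))
print(support(seq({30, 70}), CUSTOMER_SEQUENCES))
```

Note that <(30) (70)> and <(30 70)> are different patterns: the first requires 70 to be bought in a later transaction than 30, the second requires both in the same transaction.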
12. GSP
GSP closely follows Apriori, but for sequential patterns
If a sequence S is not frequent, then none of the super-sequences of S is frequent
For instance, if <ab> is infrequent, so are <acb> and <(ca)b>
GSP follows these steps:
Initially, every item in DB is a candidate of length 1
For each level (i.e., sequences of length k) do
Scan the database to collect the support count for each candidate sequence
Generate candidate length-(k+1) sequences from length-k frequent sequences using Apriori
Repeat until no frequent sequence or no candidate can be found
Strength: Candidate pruning by Apriori
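The level-wise loop can be sketched in Python as follows. This is a simplified illustration, not the full GSP algorithm: instead of GSP's join-and-prune candidate generation, each frequent length-k sequence is extended by one frequent item, either as a new element or inside the last element. All names (`mine_sequential_patterns`, `extend`, `count_support`) are hypothetical, and the database is the customer-sequence example from the previous slide with a minimum support count of 2.

```python
from typing import Dict, FrozenSet, Iterable, Iterator, List, Tuple

Sequence = Tuple[FrozenSet[int], ...]


def seq(*elements: Iterable[int]) -> Sequence:
    return tuple(frozenset(e) for e in elements)


def contains(data: Sequence, pattern: Sequence) -> bool:
    """True if `pattern` is a subsequence of `data`."""
    pos = 0
    for element in pattern:
        while pos < len(data) and not element <= data[pos]:
            pos += 1
        if pos == len(data):
            return False
        pos += 1
    return True


def count_support(database: List[Sequence], candidate: Sequence) -> int:
    """One pass over the database: number of sequences containing the candidate."""
    return sum(contains(d, candidate) for d in database)


def extend(pattern: Sequence, items: List[int]) -> Iterator[Sequence]:
    """Yield length-(k+1) candidates built from a frequent length-k pattern."""
    last = pattern[-1]
    for item in items:
        # Append the item as a new element at the end of the sequence.
        yield pattern + (frozenset([item]),)
        # Add the item to the last element; keeping items in increasing order
        # ensures each candidate is generated from exactly one prefix.
        if item > max(last):
            yield pattern[:-1] + (last | {item},)


def mine_sequential_patterns(database: List[Sequence], min_count: int) -> Dict[Sequence, int]:
    items = sorted({i for s in database for e in s for i in e})
    candidates = [(frozenset([i]),) for i in items]  # every item is a length-1 candidate
    frequent: Dict[Sequence, int] = {}
    frequent_items: List[int] = []
    first_level = True
    while candidates:
        counts = {c: count_support(database, c) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        if first_level:
            frequent_items = sorted(next(iter(c[0])) for c in level)
            first_level = False
        # Apriori property: only frequent sequences are extended.
        candidates = [c for p in level for c in extend(p, frequent_items)]
    return frequent


CUSTOMER_SEQUENCES = [
    seq({30}, {90}),
    seq({10, 20}, {30}, {40, 60, 70}),
    seq({30, 50, 70}),
    seq({30}, {40, 70}, {90}),
    seq({90}),
]

found = mine_sequential_patterns(CUSTOMER_SEQUENCES, min_count=2)
for pattern, count in found.items():
    print([sorted(e) for e in pattern], count)
```

The sketch is complete in the sense that every frequent sequence has a frequent prefix one item shorter, so it is reached by some extension. GSP additionally prunes candidates that have any infrequent subsequence before counting, which this sketch omits.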
13. The Algorithm
Does this remind you of Apriori?
14. Quantitative AR
Transaction ID   Age   Married   NumCars
1                23    No        1
2                25    Yes       1
3                29    No        0
4                34    Yes       2
5                38    Yes       2
<Age: 30..39> and <Married: Yes> => <NumCars: 2>
Support = 40%, Conf = 100%
How can we deal with these data?
15. Map to Boolean Values
Record ID   Age [20..29]   Age [30..39]   Married: Yes   Married: No   NumCars: 0   NumCars: 1
100         1              0              0              1             0            1
200         1              0              1              0             0            1
300         1              0              0              1             1            0
400         0              1              1              0             0            0
500         0              1              1              0             0            0
Now, use any system for mining Boolean AR
Apriori
FP-growth
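The mapping can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the item labels such as `"Age:30..39"` and the helper names `to_items`, `support`, and `confidence` are invented here, the two age intervals are the ones shown in the table, and every distinct value of Married and NumCars becomes its own Boolean item.

```python
from typing import Dict, FrozenSet, List, Set

# Records copied from the quantitative example table.
RECORDS = [
    {"Age": 23, "Married": "No", "NumCars": 1},
    {"Age": 25, "Married": "Yes", "NumCars": 1},
    {"Age": 29, "Married": "No", "NumCars": 0},
    {"Age": 34, "Married": "Yes", "NumCars": 2},
    {"Age": 38, "Married": "Yes", "NumCars": 2},
]

AGE_INTERVALS = [(20, 29), (30, 39)]  # partition used on the slide


def to_items(record: Dict) -> FrozenSet[str]:
    """Map one record to the set of Boolean items that are 'true' for it."""
    items: Set[str] = set()
    for low, high in AGE_INTERVALS:
        if low <= record["Age"] <= high:
            items.add(f"Age:{low}..{high}")
    items.add(f"Married:{record['Married']}")
    items.add(f"NumCars:{record['NumCars']}")
    return frozenset(items)


def support(itemset: Set[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent: Set[str], consequent: Set[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


TRANSACTIONS = [to_items(r) for r in RECORDS]
antecedent = {"Age:30..39", "Married:Yes"}
consequent = {"NumCars:2"}
print(support(antecedent | consequent, TRANSACTIONS))
print(confidence(antecedent, consequent, TRANSACTIONS))
```

Once the records are sets of Boolean items, any Boolean association rule miner can take `TRANSACTIONS` as input unchanged.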
16. Problems with this Approach
MinSup: If the number of intervals is large, the support of a single interval can be lower
MinConf: Information is lost when partitioning values into intervals. Confidence can be lower when the number of intervals is smaller
Example
In the used partition:
<NumCars:0> ⇒ <Married:No> c=100%
But now, assume that in the partition, NumCars:0 and NumCars:1 go
to the same interval
<NumCars:0,1> ⇒ <Married:No> c=66.67%
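The two confidence values can be reproduced from the five records of the quantitative example. The following Python sketch uses an illustrative helper `confidence` (a name chosen here, not from the slides) that takes the set of NumCars values merged into one interval.

```python
from typing import List, Set, Tuple

# (NumCars, Married) pairs copied from the quantitative example table.
RECORDS: List[Tuple[int, str]] = [
    (1, "No"),
    (1, "Yes"),
    (0, "No"),
    (2, "Yes"),
    (2, "Yes"),
]


def confidence(records: List[Tuple[int, str]], num_cars_interval: Set[int], married: str) -> float:
    """Confidence of <NumCars in interval> => <Married: married>."""
    covered = [m for cars, m in records if cars in num_cars_interval]
    return sum(m == married for m in covered) / len(covered)


fine = confidence(RECORDS, {0}, "No")       # NumCars:0 has its own interval
coarse = confidence(RECORDS, {0, 1}, "No")  # NumCars:0 and NumCars:1 merged
print(fine, coarse)
```

Merging the two values adds the married customer with one car to the rule's antecedent, which is what pulls the confidence down from 100% to 2/3.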
17. Problems with this Approach
How can we solve this problem?
Increase the number of intervals (to reduce the information lost) while combining adjacent ones (to increase support)
ExecTime: Execution time blows up as the number of items per record increases
ManyRules: The number of rules also blows up. Many of them will not be interesting
18. Second Approach
Other solutions?
Well, the problem was that intervals were not the best ones
Let’s try to create the best intervals for our data
How?
Discretizing/Clustering techniques
Apply a discretizing/clustering technique to find the best partitions
Employ those partitions
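As one possible illustration of data-driven intervals, the sketch below cuts a numeric attribute at its largest gaps between consecutive values. This heuristic is an assumption made here for the example and is not the technique prescribed by the slides; the function name `gap_based_intervals` is hypothetical. On the Age column of the earlier example it recovers the split between the twenties and the thirties.

```python
from typing import List, Tuple


def gap_based_intervals(values: List[int], n_intervals: int) -> List[Tuple[int, int]]:
    """Split sorted values into intervals by cutting at the largest gaps."""
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    ordered = sorted(set(values))
    if not ordered:
        return []
    # Index i stands for the gap between ordered[i] and ordered[i + 1].
    gaps = sorted(
        range(len(ordered) - 1),
        key=lambda i: ordered[i + 1] - ordered[i],
        reverse=True,
    )
    cuts = sorted(gaps[: n_intervals - 1])
    intervals = []
    start = 0
    for cut in cuts:
        intervals.append((ordered[start], ordered[cut]))
        start = cut + 1
    intervals.append((ordered[start], ordered[-1]))
    return intervals


ages = [23, 25, 29, 34, 38]
print(gap_based_intervals(ages, 2))
```

The resulting intervals can then be used in place of a fixed partition when mapping records to Boolean items.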
We’ll see how clustering techniques work in the next class. So, keep this in mind and put the pieces together next class!
19. Third Approach
And what if we do not map the input to a Boolean space?
Create interval-based association rules directly
So, decide the best interval and, then, count the support
Usually, these approaches do not provide all the association rules, but the ones with larger support and confidence
Fuzzy logic can also be applied here. But again, we’ll see GFS in two or three lectures
20. Mining Class Association Rules
So far, we have seen ARM without any specific target
It finds all possible rules that exist in data, i.e., any item can appear as a consequent or a condition of a rule
However, what if we are interested in some specific targets?
E.g.:
The user has a set of text documents from some known topics.
He/she wants to find out what words are associated or correlated
with each topic
So, now, we want to find:
X ⇒ y, where X ⊆ I, and y ∈ Y, with Y the set of target classes (e.g., the topics)
The algorithms are very similar to those of ARM
We are not going to see them in class. But you have
information on the estudy
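To make the form X ⇒ y concrete, here is a deliberately small Python sketch. Everything in it is hypothetical: the four toy documents, the thresholds, and the function name `class_rules` are invented for illustration, and X is restricted to a single word, whereas real class association rule miners search over larger itemsets.

```python
from typing import List, Set, Tuple

# Hypothetical labelled documents: (set of words, topic).
DOCS: List[Tuple[Set[str], str]] = [
    ({"goal", "match", "team"}, "sports"),
    ({"match", "team", "score"}, "sports"),
    ({"election", "vote", "team"}, "politics"),
    ({"vote", "party"}, "politics"),
]


def class_rules(docs: List[Tuple[Set[str], str]], min_support: float,
                min_confidence: float) -> List[Tuple[str, str, float, float]]:
    """Return rules {word} => topic as (word, topic, support, confidence)."""
    n = len(docs)
    words = sorted({w for ws, _ in docs for w in ws})
    topics = sorted({topic for _, topic in docs})
    rules = []
    for word in words:
        # Topics of the documents that contain the word (the rule's antecedent).
        covered = [topic for ws, topic in docs if word in ws]
        for topic in topics:
            both = covered.count(topic)
            supp = both / n
            conf = both / len(covered)
            if supp >= min_support and conf >= min_confidence:
                rules.append((word, topic, supp, conf))
    return rules


rules = class_rules(DOCS, min_support=0.5, min_confidence=0.8)
for word, topic, supp, conf in rules:
    print(f"{{{word}}} => {topic}  support={supp:.2f} confidence={conf:.2f}")
```

The only structural difference from general rule mining is that the consequent is always drawn from the class labels, never from the ordinary items.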
21. Beyond Support and Confidence
Support and confidence are the basic measures of interestingness
But many more have been proposed during the last few
years
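One widely used additional measure is lift, the ratio between the confidence of X ⇒ Y and the support of Y. The slide text does not name specific measures, so lift is picked here only as an example. The Python sketch below uses the three shopping baskets from the recap slide; the helper names are chosen for illustration.

```python
from typing import List, Set

# Baskets from the recap slide (which customer owns which basket does not matter here).
BASKETS: List[Set[str]] = [
    {"milk", "eggs", "sugar", "bread"},
    {"milk", "eggs", "cereal", "bread"},
    {"eggs", "sugar"},
]


def support(itemset: Set[str], baskets: List[Set[str]]) -> float:
    """Fraction of baskets containing every item of the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)


def confidence(x: Set[str], y: Set[str], baskets: List[Set[str]]) -> float:
    return support(x | y, baskets) / support(x, baskets)


def lift(x: Set[str], y: Set[str], baskets: List[Set[str]]) -> float:
    """Lift > 1: X and Y co-occur more often than expected if independent."""
    return confidence(x, y, baskets) / support(y, baskets)


for consequent in ({"bread"}, {"eggs"}):
    print(consequent,
          confidence({"milk"}, consequent, BASKETS),
          lift({"milk"}, consequent, BASKETS))
```

Both milk ⇒ bread and milk ⇒ eggs have 100% confidence, but only the first has a lift above 1. Eggs appear in every basket, so the second rule says nothing new about milk buyers, which is the kind of case that support and confidence alone cannot distinguish.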
22. Some Applications
Wal-Mart has used the technique for years to mine POS data and arrange its stores to maximize sales based on such analysis
Medical databases, to discover commonly occurring diseases amongst groups of people
Lottery results databases, to discover those lucky combinations of numbers
23. Some Applications
Power System Restoration (PSR)
PSR is a multi-objective, multi-period, nonlinear, mixed-integer optimization problem with various constraints and unforeseeable factors
Discovery of associations that help build heuristics for PSR
Actions in a PSR
start_black_start_unit(x)
energize_line(x)
pick_up_load(x)
synchronize(x,y)
connect_tie_line(x)
crank_unit(x)
energize_busbar(x)
24. Some Applications
Correlations with color, spatial relationships, etc.
From coarse- to fine-resolution mining
25. Next Class
Clustering