2. Sequence Data

Sequence Database:

Object  Timestamp  Events
A       10         2, 3, 5
A       20         6, 1
A       23         1
B       11         4, 5, 6
B       17         2
B       21         7, 8, 1, 2
B       28         1, 6
C       14         1, 8, 7
3. Examples of Sequence Data

Sequence Database: Customer
  Sequence: Purchase history of a given customer
  Element (Transaction): A set of items bought by a customer at time t
  Event (Item): Books, dairy products, CDs, etc.

Sequence Database: Web Data
  Sequence: Browsing activity of a particular Web visitor
  Element (Transaction): A collection of files viewed by a Web visitor after a single mouse click
  Event (Item): Home page, index page, contact info, etc.

Sequence Database: Event data
  Sequence: History of events generated by a given sensor
  Element (Transaction): Events triggered by a sensor at time t
  Event (Item): Types of alarms generated by sensors

Sequence Database: Genome sequences
  Sequence: DNA sequence of a particular species
  Element (Transaction): An element of the DNA sequence
  Event (Item): Bases A, T, G, C

[Figure: a sequence drawn as an ordered series of elements (transactions), each made up of events (items); events shown in order: E1 E2 E1 E3 E2 E3 E4 E2]
33. Example of FreeSpan

Example database (min support = 2):

Sequence id  Sequence
10           <a(abc)(ac)d(cf)>
20           <(ad)c(bc)(ae)>
30           <(ef)(ab)(df)cb>
40           <eg(af)cbc>

f_list = a:4, b:4, c:4, d:3, e:3, f:3

g is deleted because the support of g is less than 2.
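The f_list on this slide can be reproduced with a short script. This is a minimal sketch, not FreeSpan itself: it only counts, for each item, the number of sequences that contain it, and keeps the items that reach the minimum support. The names `DB` and `frequent_item_list`, and the encoding of each element as a string of single-character items, are assumptions made for illustration.

```python
from collections import Counter

# Example database from the slide. Each sequence is a list of elements;
# each element is written as a string of single-character items (assumed encoding).
DB = {
    10: ["a", "abc", "ac", "d", "cf"],
    20: ["ad", "c", "bc", "ae"],
    30: ["ef", "ab", "df", "c", "b"],
    40: ["e", "g", "af", "c", "b", "c"],
}

def frequent_item_list(db, min_support):
    """Count how many sequences contain each item; keep the frequent ones."""
    support = Counter()
    for sequence in db.values():
        # An item counts once per sequence, however often it occurs inside it.
        items_in_sequence = set("".join(sequence))
        support.update(items_in_sequence)
    frequent = [(item, n) for item, n in support.items() if n >= min_support]
    # Sort by descending support, then alphabetically, as on the slide.
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))

f_list = frequent_item_list(DB, min_support=2)
print(f_list)
```

Item g occurs only in sequence 40, so it falls below the threshold and does not appear in the result.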
40. Example of PrefixSpan

Sequence database:

Sequence id  Sequence
10           <a(abc)(ac)d(cf)>
20           <(ad)c(bc)(ae)>
30           <(ef)(ab)(df)cb>
40           <e(af)cbc>

=> <a>-projected database:

Sequence id  Sequence
10           <(abc)(ac)d(cf)>
20           <(_d)c(bc)(ae)>
30           <(_b)(df)cb>
40           <(_f)cbc>

By scanning the <a>-projected database once, all the length-2 sequential patterns having prefix <a> can be found:

<aa>:2  <ab>:4  <(ab)>:2  <ac>:4  <ad>:2  <af>:2

Recursively, patterns with prefix <a> can be partitioned into 6 subsets.
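The projection and the single scan described on this slide can be sketched as follows. This is an illustrative sketch for a single-item prefix only, not a full PrefixSpan implementation. The function names, the `(partial, rest)` representation of a projected sequence, and the encoding of elements as alphabetically ordered strings of single-character items are all assumptions.

```python
from collections import Counter

# Slide database, with the infrequent item g already removed.
# Each element is a string of single-character items in alphabetical order.
DB = {
    10: ["a", "abc", "ac", "d", "cf"],
    20: ["ad", "c", "bc", "ae"],
    30: ["ef", "ab", "df", "c", "b"],
    40: ["e", "af", "c", "b", "c"],
}

def project_on_item(db, item):
    """Return {sid: (partial, rest)} for every sequence that contains item.

    partial holds the items that follow item inside its own element
    (written (_x) on the slide); rest holds the elements after it.
    """
    projected = {}
    for sid, sequence in db.items():
        for i, element in enumerate(sequence):
            if item in element:  # first occurrence of the prefix item
                partial = "".join(x for x in element if x > item)
                projected[sid] = (partial, sequence[i + 1:])
                break
    return projected

def length2_patterns(db, item, min_support):
    """Scan the item-projected database once and count both kinds of extension."""
    seq_ext, set_ext = Counter(), Counter()
    for partial, rest in project_on_item(db, item).values():
        later = set()        # items usable as <item x> (x in a later element)
        same = set(partial)  # items usable as <(item x)> (x in the same element)
        for element in rest:
            later |= set(element)
            if item in element:
                same |= {x for x in element if x > item}
        seq_ext.update(later)
        set_ext.update(same)
    found = {}
    for x, n in seq_ext.items():
        if n >= min_support:
            found[f"<{item}{x}>"] = n
    for x, n in set_ext.items():
        if n >= min_support:
            found[f"<({item}{x})>"] = n
    return found

patterns = length2_patterns(DB, "a", min_support=2)
print(patterns)
```

Counting sequence extensions and itemset extensions separately is what distinguishes <ab> (b in a later element) from <(ab)> (b in the same element as a).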
41. Example of PrefixSpan (cont’d)

<a>-projected database:

Sequence id  Sequence
10           <(abc)(ac)d(cf)>
20           <(_d)c(bc)(ae)>
30           <(_b)(df)cb>
40           <(_f)cbc>

=> <aa>-projected database:

Sequence id  Sequence
10           <(_bc)(ac)d(cf)>
20           <(_e)>

=> <ab>-projected database:

Sequence id  Sequence
10           <(_c)(ac)d(cf)>
20           <(_c)(ae)>
40           <c>

Sequential patterns of the <ab>-projected db: <(_c)>, <(_c)a>, <a>, <c>
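The projected databases on this slide can be reproduced by taking, for each sequence, the postfix that remains after the leftmost match of the prefix. The sketch below makes that assumption explicit: it uses a greedy leftmost match, which is enough for the prefixes shown here but is a simplification of the bookkeeping a full PrefixSpan implementation performs. The helper names `postfix`, `show`, and `project` are hypothetical, and elements are again encoded as alphabetically ordered strings of single-character items.

```python
# Slide database, with the infrequent item g already removed.
DB = {
    10: ["a", "abc", "ac", "d", "cf"],
    20: ["ad", "c", "bc", "ae"],
    30: ["ef", "ab", "df", "c", "b"],
    40: ["e", "af", "c", "b", "c"],
}

def postfix(sequence, prefix):
    """Postfix of sequence after the leftmost match of a non-empty prefix.

    Returns (partial, rest), or None if the prefix does not occur or
    nothing is left after it.
    """
    pos = 0
    last = None
    for wanted in prefix:
        # Advance to the next element that contains every item of wanted.
        while pos < len(sequence) and not set(wanted) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return None
        last = pos
        pos += 1
    if last is None:  # empty prefix: nothing to project on
        return None
    # Items left over in the last matched element, after the prefix's last item.
    partial = "".join(x for x in sequence[last] if x > max(prefix[-1]))
    rest = sequence[last + 1:]
    return (partial, rest) if partial or rest else None

def show(projected):
    """Format (partial, rest) in the slide notation, e.g. <(_c)(ac)d(cf)>."""
    partial, rest = projected
    head = f"(_{partial})" if partial else ""
    body = "".join(e if len(e) == 1 else f"({e})" for e in rest)
    return f"<{head}{body}>"

def project(db, prefix):
    """Prefix-projected database as {sid: formatted postfix}."""
    result = {}
    for sid, sequence in db.items():
        p = postfix(sequence, prefix)
        if p is not None:
            result[sid] = show(p)
    return result

print(project(DB, ["a", "a"]))
print(project(DB, ["a", "b"]))
```

Sequence 30 drops out of the <ab>-projected database because its only b after an a is the final element, which leaves an empty postfix.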
42. Example of PrefixSpan (cont’d)

Sequence database:

Sequence id  Sequence
10           <a(abc)(ac)d(cf)>
20           <(ad)c(bc)(ae)>
30           <(ef)(ab)(df)cb>
40           <e(af)cbc>

=> <b>-projected database:

Sequence id  Sequence
10           <(_c)(ac)d(cf)>
20           <(_c)(ae)>
30           <(df)cb>
40           <c>

Sequential patterns: <b>, <ba>, <bc>, <(bc)>, <(bc)a>, <bd>, <bdc>, <bf>
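The patterns listed for prefix <b> can be checked directly against the original database. The sketch below does not mine anything; it only tests whether a sequence contains a pattern (each pattern element must be a subset of a distinct sequence element, in order) and counts the supporting sequences. The names `contains` and `support`, and the encoding of patterns as lists of strings, are assumptions for illustration.

```python
# Slide database, with the infrequent item g already removed.
DB = {
    10: ["a", "abc", "ac", "d", "cf"],
    20: ["ad", "c", "bc", "ae"],
    30: ["ef", "ab", "df", "c", "b"],
    40: ["e", "af", "c", "b", "c"],
}

def contains(sequence, pattern):
    """True if pattern is a subsequence of sequence (greedy leftmost match)."""
    pos = 0
    for wanted in pattern:
        # Find the next element that includes every item of this pattern element.
        while pos < len(sequence) and not set(wanted) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # later pattern elements must match strictly later elements
    return True

def support(db, pattern):
    """Number of sequences in db that contain pattern."""
    return sum(contains(sequence, pattern) for sequence in db.values())

# Patterns with prefix <b> from the slide; <(bc)a> is written ["bc", "a"].
B_PATTERNS = {
    "<b>": ["b"],
    "<ba>": ["b", "a"],
    "<bc>": ["b", "c"],
    "<(bc)>": ["bc"],
    "<(bc)a>": ["bc", "a"],
    "<bd>": ["b", "d"],
    "<bdc>": ["b", "d", "c"],
    "<bf>": ["b", "f"],
}

supports = {name: support(DB, pattern) for name, pattern in B_PATTERNS.items()}
print(supports)
```

Every listed pattern reaches the minimum support of 2 used in the example.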