12. Markov Chain: Formal Definition
§ A Markov chain describes a discrete-time stochastic process over a set of states S = {s1, s2, …, sn} according to a transition probability matrix P = {Pij}
§ Pij = probability of moving to state j when at state i
§ Uses temporal ordering to estimate relatedness
§ The future depends only on the present state, not on the past (the Markov property)
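The definition above can be sketched in a few lines. This is a minimal illustration, assuming a toy three-state chain; the states and probabilities are made up, not from the talk.

```python
# States S = {s1, s2, s3} (illustrative only).
states = ["s1", "s2", "s3"]

# Transition probability matrix P: P[i][j] = probability of moving
# from state i to state j. Each row sums to 1.
P = [
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.4, 0.4, 0.2],
]

def next_state_distribution(i):
    # Markov property: the distribution over the next state depends
    # only on the current state i, not on any earlier history.
    return P[i]
```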
13. The Math
§ Time Series Aggregation
<u1, m1, t1>, <u1, m2, t2>, <u2, m3, t3>, …
<u1> => <m1, t1>, <m2, t2>, …
§ Co-occurrence
n( ) = 24,000
n( ) = 30,000
§ Transition Probability
p( ) = 0.8
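The three steps on this slide (time-series aggregation, co-occurrence counting, transition probabilities) can be sketched as follows. The event tuples below are illustrative; the slide's actual counts come from real viewing data.

```python
from collections import defaultdict

# <user, item, time> events (illustrative).
events = [
    ("u1", "m1", 1), ("u1", "m2", 2), ("u2", "m3", 1),
    ("u2", "m1", 2), ("u1", "m3", 3),
]

# Time Series Aggregation: group by user, order by timestamp.
series = defaultdict(list)
for user, item, t in sorted(events, key=lambda e: (e[0], e[2])):
    series[user].append(item)

# Co-occurrence: count consecutive pairs n(i -> j) and visits n(i).
pair_counts = defaultdict(int)
state_counts = defaultdict(int)
for items in series.values():
    for i, j in zip(items, items[1:]):
        pair_counts[(i, j)] += 1
        state_counts[i] += 1

# Transition Probability: p(j | i) = n(i -> j) / n(i).
def p(i, j):
    return pair_counts[(i, j)] / state_counts[i] if state_counts[i] else 0.0
```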
14. Baseline Implementation & Inefficiencies
§ Baseline Implementation:
§ RDBMS/DW-based
§ Stored procedures
§ Once a week (weekend)
§ Inefficiencies:
§ SQL limitations
§ Expensive copy
§ Does not exploit inherent parallelism
§ Does not scale well (regions, models)
§ 4B+ rows – run out of memory/space
§ Convoluted joins (maintenance nightmare!)
20. Markov Chain Migration Summary
RDBMS/DW                                                     | Hadoop
Limited by SQL syntax and semantics                          | Can be arbitrarily complex
Expensive data copy from data source to data center          | Data copy avoided
Does not scale to new models and regions                     | Scales beautifully
Maintenance nightmare (stored procedures + convoluted joins) | Easy to maintain (written in a high-level language, e.g. Java, Pig)
Resource constraints                                         | No special handling needed
21. Other Algorithms & Challenges
Entity       | Forms
Star Trek    | strtrek, startrek, start trek, star trek, star treck
South Park   | southpark, sothpark, south parl, souh park
Doctor Who   | docter who, doctor wh, docot who, doctor who:
Prison Break | prision break, prison brake, prison breal
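One common way to collapse variant forms like these onto a canonical entity is to normalize the string and then fuzzy-match against known titles. The sketch below illustrates that idea only; it is not the talk's actual method, and the names `normalize` and `canonicalize` are hypothetical.

```python
import difflib
import re

# Canonical titles from the slide; the matching strategy is an assumption.
CANONICAL = ["star trek", "south park", "doctor who", "prison break"]

def normalize(form):
    # Lowercase and strip everything but letters/digits, so "Star Trek",
    # "startrek" and "start trek" all collapse to "startrek".
    return re.sub(r"[^a-z0-9]", "", form.lower())

_keyed = {normalize(c): c for c in CANONICAL}

def canonicalize(form, cutoff=0.75):
    key = normalize(form)
    if key in _keyed:                      # exact match after normalization
        return _keyed[key]
    # Fall back to the closest canonical key by string similarity.
    close = difflib.get_close_matches(key, _keyed, n=1, cutoff=cutoff)
    return _keyed[close[0]] if close else None
```

Normalization alone handles spacing and casing variants; the similarity fallback catches the genuine misspellings ("star treck", "sothpark"), at the cost of a tunable false-match risk governed by `cutoff`.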