Markov chains can take predictive theory to a new level, with large-scale applications for digital marketing. From social media network modeling to user pathing, site scoring and recommended pages, Markov chains can quantify, rank, and return likely outcomes on the web. In other words, they can demystify demographics. Here's how.
2. Rivka Fogel
Markov Chains: Probability without History
Andrey Markov
COPYRIGHT 2013 CATALYST. ALL RIGHTS RESERVED.
JANUARY 23, 2014 | PAGE 2
3. Rivka Fogel
What Are Probability Spaces?
[Diagram: a Focal Object (the Function Co-Domain) linked to Function/Possibility 1 and Function/Possibility 2]
• Also known as stochastic processes
4. Rivka Fogel
Type 1: Time Series
[Diagram: over Time, a First Event leads to Function/Possibility 1 or Function/Possibility 2, also called “states”]
5. Rivka Fogel
Application: Personalization
Identifying user-specific authorities
[Diagram: a User linked to nodes A, B, C, D, and E]
• To return more accurate SERPs (E) for that user
6. Rivka Fogel
Type 2: Spatial Field
[Diagram: variables interacting around a Shared Event]
• Variable interactions are often statistically correlated
7. Rivka Fogel
Addition of The Markov Property
The Next State Depends Only on the Current State:
[Diagram: states A, B, C, D, and E. E occurs because of B or D, not because of A.]
• The probability of B causing E, as opposed to D causing E, is calculated with Bayes' theorem
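The comparison between B and D can be sketched with Bayes' theorem. This is a minimal illustration, not the deck's own method: the function name `posterior` and all the numbers are hypothetical, and it assumes the only candidate current states are B and D.

```python
def posterior(priors, likelihoods):
    """Return P(state | E) for each candidate current state via Bayes' theorem.

    priors[s]      = P(s), the probability that the current state is s
    likelihoods[s] = P(E | s), the probability of moving to E from s
    """
    evidence = sum(priors[s] * likelihoods[s] for s in priors)  # P(E)
    if evidence == 0:
        raise ValueError("E is impossible under these inputs")
    return {s: priors[s] * likelihoods[s] / evidence for s in priors}


# Hypothetical numbers, for illustration only.
priors = {"B": 0.6, "D": 0.4}
likelihoods = {"B": 0.2, "D": 0.5}
post = posterior(priors, likelihoods)
print(post)
```

With these made-up inputs, D is the more probable cause of E even though B is the more common state, because D leads to E more often.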
8. Rivka Fogel
Application: (not provided)
[Diagram: user paths that start from an unknown Keyphrase and pass through states such as Model Landing Page, Homepage, Inventory, Gallery Page, Video View, and Bounce]
• The Markov Property enables the marketer to model paths without knowing every state.
• While some keyphrase data is known, the model can also identify the keyphrase based on other users’ paths where the keyphrase is known.
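One way to picture the second point is a per-keyphrase Markov chain trained on sessions where the keyphrase is known, then used to score a session where it is missing. The sketch below is an assumption about how this could be done, not the author's implementation: the function names, the sample sessions, and the add-one smoothing are all hypothetical.

```python
from collections import Counter, defaultdict


def train(sessions):
    """sessions: list of (keyphrase, [page, page, ...]) with known keyphrases."""
    key_counts = Counter()
    trans = defaultdict(Counter)  # (keyphrase, page) -> counts of next pages
    pages = set()
    for key, path in sessions:
        key_counts[key] += 1
        pages.update(path)
        for cur, nxt in zip(path, path[1:]):
            trans[(key, cur)][nxt] += 1
    return key_counts, trans, pages


def guess_keyphrase(path, model):
    """Return P(keyphrase | path) for a session whose keyphrase is unknown."""
    key_counts, trans, pages = model
    total = sum(key_counts.values())
    scores = {}
    for key, n in key_counts.items():
        score = n / total  # prior P(keyphrase)
        for cur, nxt in zip(path, path[1:]):
            row = trans[(key, cur)]
            # Add-one smoothing so an unseen transition does not zero the score.
            score *= (row[nxt] + 1) / (sum(row.values()) + len(pages))
        scores[key] = score
    norm = sum(scores.values())
    return {k: v / norm for k, v in scores.items()}


# Hypothetical sessions, for illustration only.
sessions = [
    ("suv models", ["Model Landing Page", "Inventory", "Gallery Page"]),
    ("suv models", ["Model Landing Page", "Inventory", "Video View"]),
    ("brand name", ["Homepage", "Video View"]),
    ("brand name", ["Homepage", "Bounce"]),
]
model = train(sessions)
guess = guess_keyphrase(["Model Landing Page", "Inventory"], model)
print(max(guess, key=guess.get))
```

Only consecutive page pairs are scored, which is the Markov Property at work: each step is judged by the current page alone, not by the full history.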
9. Rivka Fogel
Application: Multichannel Attribution
Monitoring and prediction can be based on probability of
a user’s path given other users’ paths
[Diagram: Known Path 1 and Known Path 2 through states A, B, C, and D, with numbered transitions labeled "Probability of B" and "Probability of C"]
• Identify A (or predict D) via multiple probability states within a Markov chain.
10. Rivka Fogel
Application: Audience Segmentation
[Diagram: Referral Paths (Known Path 1 and Known Path 2, states A and B, labeled "Probability of B") feeding a Landing Page, followed by On-Site Paths (states C and D, labeled "Probability of C")]
11. Rivka Fogel
Relational Markov Properties
Relational Markov Models allow states to be of different types.
[Diagram: States A, B, C, and D grouped into Type 1 and Type 2. E occurs because of B or D’s type, not because of A or C’s type.]
• Relational Markov Models group multiple types of objects – relations – and calculate the probability of the relation’s appearance in a state.
• They work off of Dynamic Bayesian Networks
13. Rivka Fogel
Application: User Experience
[Diagram: a user path through Model Landing Page, Homepage, Inventory, Gallery Page, Video View, and Bounce, with each state assigned one of the types Page Visit, Video View, or Bounce]
14. Rivka Fogel
Application: Social Network Modeling
[Diagram: social interactions (Rich Media, Brand Social Profile, News Feed, Play, Rich Media Host Page, User Share, Influencer) leading to the Site Landing Page]
• This function will answer: if the user ended up
converting/visiting the landing page, which
[type(s)] of social interaction[s] came into play?
15. Rivka Fogel
Application: HTTP Service Request Prediction
[Diagram: Page A, Keyphrase 1, Keyphrase 2, a Keyphrase Cluster, and Known Paths through states 1, 2, and 3, labeled "Probability of 3"]
• Prefetch Page A given the probability that the user will want to see it.
• The keyphrase cluster is predicted by the function with co-domain B and
is then used to predict the incidence of B where the first state isn’t known.
16. Rivka Fogel
Application: Agent Suggestion
[Diagram: a Keyphrase Cluster or Authority linked to URL A through URL E, and the First Words of Query linked to Search A, Search B, and Search C]
• Auto-suggests searches (Search C) and links (URL E) that
the user is likely to want to access, based on user history
and other users’ history
17. Rivka Fogel
Application: Search Engine Scoring
[Diagram: identifying Authority 2. A Keyphrase Cluster, Authority 1, Page A, Page B, Page C, and Authority 2, connected by Link 1 and Link 2]
• The function identifies hubs of authority that are
probable next steps in many systems (each with
individual focus objects).
19. Rivka Fogel
Where, Probability Spaces:
• The measurable space $(S, \Sigma)$ and an object $X$ on that measurable space
• The probability space $(\Omega, F, P)$ is defined by the function $P$, the assignment of probabilities to events, where $\Omega$ is the set of possible outcomes and $F$ is the set of events, in which each event has zero or more outcomes
P(x) = Σ(t1-tk)P(t1) for all X on Ω
• The finite-dimensional distribution
$(X_{t_1}, \dots, X_{t_k}) : \Omega \to S^k$
• That arrow, or the pushforward measure, or the random distribution of events, or the matrix of transition probabilities
$P_{t_1 \dots t_k}(\cdot) = P\big((X_{t_1}, \dots, X_{t_k})^{-1}(\cdot)\big)$ on $S^k$
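– Where Bayes' theorem allows for:
$P(H \mid E) = P(H)\,P(E \mid H) / P(E)$
where $H$ is the hypothesis, $E$ is the new evidence, and $P(E)$ is the probability of that evidence over the entire set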
20. Rivka Fogel
Then, Markov Property:
• $P(X_{l+1} = s \mid X_l = s_l, X_{l-1} = s_{l-1}, \dots, X_0 = s_0) = P(X_{l+1} = s \mid X_l = s_l)$
– The random distribution of events is defined because the system is finite.
• So, in the matrix of transition probabilities, defined as $P^{(l,\,l+1)}_{ij} = P(X_{l+1} = j \mid X_l = i)$, the entries are independent of $l$.
• That is, $\hat{s}(t) = \hat{s}(t-1)\,A$
– $S$ is the state space, $A$ is the matrix of transition probabilities, and $\lambda$ is the initial probability distribution of the states in $S$. $\hat{s}(t)$ is the probability vector for the states at time $t$.
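The update rule, in which the probability vector at time t is the previous vector multiplied by the transition matrix, can be sketched in a few lines. The state names and the matrix values here are hypothetical; the only assumption is that each row of `A` sums to 1.

```python
def step(dist, A):
    """One step of s(t) = s(t-1) A for a row vector dist and a row-stochastic A."""
    n = len(A)
    return [sum(dist[i] * A[i][j] for i in range(n)) for j in range(n)]


# Hypothetical three-state site, for illustration only.
states = ["Homepage", "Inventory", "Bounce"]
A = [
    [0.5, 0.3, 0.2],  # from Homepage
    [0.1, 0.6, 0.3],  # from Inventory
    [0.0, 0.0, 1.0],  # Bounce is absorbing
]

s0 = [1.0, 0.0, 0.0]  # the user starts on the Homepage
s1 = step(s0, A)      # distribution after one click
s2 = step(s1, A)      # distribution after two clicks
print(dict(zip(states, s2)))
```

Because `A` does not change between steps, the same function serves for every time step, which is what independence from the step index means in practice.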
21. Rivka Fogel
Markov Restatement 1: When a User’s History is Available
• $A(s, s') = C(s, s') / \sum_{s''} C(s, s'')$ and $\lambda(s) = C(s) / \sum_{s'} C(s')$
– $C(s, s')$ counts the instances where $s'$ follows $s$
– This can be applied to HTTP prediction and agent suggestion
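The counting estimate can be sketched as follows. The function name `estimate` and the sample paths are hypothetical, and the sketch assumes that the single-argument count C(s) is the number of times s appears anywhere in the recorded paths, which the slide does not spell out.

```python
from collections import Counter, defaultdict


def estimate(paths):
    """Estimate the transition matrix A and the initial distribution from paths."""
    pair_counts = defaultdict(Counter)  # C(s, s'): how often s' follows s
    state_counts = Counter()            # C(s): assumed to count every visit to s
    for path in paths:
        state_counts.update(path)
        for s, s_next in zip(path, path[1:]):
            pair_counts[s][s_next] += 1
    # Normalize each row of counts into probabilities.
    A = {
        s: {t: c / sum(row.values()) for t, c in row.items()}
        for s, row in pair_counts.items()
    }
    total = sum(state_counts.values())
    lam = {s: c / total for s, c in state_counts.items()}
    return A, lam


# Hypothetical click paths, for illustration only.
paths = [
    ["Home", "Inventory", "Gallery"],
    ["Home", "Inventory", "Video"],
    ["Home", "Video"],
]
A, lam = estimate(paths)
print(A["Home"])
```

Storing `A` as nested dictionaries keeps only the transitions that were actually observed, which suits sparse click data.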
22. Rivka Fogel
Markov Restatement 2: When the Evidence Comes from a User Pool
• The Markov function becomes a generative chain link system that can store counts and probabilities
• $\hat{s}(t) = a_0\,\hat{i}(t-1)\,A + a_1\,\hat{i}(t-2)\,A^2 + a_2\,\hat{i}(t-3)\,A^3 + \dots$, and the predicted state is the one with the maximum value in this sum
– $\hat{s}(t)$ is normalized to select a list of probable states.
– Where probabilities are used: this can be applied to authority hubs as well, where collected user path traversal patterns are represented in a traversal connectivity matrix.
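The weighted sum over the history can be sketched as below. This is one reading of the formula, not a confirmed one: it assumes the vector for each past time step is a one-hot indicator of the state the user was in, and the weights, matrix, and function names are hypothetical.

```python
def vec_mat(v, M):
    """Multiply a row vector v by a square matrix M."""
    n = len(M)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]


def predict(history, A, weights):
    """history: state indices, most recent last. weights: a0, a1, a2, ..."""
    n = len(A)
    scores = [0.0] * n
    for k, a_k in enumerate(weights):
        if k >= len(history):
            break
        v = [0.0] * n
        v[history[-1 - k]] = 1.0  # indicator of the state k+1 steps back (assumed)
        for _ in range(k + 1):    # apply A a total of k+1 times
            v = vec_mat(v, A)
        scores = [s + a_k * x for s, x in zip(scores, v)]
    total = sum(scores)
    probs = [s / total for s in scores] if total else scores  # normalize
    best = max(range(n), key=probs.__getitem__)               # most probable state
    return probs, best


# Hypothetical three-state site: 0 = Homepage, 1 = Inventory, 2 = Bounce.
A = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.0, 0.0, 1.0],
]
probs, best = predict(history=[0, 1], A=A, weights=[0.7, 0.3])
print(probs, best)
```

Sorting `probs` instead of taking only `best` would give the ranked list of probable states mentioned on the slide.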
23. Rivka Fogel
Markov Restatement 3: When Groupings of States Are Estimated
• These are Relational Markov Models
• These groupings are also seen as abstractions. $A(Q)$ forms a lattice of abstractions.
– The model is a tuple $\{\mathcal{D}, R, Q, A, \pi\}$, where each $D \in \mathcal{D}$ is a tree, a hierarchy of values. $R$ is a set of relations. Each relation is defined by nodes on leaves of $D$. $Q$ is the set of states. $A$ is the transition probability matrix. $\pi$ is the initial probability, that is, the probability of each initial state in the chain. States are defined as abstractions on $Q$.
– The rank of an abstraction $a = R(d_1, \dots, d_k)$ in the lattice is defined as $1 + \sum_{i=1}^{k} \mathrm{depth}(d_i)$. Depth is a node’s depth in the tree, so an abstraction’s rank increases with the depth of its nodes. The rank of $Q$ (the most general) is 0.
• States that have nodes on common leaves will more frequently appear in abstractions together.
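The rank definition can be made concrete with a small sketch. The domain trees, relation names, and helper functions below are hypothetical, and the sketch assumes that the root of each tree has depth 0 and that `None` stands for Q, the most general abstraction.

```python
def depth(node, parent):
    """Depth of a node in a domain tree given as a child -> parent map (root = 0)."""
    d = 0
    while parent.get(node) is not None:
        node = parent[node]
        d += 1
    return d


def rank(abstraction, parents):
    """Rank of R(d1, ..., dk) = 1 + sum of depth(di); None stands for Q (rank 0)."""
    if abstraction is None:
        return 0
    _relation, args = abstraction
    return 1 + sum(depth(node, parents[domain]) for domain, node in args)


# Hypothetical domain trees, for illustration only.
parents = {
    "Product": {"AllProducts": None, "Cars": "AllProducts", "SUVs": "Cars"},
    "Media": {"AllMedia": None, "Video": "AllMedia"},
}

general = ("PageView", [("Product", "AllProducts")])
specific = ("PageView", [("Product", "SUVs")])
mixed = ("MediaView", [("Product", "Cars"), ("Media", "Video")])

print(rank(None, parents), rank(general, parents),
      rank(specific, parents), rank(mixed, parents))
```

The more specific the nodes an abstraction names, the higher its rank, so the lattice runs from Q at rank 0 down to fully specified states.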
24. Rivka Fogel
Further Reading
• Anderson, Corin R., Domingos, Pedro, and Weld, Daniel S. “Relational Markov Models and their Application to Adaptive Web Navigation.” Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. (2002): 143-152. Electronic. http://homes.cs.washington.edu/~pedrod/papers/kdd02a.pdf
• Downey, Allen. “Bayesian statistics made (as) simple (as possible).” Pycon US. 7 March 2012. http://pyvideo.org/video/608/bayesianstatistics-made-as-simple-as-possible
• Ildiko, Flesch and Lucas, Peter. “Markov Equivalence in Bayesian Networks.” Electronic. http://www.cs.ru.nl/P.Lucas/markoveq.pdf
• Sarukkai, Ramesh R. “Link prediction and path analysis using Markov chains.” Computer Networks 33 (June 2000): 377-386. Electronic. http://www.sciencedirect.com/science/article/pii/S138912860000044X
Stochastic definition: Stochastic processes are random processes that describe the evolution of a random value over time, as opposed to deterministic processes, which are typically described by ordinary differential equations.
In deterministic modeling (for the web), the user keyphrase is the focal point, and all subsequent stages are based on the focal point. In stochastic modeling, the Markov property makes every stage except the focal point and the preceding stage irrelevant to the current stage. You can also define the preceding stage as the focal point. This means that (not provided) is irrelevant when the focal point changes from the keyword to a SERP, landing page, or behavior (see relational Markov models/user behavior). Other users’ paths: see multichannel attribution.
The Markov chain formula is generative, so modeling is easily automated. Monitoring and prediction are defined by Bayes' theorem. E.g., the probability of the hypothesis given evidence from the initial source depends on the probability of the hypothesis given evidence from a different source.
For example: the probability of a user picking a landing page and then picking an object on that landing page, as opposed to the probability of picking a different object, a different landing page, and a different path entirely, can be calculated. Modeled spatially, not temporally. Can be combined with probabilities as well.
Possible only via a spatial model, because the nature of the co-domain means that you’d be modeling backwards.