Markov Chains for the Web - SEO, Usability, Search Engine Scoring, and More

Using Markov Chains to Predict User Behavior
Rivka Fogel

Markov Chains: Probability without History

[Image: Andrey Markov]

COPYRIGHT 2013 CATALYST. ALL RIGHTS RESERVED.

JANUARY 23, 2014 | PAGE 2

What Are Probability Spaces?
[Diagram labels: Function/Possibility 1; Focal Object / Function Co-Domain]

• Collections of random variables defined on a probability space are known as stochastic processes


Type 1: Time Series

[Diagram labels: First Event; events are also called “states”; axis: Time]



Application: Personalization
Identifying user-specific authorities

[Diagram labels: User; nodes A, B, C, D, E]

• To return more accurate SERPs (E) for that user


Type 2: Spatial Field

[Diagram label: Shared Event]

• Variable interactions are often statistically correlated


Addition of The Markov Property
The Next State Depends Only on the Current State:
[Diagram: states A, B, C, D, and E. E occurs because of B or D, not because of A.]

• The probability that B, as opposed to D, led to E is calculated with Bayes’ theorem
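
The Bayes’ theorem step above can be sketched as follows. This is a minimal illustration, assuming hypothetical prior probabilities of being in state B or D and hypothetical probabilities of moving from each into E; none of these numbers, nor the function name, come from the slides.

```python
def posterior_previous_state(priors, likelihoods):
    """Return P(previous state | E observed) via Bayes' theorem.

    priors:      P(previous state), e.g. {"B": 0.6, "D": 0.4}
    likelihoods: P(E | previous state), e.g. {"B": 0.2, "D": 0.5}
    """
    # Numerator of Bayes' theorem: P(state) * P(E | state)
    joint = {s: priors[s] * likelihoods[s] for s in priors}
    # Denominator: P(E), summed over every candidate previous state
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("E is unreachable from every candidate state")
    return {s: p / evidence for s, p in joint.items()}


# Hypothetical numbers, for illustration only.
post = posterior_previous_state({"B": 0.6, "D": 0.4}, {"B": 0.2, "D": 0.5})
print(post)
```

With these assumed inputs, D is the more probable predecessor of E even though B is the more common state overall, because D leads to E more often.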


Application: (not provided)
[Diagram labels: Keyphrase?; Model Landing Page; Homepage; Inventory; Gallery Page; Video View; Bounce]

• The Markov Property enables the marketer to model paths without knowing every state.
• While some keyphrase data is known, the model can also identify an unknown keyphrase based on other users’ paths where the keyphrase is known.
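
One possible way to sketch this idea: build a separate first-order chain for each known keyphrase, then pick the keyphrase whose chain makes the unlabeled path most likely. The keyphrases, page names, add-one smoothing, and the assumption that all keyphrases are equally likely beforehand are illustrative choices, not details from the slides.

```python
import math
from collections import defaultdict


def train(paths_by_keyphrase):
    """Count page-to-page transitions separately for each known keyphrase."""
    counts = {}
    for keyphrase, paths in paths_by_keyphrase.items():
        c = defaultdict(lambda: defaultdict(int))
        for path in paths:
            for cur, nxt in zip(path, path[1:]):
                c[cur][nxt] += 1
        counts[keyphrase] = c
    return counts


def log_likelihood(path, c, n_states, alpha=1.0):
    """Log-probability of a path under one chain, with add-alpha smoothing."""
    total = 0.0
    for cur, nxt in zip(path, path[1:]):
        row = c[cur] if cur in c else {}
        row_sum = sum(row.values())
        total += math.log((row.get(nxt, 0) + alpha) / (row_sum + alpha * n_states))
    return total


def guess_keyphrase(path, counts, n_states):
    """Return the keyphrase whose chain best explains the path (uniform prior)."""
    return max(counts, key=lambda k: log_likelihood(path, counts[k], n_states))


# Hypothetical paths where the keyphrase is known.
known = {
    "used trucks": [["Landing", "Inventory", "Gallery"],
                    ["Landing", "Inventory", "Video"]],
    "brand name": [["Landing", "Homepage", "Bounce"],
                   ["Landing", "Homepage", "Video"]],
}
states = {page for paths in known.values() for path in paths for page in path}
counts = train(known)
guess = guess_keyphrase(["Landing", "Inventory", "Gallery"], counts, len(states))
print(guess)
```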



Application: Multichannel Attribution
Monitoring and prediction can be based on the probability of a user’s path given other users’ paths.

[Diagram labels: Known Path 1; Known Path 2; states A, B, C, D; transitions 1, 2, 4, 5; Probability of B; Probability of C]

• Identify A (or predict D) via multiple probability states within a Markov chain.


Application: Audience Segmentation
[Diagram labels: Referral Paths; On-Site Paths; Known Path 1; Known Path 2; Landing Page; states A, B, C, D; transitions 1, 2, 4, 5; Probability of B; Probability of C]


Relational Markov Properties
Relational Markov Models allow states to be of different types.
[Diagram labels: State A; State B; State C; State D; Type 1; Type 2. E occurs because of B or D’s type, not because of A or C’s type.]

• Relational Markov Models group multiple types of objects (relations) and calculate the probability of the relation’s appearance in a state.
• They build on Dynamic Bayesian Networks.


Application: Audience Segmentation 2

[Diagram labels: Paid; Organic; Known; states B, C; transitions 1, 2]


Application: User Experience
[Diagram labels: Model Landing Page; Homepage; Inventory; Gallery Page; Video View; Bounce]

Types: Page Visit, Video View, Bounce

Application: Social Network Modeling

[Diagram labels: Brand Social Profile; News Feed; Rich Media; Rich Media Host Page; Play; User Share; Influencer; Site Landing Page]

• This function will answer: if the user ended up converting or visiting the landing page, which type(s) of social interaction(s) came into play?


Application: HTTP Service Request Prediction

[Diagram labels: A; Keyphrase 1; Keyphrase 2; Keyphrase Cluster; Known Paths; transitions 1, 2, 3; Probability of 3]

• Prefetch Page A given the probability that the user will want to see it.
• The keyphrase cluster is predicted by the function with co-domain B and is then used to predict the incidence of B where the first state isn’t known.


Application: Agent Suggestion
[Diagram labels: Keyphrase Cluster or Authority; URL A; URL B; URL C; URL D; URL E; First Words of Query; Search A; Search B; Search C]

• Auto-suggests searches (Search C) and links (URL E) that the user is likely to want to access, based on the user’s history and other users’ histories


Application: Search Engine Scoring
Identifying Authority 2:

[Diagram labels: Keyphrase Cluster; Authority 1; Authority 2; Page A; Page B; Page C; Link 1; Link 2]

• The function identifies hubs of authority that are probable next steps in many systems (each with its own focal object).


Appendix: Formal Definitions



Where, Probability Spaces:
• The measurable space $(S, \Sigma)$ and an object $X$ on the measurable space
• The probability space is defined by the function $P$, the assignment of probabilities to events, where $\Omega$ is the set of possible outcomes and $F$ is the set of events, in which each event has 0 or more outcomes:
$P(X) = \sum_{i=1}^{k} P(t_i)$ for all $X$ on $\Omega$
• The finite-dimensional distribution:
$(X_{t_1}, \dots, X_{t_k}) : \Omega \to S^k$
• That arrow gives the push-forward measures, or the random distribution of events, or the matrix of transition probabilities:
$P_{t_1 \dots t_k}(\cdot) = P\big((X_{t_1}, \dots, X_{t_k})^{-1}(\cdot)\big)$ on $S^k$
– Where Bayes’ theorem allows for:
$P(H \mid E) = P(H)\,P(E \mid H) / P(E)$
where $P(H)$ is the probability of the hypothesis $H$ before the new evidence $E$ is seen, and $P(E)$ is the probability of that evidence over the entire set



Then, Markov Property:
• $P(X_{l+1} = s \mid X_l = s_l, X_{l-1} = s_{l-1}, \dots, X_0 = s_0) = P(X_{l+1} = s \mid X_l = s_l)$
– The random distribution of events is defined because the system is finite.

• In a time-homogeneous chain, the matrix of transition probabilities, defined as $P^{l,l+1}_{ij} = P(X_{l+1} = j \mid X_l = i)$, is independent of $l$.
• That is, $\hat{s}(t) = \hat{s}(t-1)\,A$
– $s$ is the state space, $A$ is the matrix of transition probabilities, and $\lambda$ is the initial probability distribution of the states in $s$. $\hat{s}(t)$ is the probability vector for states at time “t.”
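
The update rule above, in which the state probability vector at time t is the vector at time t-1 multiplied by A, can be sketched in a few lines. The three states and the transition probabilities below are made-up values for illustration, and the function names are hypothetical.

```python
def step(dist, A):
    """One update: multiply a row vector of state probabilities by A."""
    n = len(A)
    return [sum(dist[i] * A[i][j] for i in range(n)) for j in range(n)]


def propagate(initial, A, steps):
    """Apply the update `steps` times, starting from the initial distribution."""
    dist = list(initial)
    for _ in range(steps):
        dist = step(dist, A)
    return dist


# Hypothetical three-state site: each row of A sums to 1.
states = ["Landing", "Gallery", "Bounce"]
A = [
    [0.0, 0.7, 0.3],  # from Landing
    [0.2, 0.5, 0.3],  # from Gallery
    [0.0, 0.0, 1.0],  # Bounce is absorbing
]
initial = [1.0, 0.0, 0.0]  # every visit starts on the landing page

after_two = propagate(initial, A, 2)
print(dict(zip(states, after_two)))
```

Because each row of A sums to 1, the propagated vector remains a probability distribution at every step.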



Markov Restatement 1: When a User’s History is Available
• $A(s, s') = C(s, s') / \sum_{s''} C(s, s'')$ and $\lambda(s) = C(s) / \sum_{s'} C(s')$
– $C(s, s')$ counts the instances where $s'$ follows $s$; $\lambda$ is the initial probability distribution
– This can be applied to HTTP prediction and agent suggestion
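
A minimal sketch of these count-based estimates follows. The slide does not say what C(s) counts, so the code assumes it counts how often s is the first state of a session; the session data and function name are likewise hypothetical.

```python
from collections import Counter, defaultdict


def estimate_chain(sessions):
    """Estimate transition probabilities and the initial distribution from paths."""
    transition_counts = defaultdict(Counter)  # C(s, s'): times s' follows s
    start_counts = Counter()                  # C(s): assumed to count session starts
    for path in sessions:
        if not path:
            continue
        start_counts[path[0]] += 1
        for cur, nxt in zip(path, path[1:]):
            transition_counts[cur][nxt] += 1

    # A(s, s') = C(s, s') / sum over s'' of C(s, s'')
    A = {
        s: {t: c / sum(row.values()) for t, c in row.items()}
        for s, row in transition_counts.items()
    }
    # Initial distribution: C(s) / sum over s' of C(s')
    total = sum(start_counts.values())
    initial = {s: c / total for s, c in start_counts.items()}
    return A, initial


# Hypothetical browsing sessions.
sessions = [
    ["Home", "Inventory", "Gallery"],
    ["Home", "Inventory", "Video"],
    ["Inventory", "Gallery"],
]
A, initial = estimate_chain(sessions)
likely_next = max(A["Inventory"], key=A["Inventory"].get)
print(likely_next, A["Inventory"], initial)
```

For HTTP prediction, `likely_next` is the kind of value a prefetcher could act on once its probability clears some chosen threshold.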



Markov Restatement 2: When the Evidence Comes from a User Pool
• The Markov function becomes a generative chain link system that can store counts and probabilities
• $\hat{s}(t) = a_0\,\hat{i}(t-1)A + a_1\,\hat{i}(t-2)A^2 + a_2\,\hat{i}(t-3)A^3 + \dots$ and
$= \max\big(a_0\,\hat{i}(t-1)A + a_1\,\hat{i}(t-2)A^2 + a_2\,\hat{i}(t-3)A^3 + \dots\big)$
– $\hat{s}(t)$ is normalized to select a list of probable states.
– Where probabilities are used, this can be applied to authority hubs as well, where collected user path traversal patterns are represented in a traversal connectivity matrix.
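
The weighted sum above can be sketched as follows. The slide does not define its symbols, so this assumes each i term is a one-hot indicator vector for the state visited that many steps ago and that a_0, a_1, ... are weights chosen by the analyst; the transition matrix, weights, and function names are hypothetical.

```python
def vec_mat(v, M):
    """Multiply a row vector by a matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]


def mat_mul(X, Y):
    """Multiply two matrices, row by row."""
    return [vec_mat(row, Y) for row in X]


def predict(history, A, weights):
    """Score next states from several past states.

    history: visited state indices, most recent last
    weights: a_0, a_1, ... applied to the states 1, 2, ... steps back
    """
    n = len(A)
    scores = [0.0] * n
    power = A  # holds A^(k+1) on iteration k
    for k, a_k in enumerate(weights):
        if k >= len(history):
            break
        indicator = [0.0] * n
        indicator[history[-1 - k]] = 1.0  # assumed one-hot vector for that visit
        contribution = vec_mat(indicator, power)
        scores = [s + a_k * c for s, c in zip(scores, contribution)]
        power = mat_mul(power, A)
    total = sum(scores)
    # Normalize so the scores can be read as a ranked list of probable states.
    return [s / total for s in scores] if total else scores


# Hypothetical cyclic site: state 0 -> 1 -> 2 -> 0.
A = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
result = predict(history=[0, 1], A=A, weights=[0.5, 0.5])
print(result)
```

In this toy chain both the one-step and the two-step terms point at state 2, so all of the normalized score lands there.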



Markov Restatement 3: When Groupings of States Are Estimated
• These are Relational Markov Models
• These groupings are also seen as abstractions. A(Q) forms a lattice of abstractions.
– $\{\mathcal{D}, R, Q, A, \pi\}$ where each $D \in \mathcal{D}$ is a tree representing a hierarchy of values. R is a set of relations. Each relation is defined by nodes on leaves of D. Q is the set of states. A is the transition probability matrix. π is the initial probability distribution, that is, the distribution over the initial state in the chain. States are defined as abstractions on Q.
– The rank of an abstraction $a = R(d_1, \dots, d_k)$ in the lattice is defined as $1 + \sum_{i=1}^{k} \mathrm{depth}(d_i)$. Depth is a node’s depth on the tree, so an abstraction’s rank increases with the depth of its nodes. The rank of Q (the most general) is 0.

• States that have nodes on common leaves will more frequently appear in abstractions together.



Further Reading
• Anderson, Corin R., Domingos, Pedro, and Weld, Daniel S. “Relational Markov Models and their Application to Adaptive Web Navigation.” Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. (2002): 143-152. Electronic. http://homes.cs.washington.edu/~pedrod/papers/kdd02a.pdf
• Downey, Allen. “Bayesian statistics made (as) simple (as possible).” Pycon US. 7 March 2012. http://pyvideo.org/video/608/bayesianstatistics-made-as-simple-as-possible
• Ildiko, Flesch and Lucas, Peter. “Markov Equivalence in Bayesian Networks.” Electronic. http://www.cs.ru.nl/P.Lucas/markoveq.pdf
• Sarukkai, Ramesh R. “Link prediction and path analysis using Markov chains.” Computer Networks 3 (June 2000): 377-386. Electronic. http://www.sciencedirect.com/science/article/pii/S138912860000044X


