Linear Algebra in Use: Ranking Web Pages with an Eigenvector

Maia Bittner, Yifei Feng

Abstract—Google PageRank is an algorithm that uses the underlying, hyperlinked structure of the web to determine the theoretical number of times a random web surfer would visit each page. Google converts these numbers into probabilities, and uses these probabilities as each web page's relative importance. Then, the most important pages for a given query can be returned first in search results. In this paper, we focus on the math behind PageRank, which includes eigenvectors and Markov Chains, and on explaining how it is used to rank webpages.

Index Terms—Google PageRank, Eigenvector, Eigenvalue, Markov Chain

I. INTRODUCTION

The internet is humanity's largest collection of information, and its democratic nature means that anyone can contribute more information to it. Search engines help users sort through the billions of available web pages to find the information that they are looking for. Most search engines use a two-step process to return web pages based on the user's query. The first step involves finding which of the pages the search engine has indexed are related to the query, either by containing the words in the query, or by more advanced means that use semantic models. The second step is to order this list of relevant pages according to some criterion. For example, the first web search engines, launched in the early 90s, used text-based matching systems as their criterion for ordering returned results by relevancy. This ranking method often returned exact matches on unauthoritative, poorly written pages before results that the user could trust. Even worse, this system was easy for page owners to exploit: they could fill their pages with irrelevant words and phrases in the hope of ranking highly. These problems prompted researchers to investigate more advanced methods of ranking.

Larry Page and Sergey Brin were researching a new kind of search engine at Stanford University when they had the idea that pages could be ranked by link popularity. The underlying social basis of their idea was that more reputable pages are linked to by other pages more often. Page and Brin developed an algorithm that could quantify this idea, and in 1998 they christened the algorithm PageRank [2] and published their first paper on it. Shortly afterwards, they founded Google, a web search engine that uses PageRank to help rank the returned results. Google's famously useful search results [3] helped it reach almost $29 billion in revenue in 2010 [1]. The original algorithm organized the indexed web pages such that the links between them are used to construct the probability of a random web surfer navigating from one page to another. This system can be characterized as a Markov Chain, a mathematical model described below, in order to take advantage of its convenient properties.

In this paper, we will explain how the interlinking structure of the web and the properties of Markov Chains can be used to quantify the relative importance of each indexed page. We will examine Markov Chains, eigenvectors, and the power iteration method, as well as some of the problems that arise when using this system to rank web pages.

A. Markov Chains

Markov chains are mathematical models that describe particular types of systems. For example, we can construe the number of students at Olin College who are sick as a Markov Chain if we know how likely it is that a student will become sick. Let us say that if a student is feeling well, she has a 5% chance of becoming sick the next day, and that if she is already sick, she has a 35% chance of feeling better tomorrow. In our example, a student can only be healthy or sick; these two states are called the state space of the system. In addition, we've decided to only ask how the students are feeling in the morning, and their health on any day only depends on how they were feeling the previous morning. This constant, discrete increase in time makes the system time-homogeneous. We can generate a set of linear equations that will describe how many students at Olin College are healthy and sick on any given day. If we let m_k denote the number of healthy students and n_k denote the number of sick students on morning k, then we get the following two equations:

    m_{k+1} = .95 m_k + .35 n_k
    n_{k+1} = .05 m_k + .65 n_k

Putting this system of linear equations into matrix notation, we get:

    [ m_{k+1} ]   [ .95  .35 ] [ m_k ]
    [ n_{k+1} ] = [ .05  .65 ] [ n_k ]                          (1)

We can take this matrix full of probabilities and call it P, the transition matrix.

    P = [ .95  .35 ]
        [ .05  .65 ]                                            (2)

The columns can be viewed as representing the present state of the system, and the rows can be viewed as representing the future state of the system. The first column accounts for the students who are healthy today, and the second column accounts for the students who are sick today, while the first row indicates the students who will be healthy tomorrow and the
second row indicates the students who will be sick tomorrow. The intersecting elements of the transition matrix represent the probability that a student will transition from the column state to the row state. So we can see that p_{1,1} indicates that 95% of the students who are healthy today will be healthy tomorrow. The total number of students who will be healthy tomorrow is represented by the first row: it is 95% of the students who are healthy today plus 35% of the students who are sick today. Similarly, the second row shows us the number of students who will be sick tomorrow: 5% of the students who are healthy today plus 65% of the students who are sick today.

You can see that each column sums to one, to account for 100% of the students who are healthy and 100% of the students who are sick in the current state. Square matrices like this, that have nonnegative real entries and in which every column sums to one, are called column-stochastic.

We can find the total number of students who will be in each state on day k + 1 by multiplying the transition matrix by a vector containing how many students were in each state on day k.

    P x_k = x_{k+1}                                             (3)

For example, if 270 students at Olin College are healthy today, and 30 are sick, we can find the state vector for tomorrow:

    [ .95  .35 ] [ 270 ]   [ 267 ]
    [ .05  .65 ] [  30 ] = [  33 ]                              (4)

which shows that tomorrow, 267 students will be healthy and 33 students will be sick. To find the next day's values, you can multiply again by the transition matrix:

    [ .95  .35 ] [ 267 ]   [ 265.2 ]
    [ .05  .65 ] [  33 ] = [  34.8 ]                            (5)

which is the same as

    [ .95  .35 ]^2 [ 270 ]   [ 265.2 ]
    [ .05  .65 ]   [  30 ] = [  34.8 ]                          (6)

So we can see that in order to find m_k and n_k, we can multiply P^k by the state vector containing m_0 and n_0, as in the equation below:

    [ m_k ]   [ .95  .35 ]^k [ m_0 ]
    [ n_k ] = [ .05  .65 ]   [ n_0 ]                            (7)

If you continue to multiply the state vector by the transition matrix for very high values of k, you will see that it eventually converges upon a steady state, regardless of initial conditions. This is represented by vector q in Eqn. (8).

    P q = q                                                     (8)

Being able to find this steady-state vector is the main advantage of using a column-stochastic matrix to model a system. Column-stochastic transition matrices are always used to represent the known probabilities of transitioning between states in Markov Chains. To model a system as a Markov Chain, it must be a discrete-time process that has a finite state space, and the probability distribution for any state must depend only on the previous state. Every situation that can be classified as a Markov Chain has these steady-state values at which the system will remain constant, regardless of the initial state. This steady-state vector is a specific example of an eigenvector, explained below.

B. Eigenvalues and Eigenvectors

An eigenvector is a nonzero vector x that, when multiplied by a matrix A, only scales in length and does not change direction, except for potentially a reversal in direction.

    A x = λ x                                                   (9)

The corresponding amount that an eigenvector is scaled by, λ, is called its eigenvalue. There are several techniques to find the eigenvalues and eigenvectors of a matrix. We will demonstrate one technique below with matrix A.

    A = [ 1  2  5 ]
        [ 0  3  0 ]
        [ 0  0  4 ]

To find the eigenvalues of matrix A, we note that λ is an eigenvalue of A if and only if the equation

    A x = λ x                                                   (10)

has a nontrivial solution. This is equivalent to finding λ such that

    (A − λI) x = 0                                              (11)

The above equation has a nontrivial solution when the determinant of A − λI is zero.

    det(A − λI_3) = det [ 1−λ  2    5   ]
                        [ 0    3−λ  0   ]
                        [ 0    0    4−λ ]

                  = (4 − λ)(1 − λ)(3 − λ) = 0

Solving for λ, we get that the eigenvalues are λ_1 = 1, λ_2 = 3, and λ_3 = 4. If we solve A v_i = λ_i v_i, we get the corresponding eigenvector for each eigenvalue:

    v_1 = [ 1 ]    v_2 = [  4 ]    v_3 = [ 5 ]
          [ 0 ]          [ −1 ]          [ 0 ]
          [ 0 ]          [  2 ]          [ 3 ]

This means that if v_2 is transformed by A, the result will be v_2 scaled by its eigenvalue, 3.

You can see in Eqn. (8) that the steady state of a Markov Chain has an eigenvalue of 1. This is why those steady-state vectors are a special case of eigenvectors. Because they are column-stochastic, all transition matrices of Markov Chains will have an eigenvalue of 1 (we invite the reader to prove this in Exercise 4). A system having an eigenvalue of 1 is the same as it having a steady state.

In some matrices, we may get repeated roots when solving det(A − λI) = 0. We will demonstrate this for the column-stochastic matrix P:

    P = [ 0    1    0    0    0   ]
        [ 1    0    0    0    0   ]
        [ 0    0    0    1    1/2 ]
        [ 0    0    0    0    1/2 ]
        [ 0    0    1    0    0   ]
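We can check this claim numerically before working it out by hand. The following sketch is our own illustration (using Python with numpy, which the paper itself does not use) confirming that 1 occurs twice among the eigenvalues of this P:

```python
import numpy as np

# The 5x5 column-stochastic matrix P whose characteristic
# polynomial turns out to have a repeated root at lambda = 1.
P = np.array([
    [0, 1, 0, 0, 0.0],
    [1, 0, 0, 0, 0.0],
    [0, 0, 0, 1, 0.5],
    [0, 0, 0, 0, 0.5],
    [0, 0, 1, 0, 0.0],
])

# Column-stochastic: nonnegative entries, every column sums to 1.
assert np.allclose(P.sum(axis=0), 1.0)

# Count how many eigenvalues are (numerically) equal to 1.
eigenvalues = np.linalg.eigvals(P)
num_ones = int(np.sum(np.isclose(eigenvalues, 1.0)))
print(num_ones)  # 2 -- the eigenvalue 1 is repeated
```

The remaining eigenvalues (−1 and a complex-conjugate pair) all have modulus at most 1, as expected for a stochastic matrix.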
To find the eigenvalues, solve:

    det(P − λI_5) = det [ −λ   1    0    0    0   ]
                        [ 1    −λ   0    0    0   ]
                        [ 0    0    −λ   1    1/2 ]
                        [ 0    0    0    −λ   1/2 ]
                        [ 0    0    1    0    −λ  ]

                  = −(1/2)(λ − 1)^2 (λ + 1)(2λ^2 + 2λ + 1) = 0

When we solve the characteristic equation, we find that the five eigenvalues are λ_1 = 1, λ_2 = 1, λ_3 = −1, λ_4 = −1/2 − i/2, and λ_5 = −1/2 + i/2. Since 1 appears twice as an eigenvalue, we say that it has algebraic multiplicity of 2. The number of linearly independent eigenvectors associated with eigenvalue 1 is called the geometric multiplicity of λ = 1. The reader can confirm that in this case, λ = 1 has geometric multiplicity of 2, with associated eigenvectors x and y:

    x = [ √2/2 ]        y = [ 0   ]
        [ √2/2 ]            [ 0   ]
        [ 0    ]            [ 2/3 ]
        [ 0    ]            [ 1/3 ]
        [ 0    ]            [ 2/3 ]

We can see that when a transition matrix for a Markov chain has geometric multiplicity greater than one for the eigenvalue 1, it is unclear which independent eigenvector should be used to represent the steady state of the system.

II. PAGERANK

When the founders of Google created PageRank, they were trying to discern the relative authority of web pages from the underlying structure of links that connects the web. They did this by calculating an importance score for each web page. Given a webpage, call it page k, we can use x_k to denote the importance of this page among the total number of n pages. There are many different ways that one could calculate an importance score. One simple and intuitive way to do page ranking is to count the number of links from other pages to page k, and assign that number as x_k. We can think of each link as being one vote for the page it links to, and of the number of votes a page gets as showing the importance of the page. In the example network of Figure 1, there are four webpages. Page 1 is linked to by pages 2 and 3, so its importance score is x_1 = 2. In the same way, we can get x_2 = 3, x_3 = 1, x_4 = 2. Page 2 has the highest importance score, indicating that page 2 is the most important page in this web.

Figure 1. A small web of 4 pages, connected by directional links

However, this approach has several drawbacks. First, pages that have more links to other pages would have more votes, which means that a website can easily gain more influence by creating many links to other pages. Second, we would expect that a vote from an important page should weigh more than a vote from an unimportant one, but every page's vote is worth the same amount with this method. A way to fix both of these problems is to give each page an amount of voting power equivalent to its importance score. So webpage k, with an importance score of x_k, has a total voting power of x_k. Then we can equally distribute x_k to all the pages it links to. We can define the importance score of a page as the sum of all the weighted votes it gets from the pages that link to it. So if webpage k has a set of pages S_k linked to it, we have

    x_k = Σ_{j ∈ S_k} x_j / n_j                                 (12)

where n_j is the number of links from page j. If we apply this method to the network of Figure 1, we can get a system of linear equations:

    x_1 = x_2/1 + x_4/2
    x_2 = x_1/3 + x_3/2 + x_4/2
    x_3 = x_1/3
    x_4 = x_1/3 + x_3/2

which can be written in the matrix form x = Lx, where x = [x_1, x_2, x_3, x_4]^T and

    L = [ 0    1    0    1/2 ]
        [ 1/3  0    1/2  1/2 ]
        [ 1/3  0    0    0   ]
        [ 1/3  0    1/2  0   ]

L is called the link matrix of this network system since it encapsulates the links between all the pages in the system. Because we've evenly distributed x_k to each of the pages k links to, the link matrix is always column-stochastic. As we defined earlier, vector x contains the importance scores of all web pages. To find these scores, we can solve Lx = x. You'll notice that this looks similar to Eqn. (8), and indeed, we can transform this problem into finding the eigenvector with eigenvalue λ = 1 for the matrix L! For matrix L, the eigenvector is [0.387, 0.290, 0.129, 0.194]^T for λ = 1. So we know the importance score of each page is x_1 ≈ 0.387, x_2 ≈ 0.290, x_3 ≈ 0.129, x_4 ≈ 0.194. Note that with this more sophisticated method, page 1 has the highest importance score instead of page 2. This is because page 2 only links to page 1, so it casts its entire vote to page 1, boosting its score. Knowing that the link matrix is a
column-stochastic matrix, let us now look at the problem of ranking internet pages in terms of a Markov Chain system.

For a network with n pages, the ith entry of the n × 1 vector x_k denotes the probability of visiting page i after k clicks. The link matrix L is the transition matrix such that the entry l_{ij} is the probability of clicking on a link to page i when on page j. Finding the importance score of a page is the same as finding its entry in the steady-state vector of the Markov chain that describes the system. For example, say we start from page 1, so that the vector representing our beginning state is x_0 = [1, 0, 0, 0]^T. To find the probability of ending up on each web page after n clicks, we compute:

    x_n = L^n x_0                                               (13)

where x_n represents the state after n clicks (the probability of being on each page), and L is the link matrix, or transition matrix. So by calculating the powers of L, we can determine the steady-state vector. This process is called the Power Iteration method, and it converges on an estimate of the eigenvector for the greatest eigenvalue, which is always 1 in the case of a column-stochastic matrix. For example, by raising the link matrix L to the 25th power, we have

    B ≈ [ 0.387  0.387  0.387  0.387 ]
        [ 0.290  0.290  0.290  0.290 ]
        [ 0.129  0.129  0.129  0.129 ]
        [ 0.194  0.194  0.194  0.194 ]

If we multiply matrix B by our initial state x_0, we get our steady-state vector

    s = [ 0.387  0.290  0.129  0.194 ]^T

which shows the probability of each page being visited. This power iteration process gives us approximately the same result as finding the eigenvector of the link matrix, but is often more computationally feasible, especially for matrices with a dimension of around 1 billion, like Google's. These computational savings are why this is the method by which Google actually calculates the PageRank of web pages [5]. By taking powers of the matrix to estimate eigenvectors, Google is doing the reverse of many applications, which diagonalize matrices into their eigenvector components in order to take them to a high power. Few applications actually use the power iteration method, since it is only appropriate given a narrow range of conditions. The sparseness of the web's link matrix, and the need to know only the eigenvector corresponding to the dominant eigenvalue, make PageRank an application well-suited to take advantage of the power iteration method.

A. Subwebs

We now address a problem that this model has when faced with real-world constraints. We refer to this problem as disconnected subwebs, as shown in Figure 2. If there are two or more groups of linked pages that do not link to each other, it is impossible to rank all pages relative to each other. The matrix shown below is the link matrix for the web shown in Figure 2:

Figure 2. Here are two small subwebs, which do not exchange links

    A = [ 0    1    0    0    0   ]
        [ 1    0    0    0    0   ]
        [ 0    0    0    1    1/2 ]
        [ 0    0    0    0    1/2 ]
        [ 0    0    1    0    0   ]

Mathematically, this problem manifests as a Markov Chain whose steady state is not unique. Link matrix A has a geometric multiplicity of 2 for the eigenvalue 1, as we showed in Section I.B. It's unclear which of the two associated eigenvectors should be chosen to form the rankings. The two eigenvectors are essentially eigenvectors for each subweb, and each gives rankings which are accurate locally, but not globally.

Google has chosen to solve this problem of subwebs by introducing an element of randomness into the link matrix. Defining a matrix S as an n × n matrix with all entries 1/n, and a value m between 0 and 1 as a relative weight, we can replace link matrix A with:

    M = (1 − m)A + mS                                           (14)

If m > 0, there will be no parts of matrix M which represent entirely disconnected subwebs, as every web surfer has some probability of reaching any other page regardless of the page they're on. In the original PageRank algorithm, an m value of .15 was used. Today, those outside of Google speculate that the value currently in use lies between .1 and .2. The larger the value of m, the more heavily the random matrix is weighted, and the more egalitarian the corresponding PageRank values are. If m is 1, a web surfer has equal probability of getting to any page on the web from any other page, and all links would be ignored. If m is 0, any subwebs contained in the system will cause the eigenvalue 1 to have a multiplicity greater than 1, and there will be ambiguity in the system.

III. DISCUSSION

We've shown how systems that can be characterized as a Markov Chain will converge to a steady state, and that these steady-state values can be found by using either the characteristic equation or the power iteration method. We then investigated how the web can be viewed as a Markov Chain when the state is which page a web surfer is on, and the hyperlinks between pages dictate the probability of transitioning from one state to another. With this characterization, we can view the steady-state vector as the proportional amount of time a web surfer would spend on each page: hence, a valuable metric for which pages are more important.
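The whole procedure described above — build a column-stochastic link matrix, blend it with the random-surfer matrix S as in Eqn. (14), and run power iteration — can be sketched in a few lines. This is our own illustration in Python with numpy (the function and variable names are ours, not Google's), applied to the two-subweb link matrix A of Section II.A with the original damping value m = 0.15:

```python
import numpy as np

def pagerank(A, m=0.15, clicks=100):
    """Power iteration on M = (1 - m)A + mS, as in Eqn. (14)."""
    n = A.shape[0]
    S = np.full((n, n), 1.0 / n)   # equal chance of jumping to any page
    M = (1 - m) * A + m * S        # damped matrix, still column-stochastic
    x = np.full(n, 1.0 / n)        # any starting probability vector works
    for _ in range(clicks):
        x = M @ x                  # one more simulated click
    return x

# Link matrix for the two disconnected subwebs of Figure 2.
A = np.array([
    [0, 1, 0, 0, 0.0],
    [1, 0, 0, 0, 0.0],
    [0, 0, 0, 1, 0.5],
    [0, 0, 0, 0, 0.5],
    [0, 0, 1, 0, 0.0],
])

ranks = pagerank(A)
assert np.isclose(ranks.sum(), 1.0)   # still a probability vector
```

With m > 0 the iteration converges to a single ranking that covers both subwebs; with m = 0 the two subwebs would remain incomparable, since the steady state is then not unique.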
REFERENCES

[1] O'Neill, Nick. Why Facebook Could Be Worth $200 Billion. AllFacebook. Available at http://www.allfacebook.com/is-facebooks-valuation-hype-a-macro-perspective-2011-01
[2] Page, L., Brin, S., Motwani, R., and Winograd, T. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
[3] Granka, Laura A., Joachims, Thorsten, and Gay, Geri. Eye-tracking analysis of user behavior in WWW search. SIGIR, 2004. Available at http://www.cs.cornell.edu/People/tj/publications/granketal04a.pdf
[4] Grinstead, Charles M. and Snell, J. Laurie. Introduction to Probability: Chapter 11, Markov Chains. Available at http://www.dartmouth.edu/ chance/teachingaids/booksarticles/probabilitybook/Chapter11.pdf
[5] Ipsen, Ilse and Wills, Rebecca M. Analysis and Computation of Google's PageRank. 7th IMACS International Symposium on Iterative Methods in Scientific Computing, Fields Institute, Toronto, Canada, 5 May 2005. Available at http://www4.ncsu.edu/ ipsen/ps/slidesimacs.pdf

IV. EXERCISES

Exercise 1: Find the eigenvalues and corresponding eigenvectors of the following matrix.

    A = [ 3    0    −1   0 ]
        [ 4    −3   2    0 ]
        [ 0    0    1    0 ]
        [ −2   0    3    2 ]

Exercise 2: Given the column-stochastic matrix P:

    P = [ 0.6  0.3 ]
        [ 0.4  0.7 ]

find the steady-state vector for P.

Exercise 3: Create a link matrix for the network with 5 internet pages in Figure 3, then rank the pages.

Figure 3. A small web of 5 pages

Exercise 4: In Section I.B we claim that all transition matrices for Markov chains have 1 as an eigenvalue. Why is this true for every column-stochastic matrix?
V. SOLUTIONS TO EXERCISES

Solution 1: The eigenvalues are λ_1 = 2, λ_2 = −3, λ_3 = 3, and λ_4 = 1, and the corresponding eigenvectors are

    x_1 = [ 0 ]   x_2 = [ 0 ]   x_3 = [  3 ]   x_4 = [  1 ]
          [ 0 ]         [ 1 ]         [  2 ]         [  2 ]
          [ 0 ]         [ 0 ]         [  0 ]         [  2 ]
          [ 1 ]         [ 0 ]         [ −6 ]         [ −4 ]

Solution 2: The steady-state vector for P is

    s = [ 0.4286 ]
        [ 0.5714 ]

Solution 3: The link matrix is

    A = [ 0    1    0    0    1/2 ]
        [ 1/3  0    1/3  0    0   ]
        [ 0    0    0    1    1/2 ]
        [ 1/3  0    1/3  0    0   ]
        [ 1/3  0    1/3  0    0   ]

The ranking vector is

    x = [ 1/4  1/6  1/4  1/6  1/6 ]^T

Solution 4: By definition, every column of a column-stochastic matrix contains no negative numbers and sums to 1. As shown in Section I.B, the eigenvalues of P are the numbers λ for which det(P − λI) = 0. Subtracting 1 from each diagonal entry makes every column of P − I sum to 0, so the rows of P − I add up to the zero row and are therefore linearly dependent, which forces det(P − I) = 0. Therefore, 1 is always an eigenvalue of a column-stochastic matrix.
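These answers can be cross-checked numerically. For example, the following sketch (our own Python/numpy illustration, not part of the original solutions) recovers the steady-state vector of Exercise 2 by repeated multiplication:

```python
import numpy as np

P = np.array([[0.6, 0.3],
              [0.4, 0.7]])   # the column-stochastic matrix of Exercise 2

x = np.array([1.0, 0.0])     # any starting probability vector
for _ in range(50):          # repeated multiplication converges to q with Pq = q
    x = P @ x

print(np.round(x, 4))        # [0.4286 0.5714], i.e. [3/7, 4/7]
```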