We'll survey some of the underlying ideas from Google's PageRank algorithm along the lines of Massimo Franceschet's CACM history.
I've taken some slight liberties to make it more accessible.
4. Google
Thanks Internet!
http://school.discoveryeducation.com/clipart/clip/stk-fgr6.html
http://listsoplenty.com/pix/tag/cartoon
https://www.facebook.com/ProgrammersJokes
http://www.feld.com/wp/archives/2009/02/unix-time-1234567890-on-valentines-day.html
8. Vannevar Bush
“wholly new forms of
encyclopedias will appear,
ready made with a mesh of
associative trails running
through them, ready to be
dropped into the memex and
there amplified”
-- “As We May Think”, The Atlantic, July 1945
9. Sir Tim Berners-Lee
“We should work towards a
universal linked information
system … to allow a place for
any information or reference
one felt was important and a
way of finding it afterwards.”
-- Founding proposal for “the mesh”, 1989
10. … the mesh became the web
… the web became a mess
... “finding it afterwards”? Hah!
11. Larry Page
Sergey Brin
• Grad students at Stanford
• Worked with Terry Winograd
(artificial intelligence)
• Created a web-search
algorithm called “backrub”
• Spun off a company, “Google”
(a play on “googol”)
• Worth about $20 billion each
12. A cartoon web-search primer
1. Crawl webpages
2. Analyze webpage text (information retrieval)
3. Analyze webpage links
4. Fit measures to human evaluations
5. Produce rankings
6. Continuously update
16. What pages are
important?
The Google random surfer
• Follows a random link with
probability alpha
“random clicks”
• Goes anywhere with
probability (1-alpha)
“random jumps”
18. Andrei Markov
• Studied sequences of random
variables.
• The probability that the random
variable takes a particular value
depends only on its current value.
• The “page id” is the “random
variable” in the Markov chain!
19. Oskar Perron
Georg Frobenius
• Worked out when a Markov
chain has an “average”
• The “average” of the web? It’s
the probability of finding the
random surfer at a page.
• Perron in 1907; Frobenius
extended it in 1912
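In symbols, the “average” is a set of probabilities that one more step of the random surfer leaves unchanged. A sketch of that condition, where x_j (the probability of finding the surfer at page j), deg(i) (the number of links leaving page i), and n (the number of pages) are notation assumed here, and every page is assumed to have at least one outgoing link:

```latex
x_j = \alpha \sum_{i \to j} \frac{x_i}{\deg(i)} + \frac{1 - \alpha}{n},
\qquad \sum_{j=1}^{n} x_j = 1
```

The first term collects the “random clicks” arriving at page j; the second is page j's share of the “random jumps”.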
20. What pages are
important?
Perron and Frobenius proved the
following algorithm always
converges to a solution…
set prob[i] = 0 for all pages
set p to a random page
for t = 1 to ...
    increment prob[p]
    if rand() < alpha,
        set p to a random neighbor of p
    else, set p to a random page
set prob[i] = prob[i] / t for all pages
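The random-surfer simulation above can be sketched in Python. This is a minimal illustration, not Google's implementation: the function name random_surfer, the toy_web graph, the step count, the seed, and alpha = 0.85 are all assumptions made for the example. Pages with no outgoing links are assumed to trigger a random jump, a case the slide does not cover.

```python
import random


def random_surfer(links, alpha=0.85, steps=100_000, seed=0):
    """Estimate page importance by simulating a single random surfer.

    links maps each page to the list of pages it links to.
    alpha = 0.85 is an assumed value; the slides leave alpha unspecified.
    """
    rng = random.Random(seed)
    pages = list(links)
    visits = {page: 0 for page in pages}
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        neighbors = links[page]
        # Assumption: a page without outgoing links always jumps.
        if neighbors and rng.random() < alpha:
            page = rng.choice(neighbors)  # "random click"
        else:
            page = rng.choice(pages)      # "random jump"
    # Turn visit counts into the fraction of time spent at each page.
    return {p: count / steps for p, count in visits.items()}


# Hypothetical four-page web used only for illustration.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
estimate = random_surfer(toy_web)
for name, value in sorted(estimate.items(), key=lambda item: -item[1]):
    print(name, round(value, 3))
```

Nothing links to page "d" in this toy graph, so the surfer only reaches it by random jumps and it should come out least important.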
21. Richard von Mises
• Created “the power method”
• An efficient algorithm to
“average” a Markov chain
• It updates the probabilities of
all pages at once.
“Praktische Verfahren der Gleichungsauflösung”
R. von Mises and H. Pollaczek-Geiringer, 1929
22. What pages are
important?
Using the von Mises method …
set prob[i] = 1/n for all pages
for t = 1 to about 80
    set newprob[i] = 0 for all pages
    for all links from page i to page j
        set newprob[j] += prob[i]/deg[i]
    for all pages i
        set prob[i] = alpha*newprob[i] + (1-alpha)/n
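The update above translates almost line for line into Python. This is a sketch under stated assumptions: the name pagerank_power, the small_web graph, and alpha = 0.85 are illustrative choices not taken from the slides. Every page is assumed to have at least one outgoing link and to appear as a key in the dictionary.

```python
def pagerank_power(links, alpha=0.85, iterations=80):
    """Update all page probabilities at once, as in the pseudocode above.

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    pages = list(links)
    n = len(pages)
    prob = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        newprob = {page: 0.0 for page in pages}
        # Each page i splits its probability evenly over its links.
        for i, outlinks in links.items():
            for j in outlinks:
                newprob[j] += prob[i] / len(outlinks)
        # Mix "random clicks" with a uniform share of "random jumps".
        prob = {
            page: alpha * newprob[page] + (1 - alpha) / n
            for page in pages
        }
    return prob


# Hypothetical four-page web used only for illustration.
small_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank_power(small_web)
for name, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(name, round(value, 4))
```

With the assumed alpha of 0.85, the distance to the answer shrinks by roughly a factor of alpha per pass, so about 80 passes leave an error on the order of 0.85^80, a few parts in a million.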
26. A new status index (1953)
Leo Katz
A paper about how information spreads in groups …
“For example, the information that the new high-
school principal is unmarried and handsome might
occasion a violent reaction in a ladies' garden club
and hardly a ripple of interest in a luncheon group of
the local chamber of commerce. On the other hand,
the luncheon group might be anything but apathetic
in its response to information concerning a fractional
change in credit buying restrictions announced by the
federal government.”
28. Gene Golub
Popularized numerical computing with
matrices via the informal “Golub thesis”
“anything worth computing can be
stated as a matrix problem”
William Kahan
Formalized IEEE-754 floating-point arithmetic.
Made it possible to compute with probabilities
as “real numbers” instead of discrete counts.
29. Credits
Most pictures taken from Google image search.
Original idea from Massimo Franceschet.
“PageRank: Standing on the shoulders of giants”