1. Q3: How does Google rank webpages?
Mung Chiang
Networks: Friends, Money, and Bytes
2. Webpages form a network
Links in text: since mid-20th century…
Hyperlinks in webpages
Early 1990s: web, browser, portal…
Mid to late 1990s: search…
Directed graph
Huge and sparse
N = 40 to 60 billion webpages out there…
And very few links in/out of most webpages
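As a small illustration of the "huge and sparse directed graph" view, the sketch below stores a web as adjacency lists, so only the links that exist take up space. The four page names and their links are hypothetical, chosen only for the example.

```python
# Hypothetical 4-page web: each key links to the pages in its list.
out_links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def in_links(graph):
    """Invert the out-link lists to get, for each page, the pages linking to it."""
    incoming = {page: [] for page in graph}
    for src, targets in graph.items():
        for dst in targets:
            incoming[dst].append(src)
    return incoming

num_pages = len(out_links)
num_links = sum(len(targets) for targets in out_links.values())
# Fraction of all possible directed links (excluding self-links) that exist.
density = num_links / (num_pages * (num_pages - 1))

print(in_links(out_links))
print(num_links, density)
```

On the real web the density is far smaller than in this toy example, which is why a list-of-links representation is preferred over a full N-by-N table.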
3. Which webpages are more important?
Usefulness of ranking is hard to measure
So rank by importance
Quantify node importance:
Count the number of links?
Do more important pages point to this page?
Turn a seemingly cyclic statement into a characterization of
an equilibrium of a recursive definition
4. General themes
Network consists of
Topology: graphs, matrices
Functionality: what you do on the graph
We’ll see 3 matrices and a model of the “search
and navigation” functionality
5. Try 1
Add up importance scores through incoming links
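One possible reading of Try 1 is sketched below: a page's new score is the plain sum of the scores of the pages linking to it. The graph, the function name `try1_update`, and the starting scores of 1 are assumptions made for illustration.

```python
# Hypothetical 4-page web used only for this illustration.
out_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

def try1_update(graph, scores):
    """One update step: each page passes its full score along every out-link."""
    new_scores = {page: 0.0 for page in graph}
    for src, targets in graph.items():
        for dst in targets:
            new_scores[dst] += scores[src]
    return new_scores

start = {page: 1.0 for page in out_links}
after = try1_update(out_links, start)

print(after)
print(sum(start.values()), sum(after.values()))
```

In this example page A has two out-links and hands its full score to both targets, so the total score in the network grows after the update. A page can inflate its influence just by linking to more pages.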
6. Try 2
Normalize by the spread of importance
Is there a set of consistent scores?
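A minimal sketch of Try 2, assuming "spread" means that a page divides its score equally among its out-links. The graph, the function name `try2_update`, and the candidate score vector are hypothetical. The check at the end asks whether the candidate is consistent, meaning that one update leaves it unchanged.

```python
# Hypothetical 4-page web used only for this illustration.
out_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

def try2_update(graph, scores):
    """One update step: each page splits its score evenly over its out-links."""
    new_scores = {page: 0.0 for page in graph}
    for src, targets in graph.items():
        share = scores[src] / len(targets)
        for dst in targets:
            new_scores[dst] += share
    return new_scores

# Candidate scores, normalized to sum to 1.
candidate = {"A": 0.4, "B": 0.2, "C": 0.4, "D": 0.0}
updated = try2_update(out_links, candidate)

is_consistent = all(abs(updated[p] - candidate[p]) < 1e-12 for p in out_links)
print(updated, is_consistent)
```

For this toy graph the candidate is a fixed point of the update, so a consistent set of scores exists here. Note that page D, which nobody links to, ends up with a score of zero.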
9. What does Google do?
Crawling the web
Storing and indexing the pages
Computing two scores to rank pages per search
Relevance scores
Importance scores
24. Parallel with DPC
Both are special cases of “power method” using
non-negative matrix theory
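The power method itself can be sketched generically: multiply a vector by a non-negative matrix over and over, rescaling each time, until it settles on the dominant eigenvector. The 2-by-2 matrix and the function name `power_method` below are assumptions for illustration; they are not taken from the slides.

```python
def power_method(matrix, iterations=100):
    """Approximate the dominant eigenvalue and eigenvector of a square matrix.

    Assumes positive entries, so the iterates stay non-negative and
    the rescaling sum never hits zero.
    """
    n = len(matrix)
    x = [1.0 / n] * n  # start from a uniform vector that sums to 1
    eigenvalue = 0.0
    for _ in range(iterations):
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(y)  # since x sums to 1, this estimates the eigenvalue
        x = [value / eigenvalue for value in y]  # rescale to sum to 1 again
    return eigenvalue, x

# Hypothetical positive matrix with eigenvalues 4 and -1.
M = [[1.0, 2.0],
     [3.0, 2.0]]

value, vector = power_method(M)
print(value, vector)
```

The iterates approach the eigenvector proportional to (2, 3), at a speed set by the ratio between the second-largest and the largest eigenvalue magnitudes.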
25. The challenge of scale
Numerical linear algebra methods
A few more tricks
26. SEO
How can you increase your website's rank?
How does Google react?
Early 2011
May 2012
27. Summary
Hyperlinked webpages form a network
Connectivity patterns provide a hint about
importance
PageRank uniquely defines and efficiently
computes a consistent set of importance scores
Which can be viewed as the dominant
eigenvector of the Google matrix
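To make the summary concrete, here is a minimal sketch of PageRank computed by repeated multiplication with the Google matrix, without ever forming that matrix explicitly. It follows the standard textbook formulation. The graph, the function name `pagerank`, the tolerance, and the damping value of 0.85 (a commonly cited choice) are assumptions, not details taken from these slides.

```python
def pagerank(out_links, damping=0.85, tol=1e-12, max_iter=1000):
    """Iterate scores until they stop changing; returns a dict of page scores."""
    pages = list(out_links)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(max_iter):
        # Score held by pages with no out-links is spread over all pages.
        dangling = sum(scores[p] for p in pages if not out_links[p])
        base = (1.0 - damping) / n + damping * dangling / n
        new_scores = {page: base for page in pages}
        for src, targets in out_links.items():
            for dst in targets:
                new_scores[dst] += damping * scores[src] / len(targets)
        change = sum(abs(new_scores[p] - scores[p]) for p in pages)
        scores = new_scores
        if change < tol:
            break
    return scores

# Hypothetical 4-page web used only for this illustration.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
ordering = sorted(ranks, key=ranks.get, reverse=True)
print(ordering)
```

In this example page C, which has the most incoming links, ranks first, and page D, which no page links to, ranks last but still receives a small nonzero score from the uniform term.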