Pagerank on Mapreduce (With Example)
Song Liu, sl9885@bristol.ac.uk


Pagerank is a Good Thing.

It estimates the authority of each node in a network by examining the link relationships
between the nodes.

The original formula can be found here:
http://en.wikipedia.org/wiki/PageRank

For simplicity, it can be written in this way:

Rv = beta * Sigma(Rn/Nn) + (1 - beta)/N

Here Rv is the Pagerank of PageV, Rn is the Pagerank of a page PageN that links to PageV, and
Nn is the total number of outlinks of PageN. Summing Rn/Nn over all inlinks of PageV gives the
raw rank of PageV. Two parameters then normalize this rank: beta, which is called the "damping
coefficient", and N, the total number of webpages.

Again, for simplicity, the walkthrough below mostly sets this normalization aside: the Reducer
scales each contribution by beta but leaves out the (1 - beta)/N term.
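As a minimal sketch of the formula above, the following Python function computes Rv for a single page from the (Rn, Nn) pairs of its inlinks. The function name `page_rank_of` and the example values (beta = 0.85, N = 4, two inlinks) are assumptions chosen for illustration; they do not come from the original.

```python
def page_rank_of(inlinks, beta, total_pages):
    """Compute Rv = beta * Sigma(Rn / Nn) + (1 - beta) / N for one page.

    inlinks: list of (rank_n, outlink_count_n) pairs, one per page linking to V.
    """
    contribution = sum(rank_n / count_n for rank_n, count_n in inlinks)
    return beta * contribution + (1 - beta) / total_pages


# Hypothetical example: two pages link to PageV.
# The first has rank 0.5 and 2 outlinks, the second has rank 0.5 and 1 outlink.
rv = page_rank_of([(0.5, 2), (0.5, 1)], beta=0.85, total_pages=4)
print(rv)
```

The first inlink contributes 0.5/2 and the second 0.5/1, so the sum is 0.75 before damping.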


Let's Parallelize It

We assume the input follows the logical pattern below:

PageA -> Page1, Page2, Page3...
PageB -> Page1', Page2', Page3'...

Before the first iteration, we need to assign an initial rank to each webpage. In my experience,
0.5 is a good value for a small website or network. In a real calculation, this assignment can be
done by a single-pass mapreduce procedure, after which the input has the following
structure:

<PageA, 0.5> -> Page1, Page2, Page3...
<PageB, 0.5> -> Page1', Page2', Page3'...
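
The initial assignment can be sketched in a few lines of Python. This is only an in-memory illustration, not the single-pass mapreduce job itself: the text-line input format ("PageA -> PageB, PageC"), the function name `assign_initial_rank`, and the dictionary layout are all assumptions.

```python
def assign_initial_rank(lines, initial_rank=0.5):
    """Turn 'Page -> Out1, Out2' lines into {page: (rank, [outlinks])} records."""
    records = {}
    for line in lines:
        page, _, targets = line.partition("->")
        # Split the right-hand side on commas and drop empty fragments.
        outlinks = [t.strip() for t in targets.split(",") if t.strip()]
        records[page.strip()] = (initial_rank, outlinks)
    return records


# Hypothetical three-page network.
graph = assign_initial_rank([
    "PageA -> PageB, PageC",
    "PageB -> PageC",
    "PageC -> PageA",
])
print(graph["PageA"])
```

Every page starts with the same rank, so only the outlink lists differ between records at this point.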

The input structure lists the outlinks of PageA, PageB... But in Pagerank, we care more
about the inlinks. So the first thing the Mapper function needs to do is invert this data structure.
Second, the factor Rn/Nn can also be calculated easily at this stage for each webpage.

>>>>>Pseudo Code Starts<<<<<
function Mapper
input <PageN, RankN> -> PageA, PageB, PageC...
begin
     Nn := the number of outlinks for PageN;
     for each outlink PageK
           output PageK-> <PageN, RankN/Nn>
     // output the outlinks for PageN.
     output PageN-> PageA, PageB, PageC…
end
>>>>>Pseudo Code Ends<<<<<
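
The Mapper pseudocode can be sketched in Python as a generator of (key, value) pairs. The "rank" and "links" tags used to tell the two kinds of value apart are an assumption of this sketch, as are the function name and the example record; the original only shows the logical output.

```python
def mapper(page, rank, outlinks):
    """Emit (key, value) pairs for one input record <page, rank> -> outlinks."""
    outlink_count = len(outlinks)
    for target in outlinks:
        # Each outlink target receives an equal share of this page's rank.
        yield target, ("rank", page, rank / outlink_count)
    # Pass the outlink list through so the next iteration's input can be rebuilt.
    yield page, ("links", outlinks)


# Hypothetical input record: <PageA, 0.5> -> PageB, PageC
pairs = list(mapper("PageA", 0.5, ["PageB", "PageC"]))
for key, value in pairs:
    print(key, value)
```

A page with no outlinks emits only its "links" record, so no division by zero occurs.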

So after the Mapper function, we have the following data structure:
PageK-> <PageN1, RankN1/Nn1>
           <PageN2, RankN2/Nn2>
           ...
           <PageAk, PageBk, PageCk…>

where PageN1, PageN2... are the inlinks of PageK, and PageAk, PageBk… are its outlinks. The
outlinks are kept because we need them to rebuild the input format for the next iteration.

The job of the Reducer function is to update the Pagerank of PageK, using the ranks of its inlinks.

>>>>>Pseudo Code Starts<<<<<
function Reducer
input PageK-> <PageN1, RankN1/Nn1>
                <PageN2, RankN2/Nn2>
                ...
                <PageAk, PageBk, PageCk…> //outlinks of PageK
begin
     RankK := 0;
     for each inlink PageNi
           RankK += RankNi/Nni * beta
     // output the PageK and its new Rank for the next iteration
     output <PageK, RankK> -> <PageAk, PageBk, PageCk…>
end
>>>>>Pseudo Code Ends<<<<<
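
A matching Python sketch of the Reducer follows. It assumes the values for one key arrive as tagged tuples, ("rank", source, share) for inlink contributions and ("links", outlinks) for the saved outlink list; that tagging, the function name, and the default beta = 0.85 are assumptions, not part of the original.

```python
def reducer(page, values, beta=0.85):
    """Sum the inlink shares for one page and reattach its outlink list."""
    rank = 0.0
    outlinks = []
    for value in values:
        if value[0] == "rank":
            _, _source, share = value
            # Same update as the pseudocode: RankK += RankNi/Nni * beta
            rank += share * beta
        else:
            # The single ("links", [...]) record carries the page's own outlinks.
            outlinks = value[1]
    return page, rank, outlinks


# Hypothetical grouped input for PageC: two inlinks plus its outlink list.
result = reducer("PageC", [
    ("rank", "PageA", 0.25),
    ("rank", "PageB", 0.5),
    ("links", ["PageA"]),
])
print(result)
```

The returned triple has the same shape as the Mapper's input record, which is what allows the procedure to be chained across iterations.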

It's almost done! All that remains is to run this Map-Reduce procedure iteratively until
convergence. In my experience, 15 iterations are enough for small cases, but larger cases may
take hundreds of iterations.
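
The iteration loop can be sketched in memory as below. Two things here are assumptions rather than the original method: the stopping rule (largest per-page rank change below a tolerance) and the use of the full formula including the (1 - beta)/N term. The sketch keeps that term because with beta below 1 and no such term, the total rank shrinks by at least a factor of beta on every pass. It also assumes every page appears as a key in the input.

```python
from collections import defaultdict


def run_iteration(graph, beta):
    """One map/shuffle/reduce pass over {page: (rank, outlinks)}."""
    incoming = defaultdict(float)
    for page, (rank, outlinks) in graph.items():
        for target in outlinks:
            incoming[target] += rank / len(outlinks)
    total_pages = len(graph)
    return {
        page: (beta * incoming[page] + (1 - beta) / total_pages, outlinks)
        for page, (_, outlinks) in graph.items()
    }


def iterate_until_converged(graph, beta=0.85, tolerance=1e-6, max_iterations=200):
    """Repeat passes until the largest rank change drops below tolerance."""
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        updated = run_iteration(graph, beta)
        change = max((abs(updated[p][0] - graph[p][0]) for p in graph), default=0.0)
        graph = updated
        if change < tolerance:
            break
    return graph, iterations


# Hypothetical three-page network, every page starting at rank 0.5.
start = {
    "PageA": (0.5, ["PageB", "PageC"]),
    "PageB": (0.5, ["PageC"]),
    "PageC": (0.5, ["PageA"]),
}
final, iterations = iterate_until_converged(start)
for page, (rank, _) in sorted(final.items()):
    print(page, round(rank, 4))
print("iterations:", iterations)
```

On a real cluster each pass would be a separate mapreduce job, and the convergence check would need its own aggregation step.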


Let's Taste It

Here are some of the top Pageranks for today's (23/Feb) New York Times website (I only crawled
world news). There are around 14000 pages in total.
The statistics are interesting, and you should find the same distribution pattern if you
rank more websites and plot the results.

http://www.nytimes.com/    1307.420286
http://www.nytimes.com/2010/02/23/world/americas/23amputee.html 834.3881531
http://www.nytimes.com/2010/02/22/world/americas/22gays.html?bl 372.152869
http://www.nytimes.com/2010/02/24/world/asia/24afghan.html?hp      119.3507215
http://www.nytimes.com/2010/02/23/world/asia/23islamabad.html      81.86431592
http://www.nytimes.com/2010/02/23/world/americas/23amputee.html?pagewanted=all 70.64134337
http://www.nytimes.com/2010/02/23/world/americas/23amputee.html?pagewanted=2 60.43903422
http://www.nytimes.com/2010/02/23/world/americas/23amputee.html?hpw 59.92536076
http://www.nytimes.com/2010/02/23/world/asia/23afghan.html?hpw 59.92536076
http://www.nytimes.com/2010/02/23/world/asia/23islamabad.html?hp        59.92536076
http://www.nytimes.com/2010/02/23/world/middleeast/23mideast.html?hpw         59.92536076
http://www.nytimes.com/2010/02/22/world/americas/22gays.html       47.20198605
http://www.nytimes.com/2010/02/22/world/americas/22gays.html?bl=&pagewanted=all          41.90279295
http://www.nytimes.com/2010/02/24/world/asia/24afghan.html 10.56957042
http://www.nytimes.com/2010/02/23/world/asia/23afghan.html 10.1774012
http://www.nytimes.com/2010/02/23/world/middleeast/23mideast.html       10.02345548
http://www.nytimes.com/2010/02/24/world/asia/24afghan.html?hp=&pagewanted=all 9.718268616
http://www.nytimes.com/2010/02/17/world/asia/17afghan.html?fta=y 7.693364285
http://www.nytimes.com/2010/02/20/world/asia/20drones.html?fta=y 7.615689791
http://www.nytimes.com/2002/07/07/world/afghan-official-is-assassinated-blow-to-karzai.html   6.967736047
http://www.nytimes.com/2010/01/08/world/asia/08afghan.html?fta=y 6.29221417
http://www.nytimes.com/2010/01/07/world/asia/07afghan.html?fta=y 5.909407591
http://www.nytimes.com/2010/02/22/world/americas/22gays.html?pagewanted=all         5.799193096
http://www.nytimes.com/2010/02/23/world/americas/23amputee.html?hpw=&pagewanted=all           5.657414027
http://www.nytimes.com/2010/02/23/world/middleeast/23mideast.html?hpw=&pagewanted=all         5.534888339
http://www.nytimes.com/2010/02/23/world/asia/23taliban.html?ref=asia    5.3387006
http://www.nytimes.com/2010/02/23/world/americas/23amputee.html?pagewanted=2&hpw 4.840375361
http://www.nytimes.com/2010/02/23/world/americas/23amputee.html?pagewanted=1 4.816480315
http://www.nytimes.com/2010/02/16/world/asia/16intel.html    4.671784654
http://www.nytimes.com/2010/02/15/world/asia/15afghan.html?fta=y 4.292737953

Here is the rank distribution (top 30 only):




Tools

The ranks plotted above were calculated by a small search engine that includes the Pagerank
calculation on Mapreduce. Fortunately, it seems to work fine on the Bluecrystal cluster.
More detailed usage can be found on its website: http://joycrawler.googlecode.com
Or feel free to email me: sl9885@bristol.ac.uk
