Relating Web Characteristics
         Ricardo Baeza-Yates
            Carlos Castillo
         Universidad de Chile
Agenda
• Introduction
• Link-based ranking
• Web structure
• Web characteristics
• Web usage
• Web dynamics
• Conclusions
Introduction: Sample
• Web sample: .CL domain in the year 2000
• 670,000 pages in 7,500 domains
• 15 KB average page size
• Collection from the TodoCL web search engine
Introduction: Emphasis
• Broder et al.: Graph Structure in the Web (2000)
  – Page-based structure based on strongly connected components
  – The Web graph is not a random graph
  – Process: cut & paste model
• Ours is mostly a site-based analysis
  – Trying to make Web structure meaningful
Introduction: The Empire




Introduction: One Map




Link ranking: Pagerank
$$\mathrm{Pagerank}(p) = \frac{q}{N} + (1 - q) \sum_{i=1}^{k} \mathrm{Pagerank}(r_i)$$
• r_1, ..., r_k: pages that point to page p
• q/N: probability of a random jump over number of pages
• Currently used by Google (Brin & Page, 1998)
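The Pagerank formula on this slide can be illustrated with a short power-iteration sketch. This is not the authors' implementation. It assumes the widely used normalized variant, in which each page splits its score evenly among the pages it links to (the slide's formula omits that division), and it spreads the score of pages with no out-links evenly over all pages. The function name `pagerank`, the default `q=0.15`, the iteration count, and the toy graph are all assumptions made for illustration.

```python
def pagerank(links, q=0.15, iterations=50):
    """Power-iteration sketch. `links` maps a page to the pages it points to.

    q is the random-jump probability from the slide; N is the number of pages.
    Assumption: each page divides its score by its out-degree.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts with the random-jump share q / N.
        new_rank = {p: q / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = (1 - q) * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page without out-links spreads its score evenly.
                share = (1 - q) * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page graph, not data from the study.
toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy)
```

In this toy graph page "d" has no in-links, so its score stays at the random-jump share q/N, while "c", which is pointed to by three pages, ends up with the highest score.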
Link ranking: Hubs & Authorities
• HITS algorithm (Kleinberg, 1998)
• A good authority is a page pointed to by good hubs, so we assume that it has good content
• A good hub is a page that points to good authorities, so we assume it is a good set of links
• Linear system calculated by numerical iteration
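The mutual definition of hubs and authorities, solved by numerical iteration, can be sketched as follows. This is a minimal illustration rather than the authors' code: it runs on a whole toy graph (Kleinberg's original formulation works on a query-focused subgraph), normalizes scores to unit Euclidean length after each round, and uses a fixed iteration count. The function name `hits` and the page names are hypothetical.

```python
import math


def hits(links, iterations=50):
    """Iterative hub/authority sketch. `links` maps a page to the pages it points to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages pointing to the page.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of the authority scores of the pages the page points to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical graph: two link pages pointing at two content pages.
toy = {"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []}
hub_scores, auth_scores = hits(toy)
```

Here "a1" is pointed to by both link pages and gets the highest authority score, while "h1" points to both content pages and gets the highest hub score.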
Link ranking: Distribution
• <2% with relevant Pagerank
• 9% with relevant hub score
• 2-3% with relevant authority score
Link ranking: Correlation
• Hub score, authority score and Pagerank do not seem to be correlated
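The slides do not say which correlation measure was used. As one possible check, the sketch below computes the Pearson correlation coefficient between two lists of per-page scores. The function name `pearson` and the example numbers are made up for illustration and are not results from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Convention chosen for this sketch: a constant list has no correlation.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Made-up scores for four pages, listed in the same page order.
example_pagerank = [1.0, 2.0, 3.0, 4.0]
example_hub = [1.0, -1.0, -1.0, 1.0]
coefficient = pearson(example_pagerank, example_hub)
```

A coefficient near 0 indicates no linear relationship between the two score lists, while values near 1 or -1 indicate a strong one.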
Link ranking: Sites
• Which measure to use for sites?
• Average score
  – But good sites can have lots of bad pages
• Maximum score
  – But one good page cannot be all that is needed to be a good site
• Sum of the scores of all pages
  – Natural for Pagerank
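The three candidate site measures can be compared side by side with a small sketch. It assumes pages are identified by URL and that a site is the host part of the URL. The function name `site_scores`, the URLs, and the scores are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def site_scores(page_scores):
    """Aggregate per-page scores into average, maximum and sum per site."""
    by_site = defaultdict(list)
    for url, score in page_scores.items():
        # Assumption: the host name identifies the site.
        by_site[urlparse(url).netloc].append(score)
    return {
        site: {
            "average": sum(scores) / len(scores),
            "maximum": max(scores),
            "sum": sum(scores),
        }
        for site, scores in by_site.items()
    }


# Hypothetical page scores: one site with a strong page and two weak ones,
# and one single-page site.
toy_scores = {
    "http://site-a.example/": 0.5,
    "http://site-a.example/x": 0.125,
    "http://site-a.example/y": 0.0,
    "http://site-b.example/": 0.25,
}
result = site_scores(toy_scores)
```

In this toy data the measures disagree: site-a wins on maximum and sum, but its weak pages pull its average below that of site-b, which is the trade-off the bullets describe.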
Link ranking: Sites Graph
• 90% relevant site-Pagerank
• It's harder to have a good hub than a good authority (site)
Web Structure: Basis
• The Web graph has structure:
  – MAIN
  – IN
  – OUT
  – ISLANDS
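A split into these four components can be sketched with plain reachability searches. The sketch rests on assumptions not stated on the slide: MAIN is taken to be the largest strongly connected component, IN is everything that can reach MAIN, OUT is everything reachable from MAIN, and all remaining nodes are counted as ISLANDS. The function names and the toy graph are hypothetical, and the approach is only suitable for small graphs.

```python
from collections import deque


def reachable(start_nodes, adjacency):
    """Breadth-first search: all nodes reachable from start_nodes (inclusive)."""
    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(links):
    """Split a directed graph into MAIN, IN, OUT and ISLANDS."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    reverse = {}
    for src, targets in links.items():
        for t in targets:
            reverse.setdefault(t, []).append(src)
    # A node's strongly connected component is the intersection of what it
    # can reach and what can reach it. Keep the largest one as MAIN.
    main, assigned = set(), set()
    for node in sorted(nodes):
        if node in assigned:
            continue
        scc = reachable([node], links) & reachable([node], reverse)
        assigned |= scc
        if len(scc) > len(main):
            main = scc
    out_part = reachable(main, links) - main
    in_part = reachable(main, reverse) - main
    islands = nodes - main - in_part - out_part
    return {"MAIN": main, "IN": in_part, "OUT": out_part, "ISLANDS": islands}


# Hypothetical graph: a three-node cycle with one entry, one exit,
# and two disconnected nodes.
toy = {"i": ["m1"], "m1": ["m2"], "m2": ["m3"], "m3": ["m1", "o"],
       "o": [], "x": ["y"], "y": []}
parts = bow_tie(toy)
```

The same function works on a graph of sites if each node is a site and each edge means that some page on one site links to a page on the other.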
Web Structure: Basis (cont.)
• The MAIN component has structure:
  – MAIN IN
  – MAIN OUT
  – MAIN MAIN
  – MAIN NORM
Web Structure: Sketch




Web Structure: Degree




Web Structure: Sizes




Web Structure: Preferences




Web Structure: Preferences
[Figure: three charts labeled Real, ODP and TodoCL; segment labels include OUT, MAIN OUT and MAIN MAIN]
Web Structure: Various




Web Structure: Link Scores




Web Dynamics: Ages
• The kernel of the Web comes from the past
Web Dynamics: By Component
Web Dynamics: Pagerank
• Pagerank is biased against newer pages
Web Dynamics: Hubs & Authorities
[Figure: authority score and hub score plotted against age (months)]
Conclusions
• Pagerank/HITS do not seem to be correlated
  – And Pagerank is biased toward older pages
• Site ranking can help to make good human-selected directories
• Finding good pages is not so simple
• Characterizing Web structure gives valuable insight
  – Web Graph Mining is just starting

Relating Key Web Characteristics Such as Structure, Link Ranking and Dynamics
