Local Ranking Problem on the BrowseGraph
Michele Trevisiol, Luca Maria Aiello, Paolo Boldi, Roi Blanco
1
“when the centrality-like ranks computed on a local graph differ from the ones on the global graph”
Local Ranking Problem
- Bressan et al. in WWW 2013, “The Power of Local Information in PageRank”

- Bar-Yossef and Mashiach in CIKM 2008, “Local Approximation of PageRank and
reverse PageRank”

- Chen et al. in CIKM 2004, “Local Methods for Estimating PageRank Values”
[Figure: example graphs annotated with node rank scores]
2
The BrowseGraph
“a graph where nodes are webpages and edges are browsing transitions”
[Figure labels: user navigation (e.g. Flickr), user session, construction, BrowseGraph]
3
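As an illustration of the definition above, a BrowseGraph can be sketched as a table of transition counts between consecutive pages of each user session. This is a minimal sketch, not the authors' pipeline: the function name `build_browse_graph`, the session format (a list of page IDs in visit order) and the toy page IDs are all assumptions made for the example.

```python
from collections import Counter

def build_browse_graph(sessions):
    """Count page-to-page transitions over all sessions.

    `sessions` is an iterable of page-ID lists, each in visit order
    (hypothetical input format). Returns a Counter mapping
    (source, target) -> number of observed transitions.
    """
    edges = Counter()
    for pages in sessions:
        # Each pair of consecutive pages in a session is one browsing transition.
        for src, dst in zip(pages, pages[1:]):
            edges[(src, dst)] += 1
    return edges

# Toy sessions with made-up page IDs.
sessions = [
    ["home", "a", "b"],
    ["home", "a", "c"],
    ["a", "b"],
]
graph = build_browse_graph(sessions)
```

The edge weights are the transition counts, so the same structure can later feed a weighted PageRank or weighted-degree statistics.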
Centrality Metrics applied to the BrowseGraph
Increasing popularity in recent years
- Chiarandini et al. in ICWSM 2013, “Leveraging browsing patterns for topic discovery and photostream recommendation”
- Trevisiol et al. in SIGIR 2012, “Image ranking based on user browsing behavior”
- Liu et al. in CIKM 2011, “User browsing behavior-driven web crawling”
Provide higher-quality rankings compared to standard hyperlink graphs
- Y. Liu et al. in SIGIR 2008, “BrowseRank: letting web users vote for page importance”
4
Local Ranking Problem
on the BrowseGraph
WHY?
5
Image Ranking in Flickr (SIGIR 2012)
We compared different ranking approaches on the BrowseGraph (PageRank and BrowseRank, among others)
How much could our ranking vary if we had more information (i.e. more nodes)?
6
BrowseGraph and ReferrerGraphs
ReferrerGraphs: domain-dependent BrowseGraphs
Construct different BrowseGraphs based on the referrer domain
Recommend news articles following the ReferrerGraphs
[Figure labels: BrowseGraph, Twitter ReferrerGraph, Facebook ReferrerGraph]
Can we rely on centrality-based algorithms to infer news importance?
7
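To make the idea of domain-dependent BrowseGraphs concrete, the sketch below groups sessions by the domain of their referrer URL and builds one transition-count graph per domain. It is only an illustration under stated assumptions: the `(referrer_url, pages)` session format, the function name `build_referrer_graphs`, the stripping of a leading `www.` and the toy URLs are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

def build_referrer_graphs(sessions):
    """Build one transition-count graph per referrer domain.

    `sessions` is an iterable of (referrer_url, pages) pairs, where `pages`
    lists page IDs in visit order (hypothetical input format).
    """
    graphs = defaultdict(Counter)
    for referrer_url, pages in sessions:
        domain = urlparse(referrer_url).netloc.lower()
        # Treat "www.example.com" and "example.com" as the same referrer.
        if domain.startswith("www."):
            domain = domain[4:]
        for src, dst in zip(pages, pages[1:]):
            graphs[domain][(src, dst)] += 1
    return dict(graphs)

# Toy sessions with made-up URLs and page IDs.
sessions = [
    ("https://twitter.com/some/status", ["n1", "n2", "n3"]),
    ("https://www.facebook.com/", ["n1", "n4"]),
    ("https://twitter.com/other", ["n1", "n2"]),
]
referrer_graphs = build_referrer_graphs(sessions)
```

Merging all per-domain counters gives back the full BrowseGraph, which is what makes each ReferrerGraph a local view of the global graph.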
Local Ranking Problem
on the BrowseGraph
Study of the LRP on the BrowseGraph by incrementally expanding the local graph (“Growing Rings” experiment)
How to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph
Discover the referrer domain when it is not available (not discussed in the presentation; please see the paper)
8
Local Ranking Problem on the BrowseGraph
[Figure labels: Social Networks, Search Engines, News Homepage; Yahoo News BrowseGraph, ~500M pageviews]
1. Construct the BrowseGraph (our “global graph”)
2. Construct the ReferrerGraphs (our “local graphs”)
9
Subgraph Comparison
Very different dimensions
Very well connected (including Reddit, the smallest one)
10
Subgraph Comparison
Cross-distance Kendall-tau among common nodes (min overlap 1k)
In general the similarities are very low (<0.3)
~different content or different users’ interests
Search engines are the most similar (>0.5)
11
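The comparison above can be sketched as a Kendall-tau computed only on the nodes that two rankings share. This is a minimal sketch, assuming each ranking is a dict from node ID to score. The function name, the tau-a variant (ties counted as neither concordant nor discordant) and the toy scores are assumptions, and `min_overlap` stands in for the 1k threshold mentioned on the slide.

```python
from itertools import combinations

def kendall_tau_common(scores_a, scores_b, min_overlap=2):
    """Kendall tau-a between two score dicts, restricted to shared nodes.

    Returns None when fewer than `min_overlap` nodes are shared.
    Tied pairs count as neither concordant nor discordant.
    """
    common = sorted(set(scores_a) & set(scores_b))
    if len(common) < max(min_overlap, 2):
        return None
    concordant = discordant = 0
    for u, v in combinations(common, 2):
        # Positive product: both rankings order u and v the same way.
        sign = (scores_a[u] - scores_a[v]) * (scores_b[u] - scores_b[v])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(common) * (len(common) - 1) // 2
    return (concordant - discordant) / n_pairs

# Made-up scores for two hypothetical ReferrerGraphs.
twitter = {"n1": 0.5, "n2": 0.3, "n3": 0.2, "n9": 0.1}
facebook = {"n1": 0.4, "n2": 0.1, "n3": 0.3, "n7": 0.2}
tau = kendall_tau_common(twitter, facebook)
```

The pairwise loop is quadratic in the number of shared nodes, so it is only suitable for small examples; on graphs with thousands of common nodes an O(n log n) implementation would be preferable.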
Growing Rings Experiment
Study of the LRP on the BrowseGraph by incrementally expanding the local graph
1. For each ReferrerGraph
2. Compare its PageRank values with the global ones (Kendall-tau)
3. Expand it with the next neighborhood of nodes
4. Iterate until the Kendall-tau gets close to 1
K(local+0, global) ~0.307
K(local+1, global) ~0.524
K(local+2, global) ~0.740
K(local+3, global) ~0.912
12
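The four steps above can be sketched end to end on a toy graph. This is a rough sketch under several assumptions that are not stated on the slide: PageRank is a plain weighted power iteration with damping 0.85, the next ring contains every node adjacent to the current set in either direction, and the Kendall-tau is always measured on the original local nodes. All function names and the toy edges are hypothetical.

```python
from itertools import combinations

def pagerank(nodes, edges, damping=0.85, iters=100):
    """Weighted PageRank by power iteration on the subgraph induced by `nodes`."""
    nodes = set(nodes)
    out = {u: {} for u in nodes}
    for (u, v), w in edges.items():
        if u in nodes and v in nodes:
            out[u][v] = w
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iters):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        for u in nodes:
            total = sum(out[u].values())
            for v, w in out[u].items():
                new[v] += damping * rank[u] * w / total
        rank = new
    return rank

def kendall_tau(a, b, nodes):
    """Kendall tau-a of score dicts `a` and `b` over `nodes` (needs >= 2 nodes)."""
    conc = disc = 0
    for u, v in combinations(sorted(nodes), 2):
        s = (a[u] - a[v]) * (b[u] - b[v])
        conc += s > 0
        disc += s < 0
    return (conc - disc) / (len(nodes) * (len(nodes) - 1) // 2)

def growing_rings(edges, local_nodes, max_steps=3):
    """Return the tau against the global ranking after each expansion step."""
    all_nodes = {x for edge in edges for x in edge}
    global_rank = pagerank(all_nodes, edges)
    current = set(local_nodes)
    taus = []
    for _ in range(max_steps + 1):
        local_rank = pagerank(current, edges)
        taus.append(kendall_tau(local_rank, global_rank, local_nodes))
        # Next ring: nodes adjacent to the current set, in either direction.
        ring = {v for (u, v) in edges if u in current}
        ring |= {u for (u, v) in edges if v in current}
        if ring <= current:
            break  # nothing left to add
        current |= ring
    return taus

# Toy graph: a local 3-cycle plus one external page "x".
edges = {("a", "b"): 1, ("b", "c"): 1, ("c", "a"): 1, ("a", "x"): 1, ("x", "b"): 1}
taus = growing_rings(edges, {"a", "b", "c"})
```

In the toy graph the local cycle gives all three pages the same score, so the first tau carries no ordering information; once the external page is added, the local ranking matches the global one.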
Growing Rings Experiment
Referrer-based (RB): the 7 ReferrerGraphs (Facebook, Twitter, Reddit, Homepage, Yahoo, Google, Bing)
Same-size referrer-based (SRB): to measure the impact of the graph size
Random (R): 7 random graphs reflecting the sizes of the original RB graphs
13
Growing Rings Experiment
14
Growing Rings Experiment
[Figure panels: ReferrerGraphs, same size RGs, Random]
15
Growing Rings Experiment with Selection of Nodes
How does the expansion influence convergence if only a few of the most representative nodes are selected?
Hypothesis 1: adding all the nodes adds more information, and should therefore lead to faster convergence (Boldi et al. [6] in the paper)
Hypothesis 2: the most representative nodes bring less noise and therefore quicker convergence (Cho et al. [13] in the paper)
16
Growing Rings Experiment with Selection of Nodes
[Figure legend: 5, 10, 30, 50, 100 selected nodes]
Fewer, more representative nodes lead to a better estimation of PageRank values in the first iteration
In the long run, expansions with the highest number of nodes show the best convergence
17
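A selective expansion step can be sketched as follows. The slides do not say how the "most representative" nodes are chosen, so the criterion used here is purely an assumption for illustration: candidates are ranked by the total number of transitions they exchange with the current local graph. The function name and the toy edges are hypothetical as well.

```python
from collections import Counter

def select_ring_nodes(edges, current, k):
    """Pick the k candidate nodes with the most transitions to or from `current`.

    Assumed selection criterion (not from the slides): total weight of the
    edges linking a candidate to the current local graph.
    """
    strength = Counter()
    for (u, v), w in edges.items():
        if u in current and v not in current:
            strength[v] += w
        elif v in current and u not in current:
            strength[u] += w
    # Sort by descending strength, then by node ID so ties break deterministically.
    ranked = sorted(strength, key=lambda node: (-strength[node], node))
    return ranked[:k]

# Toy graph: "w" is two hops away and therefore not a candidate.
edges = {("a", "x"): 5, ("y", "a"): 2, ("b", "z"): 2, ("z", "a"): 1, ("x", "w"): 9}
selected = select_ring_nodes(edges, {"a", "b"}, k=2)
```

Setting `k` to 5, 10, 30, 50 or 100 would correspond to the expansion sizes compared on the slide, while a very large `k` falls back to adding the whole neighborhood.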
Growing Rings Expansion with Selected Nodes
~1 or 2 steps can be enough to estimate the PageRank scores of the global graph
Predicting Kendall-tau Distance
Can we estimate the “distance” between the local and global PageRank considering only information available in the local graph?
18
Predicting Kendall-tau Distance
Hypothesis: some structural properties of the graph could be good proxies for the Kendall-tau difference between local and global ranks.
19
Predicting Kendall-tau Distance
Training Set Construction
[Figure labels: ReferrerGraph, homepage, Jackknife resampling (1%, 5%, 10%, 20%)]
Kendall-tau distance between the ReferrerGraph and the reduced subgraphs
20
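One way to read the training-set construction above is sketched below: for each percentage, a random share of nodes is removed from a ReferrerGraph and the induced subgraph is kept as a reduced instance. Treating the percentages as the share of removed nodes is an assumption, as are the function name, the fixed seed and the ring-shaped toy graph. The Kendall-tau between each reduced subgraph and the full ReferrerGraph would then be the value to predict.

```python
import random

def reduced_subgraphs(edges, fractions=(0.01, 0.05, 0.10, 0.20), seed=0):
    """Yield (fraction, kept_nodes, kept_edges) after removing random nodes.

    Assumption: each fraction is the share of nodes removed from the graph.
    """
    rng = random.Random(seed)
    nodes = sorted({x for edge in edges for x in edge})
    for fraction in fractions:
        n_removed = max(1, round(fraction * len(nodes)))
        removed = set(rng.sample(nodes, n_removed))
        kept_nodes = set(nodes) - removed
        # Keep only the edges whose endpoints both survive.
        kept_edges = {e: w for e, w in edges.items()
                      if e[0] in kept_nodes and e[1] in kept_nodes}
        yield fraction, kept_nodes, kept_edges

# Toy graph: a directed ring over 100 made-up node IDs.
ring = {(i, (i + 1) % 100): 1 for i in range(100)}
instances = list(reduced_subgraphs(ring))
```

Repeating the sampling with different seeds would give several training instances per ReferrerGraph and per percentage.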
Predicting Kendall-tau Distance
Extract Structural Properties of each Graph
We compute 62 structural graph metrics for each training instance
Size and Connectivity (S): basic statistics
Assortativity (A): tendency of nodes with a certain degree to be linked with nodes of similar degree
Degree (D): statistics on the degree distribution
Weighted degree (W): same as degree, but considering the weights on the edges (transitions)
Local PageRank (P): statistics on the PageRank values
Closeness centralization (C): statistics on the distance (number of hops)
• A. Barrat et al. in Cambridge Univ. Press 2008, “Dynamical Processes on Complex Networks”
• S. Wasserman and K. Faust in Cambridge Univ. Press 1994, “Social Network Analysis: Methods and Applications”
21
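As an example of what a weighted-degree feature group might contain, the sketch below summarizes the weighted in- and out-degree distributions of one graph. The particular statistics (mean, population standard deviation, maximum), the feature names and the function name are illustrative assumptions; the slides do not list which of the 62 metrics belong to each group.

```python
from collections import Counter
from statistics import mean, pstdev

def weighted_degree_features(edges):
    """Summary statistics of weighted in- and out-degree for one graph.

    `edges` maps (source, target) -> transition count.
    """
    w_in, w_out = Counter(), Counter()
    nodes = set()
    for (u, v), w in edges.items():
        w_out[u] += w
        w_in[v] += w
        nodes.update((u, v))
    features = {}
    for name, degree in (("in", w_in), ("out", w_out)):
        # Nodes with no edge in this direction contribute a degree of 0.
        values = [degree[n] for n in nodes]
        features[f"w_{name}_mean"] = mean(values)
        features[f"w_{name}_std"] = pstdev(values)
        features[f"w_{name}_max"] = max(values)
    return features

# Toy graph with made-up transition counts.
edges = {("a", "b"): 3, ("b", "c"): 1, ("a", "c"): 2}
features = weighted_degree_features(edges)
```

All of these values depend only on the edges of the local graph, which is the property that makes such features usable when the global graph is unavailable.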
Predicting Kendall-tau Distance
Regression Analysis (RF) with five-fold cross-validation over 10 iterations
Weighted degree: the most predictive features
~better than using all the features
Assortativity: less predictive power
~too many features and too little training data?
22
Predicting Kendall-tau Distance
Most important features in the weighted degree group:
features based on the distribution of in- and out-degree
very straightforward to compute
information always available in the local graph
23
Predicting Kendall-tau Distance
Can we estimate the distance between the local and global PageRank considering only information available in the local graph?
YES.
With just a few structural features of the local graph.
24
Summary
How the LRP behaves on the BrowseGraph: expanding the local graph with whole neighborhoods (“Growing Rings” experiment) or with the most representative nodes (“Growing Rings with Selection of Nodes”)
It is possible to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph
25
26
Thanks.

Presentation @SIGIR2015
