Iterative Graph Computation in the Big Data Era
Wenlei Xie
B-Exam
Committee:
Johannes Gehrke (Chair), David Bindel, Robert Kleinberg, Alan Demers
Ubiquitous Graph Data
Social Networks, Web, Recommendation Systems, Computer Vision, Bioinformatics, Physical Simulations
New Challenges in the Big Data Era
My Work
• Fast Iterative Graph Computation with Block Updates
W. Xie, G. Wang, D. Bindel, A. Demers, J. Gehrke. PVLDB 6(14)
• Dynamic Interaction Graph with Probabilistic Edge Decay
W. Xie, Y. Tian, Y. Sismanis, A. Balmin, P. J. Haas. ICDE 2015
• Edge-Weighted Personalized PageRank:
Breaking A Decade-Old Performance Barrier
W. Xie, D. Bindel, A. Demers, J. Gehrke.
Accepted by KDD 2015
[Figure: (a) vertex-oriented vs. (b) block-oriented computation; legend: to-be-updated, dependent, and unrelated vertices; block boundary]
[Figure: interactions among Alice, Bob, and Carol over time, from 5 years ago to 1 month ago to now]
Outline
• Introduction and Motivation
• Model Reduction
• Application to Personalized PageRank
• Experiments
PageRank
• PageRank model
– A random walker moves in the graph
– At each step, the walker either
• moves to an adjacent node (with prob. $\alpha$), or
• teleports to a new node (with prob. $1-\alpha$)
• PageRank vector: the stationary distribution of this process
$x = \alpha P x + (1-\alpha) v$, where P is the transition matrix, x is the PageRank vector, and v is the teleport vector
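To make the model concrete, here is a minimal sketch of computing the PageRank vector by fixed-point (power) iteration on $x = \alpha P x + (1-\alpha) v$. It assumes a small dense, column-stochastic P (P[i][j] is the probability of stepping from node j to node i) and a hypothetical helper name `pagerank`; a real system would use sparse matrices and handle dangling nodes.

```python
def pagerank(P, v, alpha=0.85, tol=1e-12, max_iter=1000):
    """Iterate x <- alpha * P x + (1 - alpha) * v until successive iterates agree.

    P is dense and column-stochastic: P[i][j] is the probability of moving j -> i.
    """
    n = len(v)
    x = list(v)
    for _ in range(max_iter):
        x_new = [alpha * sum(P[i][j] * x[j] for j in range(n)) + (1 - alpha) * v[i]
                 for i in range(n)]
        if sum(abs(a - b) for a, b in zip(x_new, x)) < tol:  # L1 change between iterates
            return x_new
        x = x_new
    return x

# Directed 3-cycle 0 -> 1 -> 2 -> 0; column j lists where node j can move.
P = [[0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
x_global = pagerank(P, [1/3, 1/3, 1/3])    # uniform teleport: global PageRank
x_personal = pagerank(P, [1.0, 0.0, 0.0])  # always teleport back to node 0
```

On the symmetric 3-cycle the global vector is uniform, while teleporting only to node 0 ranks node 0 first, followed by its successors 1 and 2.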
Graphs with Rich Metadata
Personalized PageRank
• Edge-weighted personalized PageRank: $x(w) = \alpha P(w)\,x(w) + (1-\alpha) v$
– ObjectRank [Balmin+05] / PopRank [Nie+05]
– TwitterRank [Weng+10]
– Learning to Rank [BackstromL11]
• Node-weighted personalized PageRank: $x(w) = \alpha P\,x(w) + (1-\alpha) v(w)$
– Topic-Sensitive PageRank (TSPR) [Haveliwala02]
– Localized PageRank [Bahmani+10]
Usually a small number of
global parameters (e.g. 5-10)
ObjectRank on DBLP
[Figure: ObjectRank graph on DBLP. Papers (e.g. "Index Selection for OLAP", "Data Cube: A Relational Aggregation Operator…", "Modeling Multidimensional Databases", "Range Queries in OLAP Data Cubes"), the forum ICDE, the conference ICDE 1997, and the author Rakesh Agrawal, linked by cites, contains, has instance, and writes edges]
Question: Which way to personalize?
Answer: It largely depends on whether the metadata is associated with vertices or with edges.
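Personalized PageRank
• Node-weighted personalized PageRank
– Efficient algorithms exploit the structure of v(w)
• Linearity in the parameter w: if $v(w) = \sum_i w_i v_i$, then $x(w) = \sum_i w_i x_i$, where $x_i$ is the PageRank vector for teleport vector $v_i$
• Sparsity
• Edge-weighted personalized PageRank
– NO efficient algorithm for general graphs
• No linearity in w: x(w) depends nonlinearly on P(w)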
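The linearity that makes node-weighted personalization cheap can be checked directly. The sketch below assumes a teleport vector of the form $v(w) = \sum_i w_i v_i$ (the graph and the `topic_vectors` are made up for illustration): it precomputes one PageRank vector per $v_i$ offline and answers a query with a weighted sum, which matches a from-scratch solve. No such shortcut exists when w enters through P(w).

```python
def pagerank(P, v, alpha=0.85, iters=300):
    """Plain fixed-point iteration x <- alpha*P*x + (1-alpha)*v (P column-stochastic)."""
    n = len(v)
    x = list(v)
    for _ in range(iters):
        x = [alpha * sum(P[i][j] * x[j] for j in range(n)) + (1 - alpha) * v[i]
             for i in range(n)]
    return x

P = [[0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]

# Offline: one PageRank vector per (hypothetical) topic teleport vector v_i.
topic_vectors = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
topic_pagerank = [pagerank(P, vi) for vi in topic_vectors]

def personalized(w):
    """Online: combine the precomputed vectors; no new PageRank solve is needed."""
    n = len(P)
    return [sum(wi * xi[k] for wi, xi in zip(w, topic_pagerank)) for k in range(n)]

w = [0.3, 0.7]
x_fast = personalized(w)
v_w = [sum(wi * vi[k] for wi, vi in zip(w, topic_vectors)) for k in range(len(P))]
x_direct = pagerank(P, v_w)  # reference: solve with the combined teleport vector
```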
Edge Personalization Computation
• Ad-hoc algorithms for special graphs / specific applications
– ObjectRank [Balmin+05] / ScaleRank [Hristidis+14]
– Only apply to a limited class of graphs
• Hybrid strategy that linearly combines precomputed PageRank vectors
– TwitterRank [Weng+10]
• Computing the parameter vector offline
– Many learning-to-rank applications [Nie+05, BackstromL11]
Can we efficiently compute
edge-weighted personalized
PageRank online?
Outline
• Introduction and Motivation
• Model Reduction
• Application to Personalized PageRank
• Experiments
Model Reduction
• Used in physical simulations
• Key assumption: solutions live in a low-dimensional space
• Two ingredients
– Offline: finding a basis for the space (POD/SVD)
– Online: finding an approximation within that space
Model Reduction for PageRank
• Assumption: x(w) lies close to a low-dimensional space
– Build a basis U for a k-dimensional reduced space
• Pick an approximation in the reduced space
– Represented by its coordinates y(w) in the k-dimensional space
– Need k equations to determine the k coordinates
• Reconstruct the PageRank vector: $x(w) \approx U\,y(w)$
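Reduced Space Construction
• Assumption: x(w) lies close to a low-dimensional space
• Compute a sample set of PageRank vectors $x(w^{(1)}), \dots, x(w^{(r)})$
• Find a basis for a k-dimensional space based on the samples
– Data matrix $X = [x(w^{(1)}), \dots, x(w^{(r)})]$
– Compute the SVD $X = U \Sigma V^T$, here $U \in \mathbb{R}^{n \times r}$ and $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_r)$ with $\sigma_1 \ge \dots \ge \sigma_r$
– The first k columns of U span the best k-dimensional space under the 2-norm
– i.e. keep the most important directions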
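A small sketch of the offline step, assuming the samples fit in memory as dense lists. Instead of a library SVD it uses the method of snapshots: the leading left singular vectors of X are obtained as $u = Xv/\sigma$ from eigenvectors $v$ of the small $r \times r$ matrix $X^T X$, which are found here by power iteration with deflation. This is a stand-in suitable for tiny examples only; the names `reduced_basis` and `samples` are hypothetical.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def reduced_basis(samples, k, iters=500):
    """Leading k left singular vectors of X = [x(w^1), ..., x(w^r)].

    Method of snapshots: if v is a unit eigenvector of the r x r Gram matrix
    X^T X with eigenvalue sigma^2, then u = X v / sigma is a left singular vector.
    Eigenvectors come from power iteration with deflation. Assumes k <= rank(X).
    """
    r, n = len(samples), len(samples[0])
    G = [[dot(samples[a], samples[b]) for b in range(r)] for a in range(r)]
    found, basis, sigmas = [], [], []
    for _ in range(k):
        v = [1.0 / (a + 1) for a in range(r)]      # arbitrary starting vector
        for _ in range(iters):
            for e in found:                        # deflate: remove known eigenvectors
                c = dot(v, e)
                v = [vi - c * ei for vi, ei in zip(v, e)]
            v = [dot(row, v) for row in G]         # v <- (X^T X) v
            norm = math.sqrt(dot(v, v))
            v = [vi / norm for vi in v]
        sigma = math.sqrt(dot(v, [dot(row, v) for row in G]))
        found.append(v)
        sigmas.append(sigma)
        basis.append([sum(samples[a][i] * v[a] for a in range(r)) / sigma
                      for i in range(n)])
    return basis, sigmas   # basis[c] is the c-th column of U_k

# Three sample PageRank vectors; the third is the average of the first two (rank 2).
samples = [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.35, 0.25, 0.4]]
U_cols, sigmas = reduced_basis(samples, k=2)
```

Because the third sample is the average of the first two, a 2-dimensional basis reproduces all three samples; with real PageRank samples one would keep the k directions with the largest $\sigma_i$.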
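Extracting Approximations
• The PageRank system $(I - \alpha P(w))\,x(w) = (1-\alpha) v$ is denoted $M(w)\,x(w) = b$
• Reduced space basis U, online query w
• We want $M(w)\,U\,y(w) \approx b$
– Usually over-determined: n equations but only k unknowns (k ≪ n)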
• The Petrov-Galerkin framework [Schiders08]
– The residual vector $r = b - M(w)\,U\,y$ is orthogonal to the test space W: $W^T (b - M(w)\,U\,y) = 0$
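The Petrov-Galerkin Framework
• Bubnov-Galerkin
– The test space is the same as the reduced space: $W = U$
– $U^T M(w)\,U\,y = U^T b$
• Discrete Empirical Interpolation Method (DEIM)
– Satisfy a subset of equations
– Denote the index set for equations as $\mathcal{I}$
– $W = I_{:,\mathcal{I}}$ (the columns of the identity indexed by $\mathcal{I}$) when $|\mathcal{I}| = k$, giving $(M(w)\,U)_{\mathcal{I},:}\,y = b_{\mathcal{I}}$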
DEIM
• Satisfy a subset of equations in the linear system
– Can choose more than k equations
– Over-determined linear system
• Least-squares solution: $y = \arg\min_y \|(M(w)\,U)_{\mathcal{I},:}\,y - b_{\mathcal{I}}\|_2$
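The two test-space choices can be compared on a toy problem. This sketch (dense matrices, hypothetical helper names) solves the Bubnov-Galerkin system $U^T M U y = U^T b$ and the DEIM least-squares problem on a chosen row set, using the normal equations for brevity where a QR-based solve would be numerically preferable. The basis is contrived so that its span contains the exact solution, so both methods should reproduce it.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    T = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for j in range(c, n + 1):
                T[r][j] -= f * T[c][j]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (T[r][n] - sum(T[r][j] * y[j] for j in range(r + 1, n))) / T[r][r]
    return y

def bubnov_galerkin(M, U, b):
    """Test space W = U: solve the k x k system (U^T M U) y = U^T b."""
    Ut = transpose(U)
    return solve(matmul(Ut, matmul(M, U)), matvec(Ut, b))

def deim(M, U, b, rows):
    """Keep only the equations in `rows`; least squares via the normal equations."""
    MU = matmul(M, U)   # a real implementation would form only the selected rows
    A = [MU[i] for i in rows]
    b_I = [b[i] for i in rows]
    At = transpose(A)
    return solve(matmul(At, A), matvec(At, b_I))

alpha = 0.85
P = [[0.0, 0.0, 0.5, 1.0],
     [0.5, 0.0, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.0]]
n = len(P)
M = [[(1.0 if i == j else 0.0) - alpha * P[i][j] for j in range(n)] for i in range(n)]
b = [(1 - alpha) / n] * n
x_exact = solve(M, b)

# Toy basis whose span contains x_exact (a stand-in for the SVD basis).
U = [[x_exact[i], 1.0 if i == 0 else 0.0] for i in range(n)]

x_bg = matvec(U, bubnov_galerkin(M, U, b))
x_deim = matvec(U, deim(M, U, b, rows=[0, 1, 2]))  # 3 equations for k = 2 unknowns
```

Note that `deim` forms all of M U only for simplicity; the point of DEIM is that only the selected rows need to be materialized.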
How efficient are these two choices of test space?
How should we choose the equations used by DEIM?
Outline
• Introduction and Motivation
• Model Reduction
• Application to Personalized PageRank
• Experiments
Transition Matrix
• How is P(w) determined by w?
– First form the weighted adjacency matrix A(w), where $A_{ij}(w)$ is the weight of edge $j \to i$
– Normalize outgoing weights to be probabilities: $P_{ij}(w) = A_{ij}(w) / \sum_{l} A_{lj}(w)$
[Figure: example graph whose edge weights are normalized into outgoing transition probabilities]
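• Bubnov-Galerkin: too expensive to compute $U^T M(w)\,U$ for every query, since every edge weight may change with w
• DEIM: NOT ENOUGH to just compute the incoming edge weights of the selected nodes; normalizing $P_{ij}(w)$ also needs the total outgoing weight of each source node j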
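The normalization step and its consequence for DEIM can be seen on a three-node example with made-up weights: a selected row i needs $P_{ij} = A_{ij} / \sum_l A_{lj}$, and the denominator involves edges that do not touch node i at all. The sketch assumes $A_{ij}$ is the weight of edge $j \to i$ and, for brevity, leaves dangling nodes as zero columns.

```python
def transition_matrix(A):
    """Column-normalize a weighted adjacency matrix.

    A[i][j] is the weight of edge j -> i; P[i][j] = A[i][j] / (total weight leaving j).
    Nodes with no outgoing weight get an all-zero column (dangling nodes not handled).
    """
    n = len(A)
    out = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return [[A[i][j] / out[j] if out[j] else 0.0 for j in range(n)] for i in range(n)]

A = [[0.0, 0.0, 2.0],
     [1.0, 0.0, 0.0],
     [3.0, 1.0, 0.0]]
P = transition_matrix(A)    # P[1][0] == 0.25, P[2][0] == 0.75

# Reweight only the edge 0 -> 2; node 1's incoming weights are unchanged ...
A2 = [row[:] for row in A]
A2[2][0] = 1.0
P2 = transition_matrix(A2)  # ... yet P2[1][0] == 0.5, because node 0's out-total changed
```

Reweighting the edge 0 -> 2 changes row 1 of P even though node 1's incoming weights are untouched, which is why incoming weights alone are not enough in general.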
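Special Case: Linear Parameterization
• Linear parameterization: $P(w) = \sum_{i=1}^{m} w_i P^{(i)}$
– Each edge has one of m different types
– A generalized random walker model
• First decide the type of edge to follow (according to w)
• Then decide among the edges of that type (according to $P^{(i)}$)
• Bubnov-Galerkin: $U^T M(w)\,U = U^T U - \alpha \sum_{i=1}^{m} w_i\,(U^T P^{(i)} U)$, where each k×k matrix $U^T P^{(i)} U$ is precomputed offline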
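Under the linear parameterization the reduced Bubnov-Galerkin matrix can be assembled from precomputed pieces. In this sketch (hypothetical edge types `P_cites`/`P_writes`, dense lists, an arbitrary basis U), `offline` computes $U^T U$ and each $U^T P^{(i)} U$ once, and `online_reduced_matrix` combines them in $O(mk^2)$ time; a direct computation that forms M(w) over the whole graph is included only as a check. The right-hand side $U^T b = (1-\alpha) U^T v$ does not depend on w and can be precomputed the same way.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def offline(P_types, U):
    """Precompute U^T U and U^T P^(i) U (all k x k); cost depends on the graph size."""
    Ut = transpose(U)
    return matmul(Ut, U), [matmul(Ut, matmul(P, U)) for P in P_types]

def online_reduced_matrix(G, R, w, alpha):
    """U^T M(w) U = U^T U - alpha * sum_i w_i U^T P^(i) U; cost O(m k^2), independent of n."""
    k = len(G)
    return [[G[r][c] - alpha * sum(wi * Ri[r][c] for wi, Ri in zip(w, R))
             for c in range(k)] for r in range(k)]

def direct_reduced_matrix(P_types, U, w, alpha):
    """Reference: form P(w) and M(w) = I - alpha*P(w) explicitly (touches every edge)."""
    n = len(U)
    P = [[sum(wi * Pi[r][c] for wi, Pi in zip(w, P_types)) for c in range(n)]
         for r in range(n)]
    M = [[(1.0 if r == c else 0.0) - alpha * P[r][c] for c in range(n)] for r in range(n)]
    return matmul(transpose(U), matmul(M, U))

# Two hypothetical edge types on 3 nodes (each column-stochastic).
P_cites = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
P_writes = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # any n x k basis; need not be orthonormal here

G, R = offline([P_cites, P_writes], U)
fast = online_reduced_matrix(G, R, w=[0.3, 0.7], alpha=0.85)
slow = direct_reduced_matrix([P_cites, P_writes], U, w=[0.3, 0.7], alpha=0.85)
```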
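Special Case: Scaled-Linear Parameterization
• Scaled-linear parameterization
– Choose each edge weight as a linear combination of edge features: $A_{ij}(w) = \sum_{l=1}^{m} w_l\,A^{(l)}_{ij}$
• E.g. post similarities between users in Twitter
– DEIM: enough to compute incoming edge weights, because each out-weight total $\sum_{i} A_{ij}(w) = \sum_{l} w_l \sum_{i} A^{(l)}_{ij}$ is also linear in w, and its per-feature sums can be precomputed offline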
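A sketch of why incoming edges suffice for DEIM under the scaled-linear parameterization. It assumes non-negative per-edge features stored as `in_edges[i] = [(j, features), ...]` for edges $j \to i$ (a hypothetical layout): the per-feature out-weight sums are computed once offline, after which a selected row of P(w) needs only that node's incoming edges.

```python
# in_edges[i] lists (j, features) for every edge j -> i; features[l] is feature l's weight.
in_edges = {
    0: [(2, [1.0, 0.0])],
    1: [(0, [1.0, 1.0])],
    2: [(0, [0.0, 2.0]), (1, [1.0, 1.0])],
}

def precompute_out_sums(in_edges, n, m):
    """Offline: d[l][j] = total feature-l weight on edges leaving node j."""
    d = [[0.0] * n for _ in range(m)]
    for edges in in_edges.values():
        for j, feats in edges:
            for l in range(m):
                d[l][j] += feats[l]
    return d

def selected_rows(in_edges, d, w, rows):
    """Online: entries P(w)[i][j] for i in `rows`, touching only edges into those rows."""
    out = {}
    for i in rows:
        row = {}
        for j, feats in in_edges.get(i, []):
            a_ij = sum(wl * f for wl, f in zip(w, feats))      # A_ij(w)
            d_j = sum(wl * d[l][j] for l, wl in enumerate(w))  # out-weight of j, linear in w
            row[j] = a_ij / d_j
        out[i] = row
    return out

d = precompute_out_sums(in_edges, n=3, m=2)
rows = selected_rows(in_edges, d, w=[0.5, 2.0], rows=[2])
```

For w = [0.5, 2.0], node 0's total outgoing weight is 6.5, so row 2 gets P[2][0] = 4/6.5 and P[2][1] = 1.0 without reading any edge outside the selected row.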
Interpolation Set
• How should we choose the subset of equations?
– “Important” nodes according to PageRank
– Does not always work!
Interpolation Set
• We want the rows of $U_{\mathcal{I},:}$ to be maximally linearly independent
– Pivoted QR
• DEIM materializes only the selected rows of $M(w)\,U$
– Performance is determined by the in-degrees of the selected nodes
– Natural graphs have skewed degree distributions
– A small set of nodes has large in-degrees
Utility vs. Cost
High-level idea of pivoted QR:
Repeat k times:
Select the next row with maximum utility
Adjust the utilities of the other rows
• Idea 1: Among low-cost nodes, select the one with maximum utility
– Cost-bounded pivot
• Idea 2: Among high-utility nodes, select the one with minimal cost
– Threshold pivot
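The sketch below is one possible reading of the two pivoting ideas, not the exact procedure from the talk. It greedily selects rows of U as in pivoted QR, taking a row's utility to be its squared norm after the already-chosen rows are projected out, and uses the node's in-degree as its cost. The `budget` (cost-bounded pivot) and `tau` (threshold pivot) parameters, the fallback when no row is under budget, and the example data are all assumptions.

```python
def select_rows(U, k, cost, mode="utility", budget=None, tau=0.5):
    """Greedy row selection for DEIM, in the style of pivoted QR.

    Utility of a row = squared norm of what remains after projecting out the
    rows already chosen. Cost = e.g. the node's in-degree (assumption).
      mode="utility":      plain pivoted QR, pick the max-utility row
      mode="cost_bounded": among rows with cost <= budget, pick max utility
      mode="threshold":    among rows with utility >= tau * best, pick min cost
    """
    n = len(U)
    residual = [list(row) for row in U]
    chosen = []
    for _ in range(k):
        util = [sum(t * t for t in r) for r in residual]
        cand = [i for i in range(n) if i not in chosen and util[i] > 1e-12]
        if not cand:
            break
        if mode == "cost_bounded":
            pool = [i for i in cand if cost[i] <= budget] or cand  # fall back if none is cheap
            p = max(pool, key=lambda i: util[i])
        elif mode == "threshold":
            best = max(util[i] for i in cand)
            pool = [i for i in cand if util[i] >= tau * best]
            p = min(pool, key=lambda i: (cost[i], -util[i]))
        else:
            p = max(cand, key=lambda i: util[i])
        chosen.append(p)
        # "Adjust the utilities of other rows": remove the pivot direction from every row.
        norm = util[p] ** 0.5
        q = [t / norm for t in residual[p]]
        for i in range(n):
            c = sum(a * b for a, b in zip(residual[i], q))
            residual[i] = [a - c * b for a, b in zip(residual[i], q)]
    return chosen

U = [[1.0, 0.0], [0.9, 0.2], [0.0, 0.95], [0.1, 0.9], [0.5, 0.5]]
in_degree = [1000, 3, 800, 2, 5]   # hypothetical costs: two hub nodes are expensive
```

On this example plain pivoting picks the two hub nodes 0 and 2 (total in-degree 1800), while the cost-bounded variant (budget 10) picks rows 1 and 3 and the threshold variant (tau 0.5) picks rows 3 and 1: two cheap, still linearly independent rows with total in-degree 5.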
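Learning to Rank
• Goal: learn the best values of the parameters w
– Based on user feedback, historic activities, etc.
• Training data
– Each pair (i, j): node i should be ranked lower than node j
– Objective function over all training pairs, penalizing pairs ranked in the wrong order
– Usually minimized via a gradient-based method, which needs the derivative of the PageRank vector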
The PageRank Derivative
• Differentiating $M(w)\,x(w) = b$ gives $M(w)\,\frac{\partial x}{\partial w_l} = -\frac{\partial M}{\partial w_l}\,x(w) = \alpha\,\frac{\partial P}{\partial w_l}\,x(w)$
• Standard method
– Solves the same PageRank system with different right-hand sides
– With m parameters, solve m+1 PageRank systems!
• Compute the derivatives in the reduced space
– Solves systems of dimension k instead of dimension n!
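A sketch of the standard full-space method under the linear parameterization $P(w) = \sum_l w_l P^{(l)}$, where $\partial P / \partial w_l = P^{(l)}$: one solve for x(w) plus one solve per parameter, all with the same matrix M(w). It uses plain fixed-point iteration on a dense toy graph with hypothetical edge types, and a finite-difference check confirms the derivatives. The reduced-space variant would replace each n-dimensional solve with a k-dimensional one and is not shown.

```python
def matvec(P, x):
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(P))]

def solve_pagerank_system(P, rhs, alpha, iters=400):
    """Solve (I - alpha*P) z = rhs by the fixed-point iteration z <- alpha*P*z + rhs."""
    z = list(rhs)
    for _ in range(iters):
        z = [alpha * a + r for a, r in zip(matvec(P, z), rhs)]
    return z

def combine(P_types, w):
    """Linear parameterization P(w) = sum_l w_l P^(l)."""
    n = len(P_types[0])
    return [[sum(wl * Pl[i][j] for wl, Pl in zip(w, P_types)) for j in range(n)]
            for i in range(n)]

def pagerank_with_gradient(P_types, w, v, alpha=0.85):
    P = combine(P_types, w)
    x = solve_pagerank_system(P, [(1 - alpha) * vi for vi in v], alpha)
    grads = []
    for Pl in P_types:
        # M(w) dx/dw_l = alpha * P^(l) x : same matrix, new right-hand side
        rhs = [alpha * t for t in matvec(Pl, x)]
        grads.append(solve_pagerank_system(P, rhs, alpha))
    return x, grads   # m + 1 solves in total

P_cites = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
P_writes = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
v = [1.0, 0.0, 0.0]
x, grads = pagerank_with_gradient([P_cites, P_writes], [0.3, 0.7], v)
```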
Outline
• Introduction and Motivation
• Model Reduction
• Application to Personalized PageRank
• Experiments
Experiments
• Datasets
– DBLP
• 3.5M vertices, 18.5M edges, 7 parameters
• ObjectRank
– Weibo graph
• 2M vertices, 50.6M edges
• A Chinese social-blogging site; data released for KDD Cup 2012
• Metrics
– Normalized L1 error: $\|x - \tilde{x}\|_1 / \|x\|_1$, where $\tilde{x}$ is the approximation
– Kendall's tau (distance)
• The fraction of node pairs whose relative order differs between x and $\tilde{x}$
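The two metrics are easy to state in code. This sketch assumes dense score vectors; `kendall_tau_distance` counts a pair as out of order only when the two vectors order it strictly differently, which is one of several conventions for ties. The example vectors are made up.

```python
from itertools import combinations

def normalized_l1(x, x_approx):
    """||x - x_approx||_1 / ||x||_1."""
    return sum(abs(a - b) for a, b in zip(x, x_approx)) / sum(abs(a) for a in x)

def kendall_tau_distance(x, y):
    """Fraction of node pairs that x and y order strictly differently (ties not counted)."""
    pairs = list(combinations(range(len(x)), 2))
    discordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return discordant / len(pairs)

x_exact = [0.4, 0.3, 0.2, 0.1]
x_approx = [0.38, 0.32, 0.1, 0.2]   # made-up approximation for illustration
```

Here the normalized L1 error is 0.24, and only the pair (2, 3) is swapped, giving a Kendall tau distance of 1/6.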
Global PageRank on DBLP
Learning to Rank on DBLP
Avg running time per optimization iteration:
Method | Standard | Bubnov-Galerkin | DEIM-200
Time (sec) | 159.3 | 0.002 | 0.033
Localized PageRank on Weibo (10 parameters)
Conclusion
• The first general scalable method for edge-weighted personalized PageRank
– Based on model reduction
• Optimizations for common parameterizations
• Cost/accuracy tradeoffs on power-law graphs
• Nearly 5 orders of magnitude faster on a learning-to-rank application
Acknowledgement
Questions?
Reference
• [Balmin+05] A. Balmin, et al. ObjectRank: Authority-Based Keyword Search
in Databases. In VLDB, 2004.
• [Nie+05] Z. Nie, et al. Object-level ranking: bringing order to web objects.
In WWW, 2005.
• [Haveliwala02] T. H. Haveliwala. Topic-sensitive PageRank. In WWW, 2002.
• [Bahmani+10] B. Bahmani, et al. Fast incremental and personalized
PageRank. PVLDB, 4(3):173–184, 2010.
• [Weng+10] J. Weng, et al. TwitterRank: finding topic-sensitive influential
twitterers. In WSDM, 2010.
• [BackstromL11] L. Backstrom and J. Leskovec. Supervised random walks:
predicting and recommending links in social networks. In WSDM, 2011.
• [Hristidis+14] V. Hristidis, et al. Efficient ranking on entity graphs with
personalized relationships. IEEE Trans. Knowl. Data Eng., 26(4):850–863,
2014.
• [Schiders08] W. Schilders. Model Order Reduction: Theory, Research
Aspects and Applications, Volume 13 of Mathematics in Industry. Springer,
Berlin, 2008.
Detecting Credit Card Fraud: A Machine Learning Approach
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 

Iterative Graph Computation in the Big Data Era

  • 1. Iterative Graph Computation in the Big Data Era. Wenlei Xie, B-Exam. Committee: Johannes Gehrke (Chair), David Bindel, Robert Kleinberg, Alan Demers
  • 2. Ubiquitous Graph Data: Social Networks, Web, Recommendation Systems, Computer Vision, Bioinformatics, Physical Simulations
  • 3. Ubiquitous Graph Data: Social Networks, Web, Recommendation Systems, Computer Vision, Bioinformatics, Physical Simulations • New Challenges in the Big Data Era
  • 4. My Work • Fast Iterative Graph Computation with Block Updates. W. Xie, G. Wang, D. Bindel, A. Demers, J. Gehrke. PVLDB 6(14) • Dynamic Interaction Graph with Probabilistic Edge Decay. W. Xie, Y. Tian, Y. Sismanis, A. Balmin, P. J. Haas. ICDE 2015 • Edge-Weighted Personalized PageRank: Breaking A Decade-Old Performance Barrier. W. Xie, D. Bindel, A. Demers, J. Gehrke. Accepted by KDD 2015 [Figures: (a) Vertex-Oriented vs. (b) Block-Oriented Computation, showing to-be-updated, dependent, and unrelated vertices and a block boundary; an interaction graph among Alice, Bob, and Carol on a timeline from 5 years ago to now]
  • 6. Outline • Introduction and Motivation • Model Reduction • Application to Personalized PageRank • Experiments
  • 8. PageRank • PageRank model – A random walker moves in the graph – At each step • Move to an adjacent node (with prob. α), or • Teleport to a new node (with prob. 1 − α) • PageRank vector: stationary vector for this process
  • 9. PageRank • PageRank model – A random walker moves in the graph – At each step • Move to an adjacent node (with prob. α), or • Teleport to a new node (with prob. 1 − α) • PageRank vector: stationary vector for this process • $x = \alpha P x + (1 - \alpha) v$, where P is the transition matrix, x the PageRank vector, and v the teleport vector
  • 10. Graphs with Rich Metadata
  • 11. Personalized PageRank • Edge-weighted personalized PageRank • Node-weighted personalized PageRank
  • 12. Personalized PageRank • Edge-weighted personalized PageRank – ObjectRank [Balmin+05] / PopRank [Nie+05] – TwitterRank [Weng+10] – Learning to Rank [BackstromL11] • Node-weighted personalized PageRank – Topic-Sensitive PageRank (TSPR) [Haveliwala02] – Localized PageRank [Bahmani+10] • Usually a small number of global parameters (e.g., 5–10)
  • 13. ObjectRank on DBLP [Figure: DBLP subgraph with Author "Rakesh Agrawal", Forum "ICDE", Conference "ICDE 1997", and Papers "Index Selection for OLAP", "Data Cube: A Relational Aggregation Operator…", "Modeling Multidimensional Databases", and "Range Queries in OLAP Data Cubes"; edge types: cites, contains, has instance, writes]
  • 14. Personalized PageRank • Edge-weighted personalized PageRank – ObjectRank / PopRank – TwitterRank – Learning to Rank • Node-weighted personalized PageRank – Topic-Sensitive PageRank (TSPR) – Localized PageRank • Question: Which way to personalize? Answer: It largely depends on whether the metadata is associated with vertices or edges.
  • 15. Personalized PageRank • Node-weighted personalized PageRank – Efficient algorithms exploit the structure of v: linearity in the parameter w, and sparsity • Edge-weighted personalized PageRank – No efficient algorithm for general graphs – No linearity in w
  • 16. Edge Personalization Computation • Ad-hoc algorithms for special graphs / specific applications – ObjectRank [Balmin+05] / ScaleRank [Hristidis+14] – Only apply to limited types of graphs • Hybrid strategy that linearly combines pre-computed PageRank vectors – TwitterRank [Weng+10] • Computing the parameter vector offline – Many learning-to-rank applications [Nie+05, BackstromL11]
  • 17. Edge Personalization Computation • Ad-hoc algorithms for special graphs / specific applications – ObjectRank / ScaleRank – Only apply to limited types of graphs • Hybrid strategy that linearly combines pre-computed PageRank vectors – TwitterRank • Computing the parameter vector offline – Many learning-to-rank applications • Can we efficiently compute edge-weighted personalized PageRank online?
  • 18. Outline • Introduction and Motivation • Model Reduction • Application to Personalized PageRank • Experiments
  • 19. Model Reduction • Used in physical simulations • Key assumption: solutions live in a low-dimensional space • Two ingredients – Offline: Finding a basis for the space (POD/SVD) – Online: Finding an approximation
  • 20. Model Reduction for PageRank • Assumption: the PageRank vector x(w) lies close to a low-dimensional space – Build a basis for a k-dimensional reduced space • Pick an approximation in the reduced space – Represented by its coordinates in the k-dimensional space – Need k equations • Reconstruct the PageRank vector
  • 22. Reduced Space Construction • Assumption: x(w) lies close to a low-dimensional space • Compute a sample set of PageRank vectors
  • 23. Reduced Space Construction • Assumption: x(w) lies close to a low-dimensional space • Compute a sample set of PageRank vectors $x(w_1), \dots, x(w_r)$ • Find a basis for a k-dimensional space based on the samples – Data matrix $X = [x(w_1), \dots, x(w_r)]$ – Compute the SVD of X; the first k left singular vectors form the basis U – The best k-dimensional space under the 2-norm – Keep the most important directions
  • 24. Model Reduction • Assumption: x(w) lies close to a low-dimensional space – Build a basis for a k-dimensional reduced space • Pick an approximation in the reduced space – Represented by its coordinates in the k-dimensional space – Need k equations • Reconstruct the PageRank vector • PageRank system $M(w)\,x(w) = b$, with $M(w) = I - \alpha P(w)$ and $b = (1 - \alpha) v$: the system matrix is denoted by $M(w)$ and the right-hand side by $b$
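  • 25. Extracting Approximations • Reduced space basis U, online query w • We want coordinates y with $U y \approx x(w)$, i.e., a zero residual $M(w)\,U y - b$ – Usually impossible: n equations but only k unknowns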
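  • 26. Extracting Approximations • Reduced space basis U, online query w • We want coordinates y with $U y \approx x(w)$ – Usually the residual $M(w)\,U y - b$ cannot be made zero • The Petrov-Galerkin framework [Schiders08] – The residual vector is orthogonal to the test space W: $W^T (M(w)\,U y - b) = 0$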
  • 27. The Petrov-Galerkin Framework • Bubnov-Galerkin – The test space is the same as the reduced space – $W = U$ • Discrete Empirical Interpolation Method (DEIM) – Satisfy a subset of equations – Denote the index set for equations as $\mathcal{I}$ – $W = \Pi$ (the columns of the identity indexed by $\mathcal{I}$) when $|\mathcal{I}| = k$
  • 28. DEIM • Satisfy a subset of equations in the linear system – Can choose more than k equations – Over-determined linear system • Least-squares solution
  • 30. The Petrov-Galerkin Framework • Bubnov-Galerkin – The test space is the same as the reduced space – $W = U$ • Discrete Empirical Interpolation Method (DEIM) – Satisfy a subset of equations – Denote the index set for equations as $\mathcal{I}$ – $W = \Pi$ (the columns of the identity indexed by $\mathcal{I}$) • How efficient are these two choices of test space? How should we choose the equations used by DEIM?
  • 31. Outline • Introduction and Motivation • Model Reduction • Application to Personalized PageRank • Experiments
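  • 33. Transition Matrix • How is P(w) determined by w? – First form the weighted adjacency matrix A(w) • E.g., each edge weight is the inner product of w with the edge's feature vector – Normalize outgoing weights to be probabilities [Figure: example graph whose edge weights are normalized into transition probabilities]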
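  • 34. Transition Matrix • How is P(w) determined by w? – First form the weighted adjacency matrix A(w) • E.g., each edge weight is the inner product of w with the edge's feature vector – Normalize outgoing weights to be probabilities • Bubnov-Galerkin: too expensive to compute $U^T M(w)\,U$ • DEIM: computing only the incoming edge weights of the selected nodes is NOT ENOUGH, since normalization also needs the total outgoing weight of each source node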
  • 35. Special Case: Linear Parameterization • Linear Parameterization – Each edge has one of m different types – A generalized random walker model • First decide the type of edge to follow (according to w) • Then decide between edges of that type (according to the type-specific transition matrix $P_i$) – $P(w) = \sum_{i=1}^{m} w_i P_i$
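  • 36. Special Case: Linear Parameterization • Linear Parameterization – Each edge has one of m different types – A generalized random walker model • First decide the type of edge to follow (according to w) • Then decide between edges of that type (according to the type-specific transition matrix $P_i$) – $P(w) = \sum_{i=1}^{m} w_i P_i$ • Bubnov-Galerkin: $U^T M(w)\,U = I - \alpha \sum_{i=1}^{m} w_i\, U^T P_i U$, where each $k \times k$ matrix $U^T P_i U$ is precomputed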
  • 37. Special Case: Scaled-Linear Parameterization • Scaled-Linear Parameterization – Choose each edge weight as a linear combination of edge features, so $A(w) = \sum_i w_i A_i$ is linear in w while P(w) is not • E.g., post similarities between users in Twitter – DEIM: Enough to compute incoming edge weights
  • 39. Interpolation Set • How should we choose the subset of equations? – “Important” nodes according to PageRank – Does not always work!
  • 40. Interpolation Set • We want the selected rows of $M(w)\,U$ to be maximally linearly independent – Pivoted QR
  • 41. Interpolation Set • We want the selected rows of $M(w)\,U$ to be maximally linearly independent – Pivoted QR • DEIM: materialize only the selected rows – Performance is determined by the in-degrees of the selected nodes – Skewed degree distribution in natural graphs – A small set of nodes have large in-degrees
  • 42. Utility vs. Cost • High-level idea of pivoted QR: repeat for each row to be selected – Select the next row with maximum utility – Adjust the utilities of the other rows • Idea 1: Among low-cost nodes, select the one with maximum utility – Cost-bounded pivot • Idea 2: Among high-utility nodes, select the one with minimal cost – Threshold pivot
  • 43. Learning to Rank • Goal: Learn the best values of the parameters – Based on user feedback, historical activities, etc. • Training Data – Each pair (i, j): i should be ranked lower than j • Objective Function – Usually minimized via a gradient-based method, which requires the derivative of the PageRank vector
  • 44. The PageRank Derivative • Standard Method – Solves PageRank systems with the same matrix and different right-hand sides – With m parameters, solve m+1 PageRank systems! • Compute the derivatives in the reduced space – Solves systems of dimension k instead of dimension n!
  • 45. Outline • Introduction and Motivation • Model Reduction • Application to Personalized PageRank • Experiments
  • 46. Experiments • Datasets – DBLP • 3.5M vertices, 18.5M edges, 7 parameters • ObjectRank – Weibo graph • 2M vertices, 50.6M edges • A microblogging site in China, released by KDD Cup 2012 • Metrics – Normalized L1 – Kendall's tau • The percentage of pairs that are out of order
  • 48. Learning to Rank on DBLP: average running time per optimization iteration
      Method      | Standard | Bubnov-Galerkin | DEIM-200
      Time (sec)  | 159.3    | 0.002           | 0.033
  • 49. Localized PageRank on Weibo (10 parameters)
  • 51. Conclusion • The first general scalable method for edge-weighted personalized PageRank – Based on model reduction • Optimizations for common parameterizations • Cost/accuracy tradeoffs on power-law graphs • Nearly 5 orders of magnitude faster on a learning-to-rank application
  • 55. Reference • [Balmin+05] A. Balmin et al. ObjectRank: Authority-Based Keyword Search in Databases. In VLDB, 2004. • [Nie+05] Z. Nie et al. Object-level ranking: bringing order to web objects. In WWW, 2005. • [Haveliwala02] T. H. Haveliwala. Topic-sensitive PageRank. In WWW, 2002. • [Bahmani+10] B. Bahmani et al. Fast incremental and personalized PageRank. PVLDB, 4(3):173–184, 2010. • [Weng+10] J. Weng et al. TwitterRank: finding topic-sensitive influential twitterers. In WSDM, 2010. • [BackstromL11] L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, 2011. • [Hristidis+14] V. Hristidis et al. Efficient ranking on entity graphs with personalized relationships. IEEE Trans. Knowl. Data Eng., 26(4):850–863, 2014. • [Schiders08] W. Schilders. Model Order Reduction: Theory, Research Aspects and Applications, volume 13 of Mathematics in Industry. Springer, Berlin, 2008.
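Slide 9's random-walk model can be illustrated with power iteration. This is a minimal sketch, not the talk's implementation: it assumes a column-stochastic P with no dangling nodes, uses α = 0.85 (a common default, not a value from the talk), and `pagerank` is a hypothetical helper name.

```python
import numpy as np

def pagerank(P, v, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for x = alpha * P @ x + (1 - alpha) * v.

    Assumes P is column-stochastic (P[i, j] is the probability of moving
    from node j to node i, so no node is dangling) and v is a probability
    vector. Each step contracts the error by a factor alpha in the L1 norm.
    """
    x = v.copy()
    for _ in range(max_iter):
        x_next = alpha * (P @ x) + (1 - alpha) * v
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    return x

# Toy graph with edges 0->1, 0->2, 1->2, 2->0; A[i, j] is the weight of edge j->i.
A = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0]])
P = A / A.sum(axis=0)        # normalize each node's outgoing weights
v = np.full(3, 1 / 3)        # uniform teleport vector
x = pagerank(P, v)

# The same vector solves the linear system (I - alpha P) x = (1 - alpha) v.
x_direct = np.linalg.solve(np.eye(3) - 0.85 * P, 0.15 * v)
```

Power iteration is used here only because it is the simplest correct solver; any linear solver for $(I - \alpha P)x = (1 - \alpha)v$ yields the same vector.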
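Slide 15's contrast between the two kinds of personalization can be checked numerically on a toy graph. In this sketch (hypothetical helpers `ppr` and `normalize`, α = 0.85 assumed), node-weighted PPR with $v(w) = \sum_i w_i v_i$ equals the same combination of precomputed vectors, while edge-weighted PPR with $P(w)$ built from $w_0 A_0 + w_1 A_1$ does not.

```python
import numpy as np

ALPHA = 0.85

def ppr(P, v, alpha=ALPHA):
    """Solve (I - alpha P) x = (1 - alpha) v directly (fine for tiny graphs)."""
    return np.linalg.solve(np.eye(len(v)) - alpha * P, (1 - alpha) * v)

def normalize(A):
    """Column-normalize a weighted adjacency matrix (A[i, j] = weight of j -> i)."""
    return A / A.sum(axis=0)

# Two edge types on a 3-node graph: type 0 has 0->1, type 1 has 0->2;
# both types share the edges 1->2 and 2->0.
A0 = np.array([[0.0, 0, 1], [1, 0, 0], [0, 1, 0]])
A1 = np.array([[0.0, 0, 1], [0, 0, 0], [1, 1, 0]])
w = np.array([0.5, 0.5])

# Node-weighted: fixed P, teleport vector v(w) = w0 * e_0 + w1 * e_1.
P = normalize(A0 + A1)
V = np.eye(3)[:, :2]                      # v_0 = e_0, v_1 = e_1
x_node = ppr(P, V @ w)
x_node_lin = w[0] * ppr(P, V[:, 0]) + w[1] * ppr(P, V[:, 1])

# Edge-weighted: fixed v, transition matrix P(w) = normalize(w0 A0 + w1 A1).
v = np.full(3, 1 / 3)
x_edge = ppr(normalize(w[0] * A0 + w[1] * A1), v)
x_edge_lin = w[0] * ppr(normalize(A0), v) + w[1] * ppr(normalize(A1), v)

print(np.abs(x_node - x_node_lin).max())   # ~0: linear in w
print(np.abs(x_edge - x_edge_lin).max())   # > 0.01: not linear in w
```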
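Slide 23's offline step (sample PageRank vectors, stack them into a data matrix, keep the leading left singular vectors) might look like the sketch below. The random sparse edge-feature model, the 200 uniform samples, and k = 10 are illustrative assumptions, not settings from the experiments; `pagerank_of` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(42)
n, m, alpha, k = 50, 3, 0.85, 10

# Hypothetical edge-weighted model: A(w) = sum_i w_i A_i, P(w) = A(w) with
# columns normalized. The +0.01 keeps every node non-dangling.
As = [rng.random((n, n)) * (rng.random((n, n)) < 0.1) + 0.01 for _ in range(m)]
v = np.full(n, 1 / n)

def pagerank_of(w):
    Aw = sum(wi * Ai for wi, Ai in zip(w, As))
    P = Aw / Aw.sum(axis=0)
    return np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)

# Offline: sample parameters uniformly and stack the exact solutions as columns.
samples = rng.random((200, m))
X = np.column_stack([pagerank_of(w) for w in samples])   # n x 200 data matrix
U_full, s, _ = np.linalg.svd(X, full_matrices=False)
U = U_full[:, :k]   # first k left singular vectors: best k-dim space in the 2-norm

# A new query's PageRank vector should lie close to span(U).
x_new = pagerank_of(rng.random(m))
rel_err = np.linalg.norm(x_new - U @ (U.T @ x_new)) / np.linalg.norm(x_new)
```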
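Slides 26 and 27 describe the Petrov-Galerkin framework: choose y so that the residual $M(w)\,U y - b$ is orthogonal to a test space W. The hypothetical helper `petrov_galerkin` below solves $(W^T M U)\,y = W^T b$ and is called with the two test spaces of slide 27. The two-feature model and the interpolation indices are made up for illustration.

```python
import numpy as np

def petrov_galerkin(M, b, U, W):
    """Coordinates y such that the residual M @ U @ y - b is orthogonal to span(W).

    Solves the k x k system (W^T M U) y = W^T b.
    """
    return np.linalg.solve(W.T @ M @ U, W.T @ b)

rng = np.random.default_rng(1)
n, alpha, k = 30, 0.85, 4
A0, A1 = rng.random((n, n)) + 0.01, rng.random((n, n)) + 0.01

def system(w):
    """Hypothetical 2-parameter model: M(w) = I - alpha P(w), b = (1 - alpha) v."""
    Aw = w[0] * A0 + w[1] * A1
    P = Aw / Aw.sum(axis=0)
    return np.eye(n) - alpha * P, np.full(n, (1 - alpha) / n)

# Reduced basis from sampled solutions (the SVD step of slide 23).
X = np.column_stack([np.linalg.solve(*system(rng.random(2))) for _ in range(20)])
U = np.linalg.svd(X, full_matrices=False)[0][:, :k]

M, b = system(np.array([0.3, 0.7]))
x_exact = np.linalg.solve(M, b)

y_bg = petrov_galerkin(M, b, U, U)      # Bubnov-Galerkin: W = U
rows = [0, 7, 14, 21]                   # hypothetical interpolation set, |rows| = k
Pi = np.eye(n)[:, rows]                 # W = Pi: columns of I picked by the index set
y_deim = petrov_galerkin(M, b, U, Pi)   # satisfies exactly the selected equations
```

With `W = Pi`, the condition $\Pi^T(M U y - b) = 0$ just says the selected equations hold exactly, which is why DEIM only needs those rows of M.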
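With more than k equations (slide 28), DEIM becomes an over-determined least-squares problem. A sketch under the same kind of synthetic model follows; `deim_lstsq` is a hypothetical helper, and the every-fifth-row index set is arbitrary rather than a selection rule from the talk.

```python
import numpy as np

def deim_lstsq(M_rows, b_rows, U):
    """Least-squares DEIM: M_rows and b_rows hold only the selected equations.

    With more selected rows than basis vectors, M_rows @ U @ y = b_rows is
    over-determined, so return its least-squares solution.
    """
    y, *_ = np.linalg.lstsq(M_rows @ U, b_rows, rcond=None)
    return y

rng = np.random.default_rng(2)
n, alpha, k = 60, 0.85, 4
A0, A1 = rng.random((n, n)) + 0.01, rng.random((n, n)) + 0.01

def system(w):
    """Hypothetical 2-parameter model: M(w) = I - alpha P(w), b = (1 - alpha) v."""
    Aw = w[0] * A0 + w[1] * A1
    P = Aw / Aw.sum(axis=0)
    return np.eye(n) - alpha * P, np.full(n, (1 - alpha) / n)

X = np.column_stack([np.linalg.solve(*system(rng.random(2))) for _ in range(20)])
U = np.linalg.svd(X, full_matrices=False)[0][:, :k]

M, b = system(np.array([0.6, 0.4]))
rows = list(range(0, n, 5))             # hypothetical interpolation set: 12 > k rows
y = deim_lstsq(M[rows], b[rows], U)     # deim_lstsq only sees the selected rows
x_exact = np.linalg.solve(M, b)
```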
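Slide 33 forms $P(w)$ by weighting edges and normalizing each node's outgoing weights. Assuming, as the speaker notes suggest, that each edge weight is the inner product of w with an edge feature vector, a small dense sketch follows; `transition_matrix` and the edge-list layout are hypothetical.

```python
import numpy as np

def transition_matrix(n, src, dst, F, w):
    """Column-stochastic P(w) from an edge list with per-edge feature vectors.

    Edge e goes src[e] -> dst[e]; its weight is the inner product F[e] @ w.
    Assumes every node has at least one out-edge with positive weight.
    Dense output for illustration only.
    """
    weight = F @ w
    out_weight = np.bincount(src, weights=weight, minlength=n)   # per source node
    P = np.zeros((n, n))
    np.add.at(P, (dst, src), weight / out_weight[src])   # P[dst, src] += normalized weight
    return P

src = np.array([0, 0, 1, 2])
dst = np.array([1, 2, 2, 0])
F = np.array([[1.0, 0.0],     # edge 0 -> 1 has only feature 0
              [0.0, 1.0],     # edge 0 -> 2 has only feature 1
              [1.0, 1.0],
              [1.0, 1.0]])
w = np.array([1.0, 3.0])
P = transition_matrix(3, src, dst, F, w)
print(P[:, 0])   # node 0's out-weights 1 and 3 become probabilities 0.25 and 0.75
```

Filling row j of P(w) divides by the total out-weight of each in-neighbor of j, and that total depends on all of the neighbor's out-edges. This is why, as slide 34 notes, computing only the incoming edge weights is not enough for DEIM in this general setting.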
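Under the linear parameterization of slides 35 and 36, $U^T M(w)\,U = I - \alpha \sum_i w_i\, U^T P_i U$, so the $k \times k$ blocks can be precomputed and the online solve never touches an $n \times n$ matrix. This sketch assumes each node has out-edges of every type (so each $P_i$ is column-stochastic) and that w lies on the probability simplex; sizes and random matrices are placeholders, and `P_of`/`reduced_solve` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, k, alpha = 40, 3, 5, 0.85

def random_stochastic(n):
    A = rng.random((n, n)) + 0.01            # every node has out-edges of this type
    return A / A.sum(axis=0)

Ps = [random_stochastic(n) for _ in range(m)]   # type-specific transition matrices P_i
b = np.full(n, (1 - alpha) / n)                  # b = (1 - alpha) v with uniform v

def P_of(w):
    """Generalized walker: choose edge type i with probability w_i, then follow P_i."""
    return sum(wi * Pi for wi, Pi in zip(w, Ps))

# Offline: basis from solutions at w sampled uniformly from the simplex.
X = np.column_stack([np.linalg.solve(np.eye(n) - alpha * P_of(w), b)
                     for w in rng.dirichlet(np.ones(m), size=30)])
U = np.linalg.svd(X, full_matrices=False)[0][:, :k]
R = [U.T @ Pi @ U for Pi in Ps]     # k x k blocks, precomputed once
Utb = U.T @ b

def reduced_solve(w):
    """Online: U^T M(w) U = I - alpha * sum_i w_i R_i, because U^T U = I."""
    K = np.eye(k) - alpha * sum(wi * Ri for wi, Ri in zip(w, R))
    return U @ np.linalg.solve(K, Utb)

w = np.array([0.2, 0.3, 0.5])
x_exact = np.linalg.solve(np.eye(n) - alpha * P_of(w), b)
x_red = reduced_solve(w)
```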
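For the scaled-linear case of slide 37, $A(w) = \sum_i w_i A_i$ is linear, and so is each node's total out-weight $d(w) = \sum_i w_i d_i$, which can be precomputed per feature. The selected rows of $P(w)$ then need only the incoming edges of the selected nodes. This is a dense toy sketch with a hypothetical helper `P_rows`; a real implementation would store sparse rows.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 8, 2
ring = np.roll(np.eye(n), 1, axis=0)   # edge j -> j+1 (mod n), so no dangling nodes
# Hypothetical per-feature adjacency matrices A_i[dst, src] (sparse-ish, non-negative).
As = [rng.random((n, n)) * (rng.random((n, n)) < 0.4) + ring for _ in range(m)]
d = [Ai.sum(axis=0) for Ai in As]      # per-feature out-weights, precomputed offline

def P_rows(rows, w):
    """Selected rows of P(w) = A(w) diag(d(w))^{-1}, with A(w) = sum_i w_i A_i.

    Uses only the selected rows of each A_i (incoming edges of those nodes)
    plus the precomputed out-weight vectors d_i, since d(w) = sum_i w_i d_i.
    """
    d_w = sum(wi * di for wi, di in zip(w, d))
    A_rows = sum(wi * Ai[rows] for wi, Ai in zip(w, As))
    return A_rows / d_w   # divides column s by the out-weight of source node s

w = np.array([0.7, 1.3])
rows = [2, 5]
Aw = sum(wi * Ai for wi, Ai in zip(w, As))
P_full = Aw / Aw.sum(axis=0)   # reference: the full n x n transition matrix
```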
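Slides 40 and 41 select the interpolation set by pivoted QR on the rows. A greedy sketch equivalent to QR with column pivoting on $B^T$ follows. Here B stands for $M(w)\,U$ evaluated at some representative parameter, which is an assumption, and `pivoted_qr_rows` is a hypothetical name; it is meant for at most rank(B) rows.

```python
import numpy as np

def pivoted_qr_rows(B, num_rows):
    """Greedy row selection, equivalent to QR with column pivoting on B^T.

    Each step picks the row with the largest remaining norm, then removes
    that row's direction from every row. Intended for num_rows <= rank(B).
    """
    R = np.array(B, dtype=float)
    chosen = []
    for _ in range(num_rows):
        norms = np.linalg.norm(R, axis=1)
        norms[chosen] = -1.0                 # never pick a row twice
        j = int(np.argmax(norms))
        chosen.append(j)
        q = R[j] / np.linalg.norm(R[j])      # unit vector along the chosen row
        R -= np.outer(R @ q, q)              # project it out of all rows
    return chosen

# Rows 0 and 1 are nearly parallel; a good selection keeps only one of them.
B = np.array([[1.0, 0.0],
              [1.0, 0.001],
              [0.0, 1.0],
              [0.5, 0.5]])
print(pivoted_qr_rows(B, 2))   # [1, 2]
```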
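Slide 42's two cost-aware variants can be grafted onto the same greedy loop. In this sketch, utility is a row's norm after removing already-chosen directions, and cost is a per-node number such as in-degree. The function `select_rows` and its parameters `budget` and `tau` are hypothetical, since the talk does not give the rules' exact parameters.

```python
import numpy as np

def select_rows(B, cost, num_rows, rule, budget=None, tau=None):
    """Greedy pivoted-QR-style row selection with a cost-aware pivot rule.

    rule == "cost_bounded": among rows with cost <= budget, take max utility.
    rule == "threshold":    among rows with utility >= tau * max utility,
                            take min cost.
    Assumes enough admissible rows with nonzero utility exist.
    """
    R = np.array(B, dtype=float)
    cost = np.asarray(cost, dtype=float)
    chosen = []
    for _ in range(num_rows):
        util = np.linalg.norm(R, axis=1)     # utility: remaining row norm
        util[chosen] = 0.0
        if rule == "cost_bounded":
            cand = np.flatnonzero(cost <= budget)
            j = cand[np.argmax(util[cand])]
        elif rule == "threshold":
            cand = np.flatnonzero(util >= tau * util.max())
            j = cand[np.argmin(cost[cand])]
        else:
            raise ValueError(f"unknown rule: {rule}")
        j = int(j)
        chosen.append(j)
        q = R[j] / np.linalg.norm(R[j])
        R -= np.outer(R @ q, q)              # adjust the utilities of the other rows
    return chosen

B = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
cost = [100, 2, 100, 3]                      # e.g. in-degrees: rows 1 and 3 are cheap
print(select_rows(B, cost, 2, "cost_bounded", budget=10))   # [1, 3]
print(select_rows(B, cost, 2, "threshold", tau=0.8))        # [1, 3]
print(select_rows(B, cost, 2, "threshold", tau=0.95))       # [0, 2]
```

With tau = 0.95 the threshold rule only considers rows within 5% of the best utility and picks the expensive but most independent rows [0, 2]; lowering tau to 0.8 lets the cheap rows 1 and 3 qualify.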
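Slide 44's comparison can be sketched for the linear parameterization, where differentiating $M(w)\,x = b$ with $M(w) = I - \alpha \sum_i w_i P_i$ gives $M(w)\,\partial x/\partial w_i = \alpha P_i x$. The standard method needs m + 1 solves of size n. The reduced version (Bubnov-Galerkin, an assumed choice here) needs m + 1 solves of size k. Sizes, seeds, and helper names are hypothetical, and the learning-to-rank loss that would consume these derivatives is not shown.

```python
import numpy as np

rng = np.random.default_rng(11)
n, m, k, alpha = 40, 3, 10, 0.85

def random_stochastic(n):
    A = rng.random((n, n)) + 0.01
    return A / A.sum(axis=0)

Ps = [random_stochastic(n) for _ in range(m)]   # P(w) = sum_i w_i P_i
b = np.full(n, (1 - alpha) / n)

def M_of(w):
    return np.eye(n) - alpha * sum(wi * Pi for wi, Pi in zip(w, Ps))

def full_derivatives(w):
    """Standard method: m + 1 solves of size n, all with the same matrix M(w)."""
    M = M_of(w)
    x = np.linalg.solve(M, b)
    return x, [np.linalg.solve(M, alpha * (Pi @ x)) for Pi in Ps]

# Reduced space: basis U from sampled solutions, precomputed R_i = U^T P_i U.
X = np.column_stack([np.linalg.solve(M_of(w), b)
                     for w in rng.dirichlet(np.ones(m), size=30)])
U = np.linalg.svd(X, full_matrices=False)[0][:, :k]
R = [U.T @ Pi @ U for Pi in Ps]
Utb = U.T @ b

def reduced_derivatives(w):
    """Same quantities from m + 1 solves of size k instead of n."""
    K = np.eye(k) - alpha * sum(wi * Ri for wi, Ri in zip(w, R))
    y = np.linalg.solve(K, Utb)
    return U @ y, [U @ np.linalg.solve(K, alpha * (Ri @ y)) for Ri in R]
```

Because the basis is built from parameters on the simplex, a fair accuracy check compares derivatives along a direction that stays on the simplex, such as $e_0 - e_1$.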
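Slide 46 lists the two accuracy metrics. This is a minimal sketch: it assumes normalized L1 means $\lVert \hat{x} - x \rVert_1 / \lVert x \rVert_1$ and reads Kendall's tau as the fraction of discordant pairs, as the slide describes. Both helper names are hypothetical, and the $O(n^2)$ pair loop is only suitable for small vectors.

```python
import itertools
import numpy as np

def normalized_l1(x_approx, x_exact):
    """Assumed definition: ||x_approx - x_exact||_1 / ||x_exact||_1."""
    x_approx, x_exact = np.asarray(x_approx), np.asarray(x_exact)
    return np.abs(x_approx - x_exact).sum() / np.abs(x_exact).sum()

def kendall_distance(x_approx, x_exact):
    """Fraction of node pairs that the two score vectors order oppositely.

    Pairs tied in either vector are not counted as out of order. O(n^2).
    """
    n = len(x_exact)
    out_of_order = sum(
        1 for i, j in itertools.combinations(range(n), 2)
        if (x_approx[i] - x_approx[j]) * (x_exact[i] - x_exact[j]) < 0
    )
    return out_of_order / (n * (n - 1) // 2)

x_exact = [0.4, 0.3, 0.2, 0.1]
x_approx = [0.3, 0.4, 0.2, 0.1]              # nodes 0 and 1 swapped
print(normalized_l1(x_approx, x_exact))      # about 0.2
print(kendall_distance(x_approx, x_exact))   # 1 of 6 pairs out of order
```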

Editor's Notes

  1. Good morning, everyone, and welcome to my B exam.
  2. Graphs are versatile tools for expressing complex data dependencies. Social networks and the web are the two best-known examples. Graphs have also been adopted in a variety of other domains, such as recommendation systems, bioinformatics, physical simulations, and computer vision.
  3. Iterative graph computation is a classic problem, but new challenges arise in the big data era.
  4. Here are the three projects I worked on during my graduate studies to address the challenges of iterative graph computation in the big data era.
  5. In today’s talk, I will discuss this recent work that enables interactive edge-weighted personalized PageRank.
  6. Let’s first introduce and motivate the problem.
  7. I believe most of you already know the PageRank algorithm, which was originally proposed by Google to rank web pages and has since been widely used in many graph-ranking applications. Let me briefly recap the model and define some terminology. In PageRank, a random walker moves through the nodes of the graph. At each step, the walker either follows an edge, or gets bored and restarts at an arbitrary vertex chosen according to some distribution.
  8. Here P is the transition matrix, which defines how the random walker moves along edges: it can choose among outgoing edges uniformly or prefer some types of edges. v is the so-called teleport vector, a distribution that tells the bored random walker which vertex to jump to. For example, it can jump to a small subset of vertices, or select a vertex uniformly from the whole graph. x is the PageRank vector, which is what we are interested in.
  9. Graphs usually contain not only topology but also metadata associated with vertices and edges, such as user profiles and edge types. PageRank variants often exploit this rich metadata because it allows results to be personalized based on user preferences. For example, on a microblogging website such as Twitter, a user may want to find authorities on different topics: in one query, who is the authority on food; in another, who is the expert on music.
  10. There are two ideas for personalizing the PageRank result, and both parameterize part of the PageRank system. The first is node-weighted personalized PageRank, where the teleport distribution is determined by the query parameter w. Intuitively, it marks some nodes as more related to the query. The second is edge-weighted personalized PageRank, where the transition matrix P changes according to the parameter w. Intuitively, it says the random walker wants to traverse some edges more often than others.
  11. Both types of personalization have many applications. One example of NodePPR is TSPR, which assumes different nodes belong to different topics, such as sports, entertainment, and music, and changes the teleport distribution according to the user's query. Localized PageRank is another type of NodePPR, where the walker always jumps to a single node; it is used in recommendation systems. EdgePPR is suited to graphs with metadata on edges. For example, ObjectRank uses the edge types in the DBLP graph to personalize the transition matrix, and TwitterRank exploits the topic similarities between friends.
  12. Here is an example of ObjectRank on DBLP. The DBLP graph has different edge types between nodes: an author writes a paper, and a paper cites other papers. ObjectRank allows changing the relative weights of the edge types. In one query, the user might think citation edges are most important; in another, the user might say the conference and the authors are strong indicators of a great paper, so these orange and green edges get higher weights.
  13. We have introduced two very different personalization schemes. The question is which way to personalize. The answer is that it depends, especially on whether the metadata is associated with vertices or edges. When we know the topics of vertices, node personalization (e.g., TSPR) is suitable. Other graphs have metadata associated with edges, such as edge types or feature vectors; these graphs are best suited to edge personalization, for example ObjectRank. Facebook has also reported that edge features such as communication frequency and visit time can significantly improve friend recommendation results via EdgePPR.
  14. As we discussed, there are two types of personalized PageRank, each suited to a family of graphs, and people are looking for efficient methods to compute them. There are many efficient methods for NodePPR, usually based on linearity in the parameter w. There are no efficient algorithms for edge-weighted personalized PageRank on general graphs. The difficulty is that in edge-weighted personalized PageRank, the PageRank vector x does not depend linearly on w, unlike in node-weighted personalized PageRank. Existing workarounds use heuristics or first-order approximations (TwitterRank), target special graphs (ScaleRank), or do the computation offline (learning-to-rank applications).
  15. Since EdgePPR has many applications, people have tried hard to work around this. Strictly speaking, the hybrid strategy does not generate the correct result; it generates something between NodePPR and EdgePPR. Most learning-to-rank applications compute the parameter vector offline, because they cannot afford to do it online.
  17. We didn't invent model reduction; it is used to quickly obtain solutions of large-scale parametric PDE systems.
  18. To locate the approximation in the reduced space, we need to determine its k coordinates, so we need k equations. Reconstructing the PageRank vector is then trivial: we compute the linear combination of basis vectors given by these coordinates.
  19. How do we construct this reduced space?
  20. We construct the low-dimensional space in a data-driven way. First, we compute a sample set of PageRank vectors. We currently do nothing special here: we just sample the parameters uniformly and compute the corresponding PageRank vectors. There are many possible extensions, such as more sophisticated sampling methods like adaptive sampling, but uniform sampling works well for the applications we tested.
  21. With the sampled PageRank vectors, we construct the reduced space via SVD. More specifically, we first assemble these vectors into a data matrix. After the SVD, the best k-dimensional space under the 2-norm is spanned by the first k left singular vectors.
  22. We denote the system matrix by M(w) and the right-hand side by b.
  23. We want a linear combination of the basis vectors that is close to the exact PageRank vector. For the exact PageRank vector, the residual vector M x − b is zero.
  24. We usually cannot make this residual vector zero, since there are only k unknowns. Nevertheless, we can still enforce the residual vector to be orthogonal to some test space W, that is, make its projection onto W zero.
  25. We also call this index set the "interpolation set". When the interpolation set has size k, DEIM fits into the Petrov-Galerkin framework with the test space Π. This matrix is all zeros except for a few ones, one for each selected equation.
  26. A few more notes on DEIM. As we said, it tries to satisfy a subset of the equations in the huge linear system. We can choose more than k equations for a more accurate result; in that case the linear system is over-determined, and we solve the least-squares problem instead.
  27. We have introduced two general choices of test space.
  28. Is one test space always better than the other, or is there an efficiency/accuracy tradeoff? And since DEIM satisfies a subset of equations, how should we choose that subset?
  29. These questions are specific to PageRank and the graphs we use.
  30. To answer these questions, we need to investigate how the transition matrix is formed.
  31. So far I haven't said how P(w) is determined; we just assumed some black box. Now it's time to introduce it. One popular choice computes each edge weight as the inner product of w and the edge's feature vector. Once we have the weight on each edge, we know which edges are important and which are not.
  32. For this very general way of determining the transition matrix, Bubnov-Galerkin is not efficient. Even DEIM is expensive, since we need the transition probability of every incoming edge of the selected nodes. To get the incoming transition probabilities of the light blue node, we need to compute the weights of both the purple edges and the orange edges. For some commonly used transition matrices, we can do better.
  33. The transition matrix is a linear combination of type-specific transition matrices.
  34. Thanks to this linearity, the reduced system can be expanded: each component can be precomputed and combined linearly later. The online combination is just a linear sum of a few 100×100 matrices.
  35. Here the transition matrix is no longer a linear combination, but the weighted adjacency matrix A is.
  36. Move forward
  37. Through error analysis, we found that the guideline for selecting the interpolation set is to make the selected rows of MU maximally linearly independent.
  38. We tried two heuristics: one favors running time and the other favors accuracy.
  39. As in a typical machine learning problem, it defines a loss function and tries to find the parameters that solve the optimization problem.
  40. We test on 1000 randomly selected nodes and plot the cumulative frequency of the Kendall distance. We compare our methods with BCA, the state-of-the-art algorithm for localized PageRank.
  41. I thank my advisor Johannes for all the helpful suggestions. You have taught me not only knowledge in CS, but also how to find good research problems and how to think as a researcher. I thank David Bindel for all the discussions on the projects we collaborated on; your suggestions from mathematics and scientific computing are invaluable. I thank Al Demers for the advice from a systems perspective, which helped me understand the experimental results. I thank Bobby Kleinberg for constructive suggestions and for inspiring lectures on algorithm design and analysis.