Representation Learning on Graphs with
Complex Structures
Prof. Dr. Philippe Cudré-Mauroux
eXascale Infolab, U. of Fribourg–Switzerland
DL4G-SDE @ WWW2019
San Francisco, May 13, 2019
Representation Learning on Graphs
■ Projecting nodes of a graph onto a vector space while preserving key
structural properties of the graph (e.g., topological proximity of the nodes)
8/5/192 WWW2019@San Francisco
[Figure: DeepWalk [1]: a graph is passed through neural embedding techniques (e.g., word2vec) to produce one vector per node, e.g., (0.19, 0.32, 1.89, 1.21, 0.87).]
[1] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. "DeepWalk: Online learning of social representations." In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701-710. ACM, 2014.
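As a rough illustration of the sampling stage that DeepWalk-style methods rely on, the sketch below generates truncated uniform random walks; the resulting node sequences would then be fed to a word2vec-style model. The function name `random_walks`, its parameters, and the toy graph are assumptions made for this example, not the authors' code.

```python
import random

def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        starts = list(adj)
        rng.shuffle(starts)  # visit start nodes in random order on each pass
        for start in starts:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Hypothetical toy graph stored as an adjacency list.
toy_graph = {1: [2], 2: [1, 3], 3: [2, 4, 5], 4: [3], 5: [3]}
walks = random_walks(toy_graph, walk_length=5, walks_per_node=2)
```

Each walk plays the role of a "sentence" whose "words" are node identifiers.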
What if the graph at hand exhibits
a much more complex structure?
Outline
■ JUST: Embedding heterogeneous graphs without meta-paths
[CIKM’18]
■ LBSN2Vec: Embedding heterogeneous hypergraphs from LBSNs
[WWW’19]
■ NodeSketch: Highly-efficient graph embeddings via recursive
sketching [KDD’19]
Heterogeneous Graphs
■ Heterogeneous Graphs contain multiple node types:
● Homogeneous edges: linking nodes from the same domain
● Heterogeneous edges: linking nodes across different domains
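A minimal sketch of the distinction between the two edge kinds, assuming each node carries a single type label. The node names, the types, and the helper `edge_kind` are hypothetical.

```python
# Hypothetical toy graph: node names and types are made up for illustration.
node_type = {"alice": "author", "bob": "author", "paper1": "paper", "kdd": "venue"}
edges = [("alice", "bob"), ("alice", "paper1"), ("paper1", "kdd")]

def edge_kind(u, v):
    """An edge is homogeneous when both endpoints belong to the same domain."""
    return "homogeneous" if node_type[u] == node_type[v] else "heterogeneous"

kinds = {edge: edge_kind(*edge) for edge in edges}
```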
Meta-Paths in Heterogeneous Graphs
■ A meta-path is a sequence of node types encoding key composite relations among the
involved node types.
■ Meta-paths are used to guide random walks to redefine the neighborhood of a node.
[Figure: Metapath2vec [1]: a heterogeneous graph is passed through neural embedding techniques (e.g., word2vec) to produce one vector per node, e.g., (0.19, 0.32, 1.89, 1.21, 0.87).]
[1] Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. "metapath2vec: Scalable representation learning for heterogeneous networks." In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 135-144. ACM, 2017.
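To make the idea of a meta-path-guided walk concrete, here is a small sketch in which the next node must always have the type dictated by the meta-path. The function `metapath_walk`, the bibliographic toy graph, and the author-paper-author meta-path are assumptions chosen for illustration; this is not the metapath2vec reference implementation.

```python
import random

def metapath_walk(adj, node_type, start, metapath, walk_length, seed=0):
    """Random walk whose node types follow `metapath`, repeated cyclically.

    `metapath` must start and end with the same type (e.g. author-paper-author)
    so that it can be repeated.
    """
    assert metapath[0] == metapath[-1] == node_type[start]
    rng = random.Random(seed)
    cycle = metapath[:-1]
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]  # type required for the next node
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:  # no neighbor of the required type: stop early
            break
        walk.append(rng.choice(candidates))
    return walk

# Hypothetical toy graph with authors, papers and one venue.
node_type = {"a1": "author", "a2": "author", "p1": "paper", "p2": "paper", "v1": "venue"}
adj = {
    "a1": ["p1"], "a2": ["p1", "p2"],
    "p1": ["a1", "a2", "v1"], "p2": ["a2", "v1"],
    "v1": ["p1", "p2"],
}
walk = metapath_walk(adj, node_type, "a1", ["author", "paper", "author"], walk_length=7)
```

With this meta-path the venue node is never visited, which shows how strongly the choice of meta-path shapes the sampled neighborhoods.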
Challenges with Meta-Paths
■ The choice of meta-paths strongly affects the quality of the learnt node embeddings for a specific task.
■ How to select meta-paths?
● The choice is graph-specific and depends heavily on prior knowledge from domain experts.
● Strategies to combine a set of meta-paths can be complex and computationally expensive.
Are meta-paths necessary?
JUST: Embedding Heterogeneous Graphs without Meta-Paths
■ Random Walk with JUmp and STay strategies to probabilistically control the
random walk.
■ Two steps to balance the random walk:
● Step I: Jump or stay?
− Objective: balance the number of heterogeneous and homogeneous edges traversed during random walks (stay with probability α, which decays exponentially as the walk keeps staying in the same domain).
● Step II: If jump, where to jump?
− Objective: control the randomness in choosing a target domain (a memory window of recently visited domains favors diversity).
■ Learn node embeddings with SkipGram model.
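The two steps can be sketched as follows. This is a simplified reading of the slide, not the authors' implementation: the stay probability is assumed to be `alpha ** stay_run` (where `stay_run` counts the nodes visited consecutively in the current domain), and the memory window is assumed to be a fixed-size queue of recently visited domains. All names and the toy graph are hypothetical.

```python
import random
from collections import deque

def just_walk(adj, domain, start, walk_length, alpha=0.5, memory=2, seed=0):
    rng = random.Random(seed)
    walk = [start]
    recent = deque([domain[start]], maxlen=memory)  # recently visited domains
    stay_run = 1  # nodes visited consecutively in the current domain
    while len(walk) < walk_length:
        cur = walk[-1]
        same = [n for n in adj[cur] if domain[n] == domain[cur]]
        other = [n for n in adj[cur] if domain[n] != domain[cur]]
        if not same and not other:
            break
        # Step I: jump or stay? (forced when only one kind of edge exists)
        if not other:
            stay = True
        elif not same:
            stay = False
        else:
            stay = rng.random() < alpha ** stay_run
        if stay:
            nxt = rng.choice(same)
            stay_run += 1
        else:
            # Step II: prefer target domains outside the memory window.
            targets = sorted({domain[n] for n in other})
            fresh = [d for d in targets if d not in recent]
            target = rng.choice(fresh or targets)
            nxt = rng.choice([n for n in other if domain[n] == target])
            recent.append(target)
            stay_run = 1
        walk.append(nxt)
    return walk

# Hypothetical toy graph: two authors, two papers.
domain = {"a1": "author", "a2": "author", "p1": "paper", "p2": "paper"}
adj = {"a1": ["a2", "p1"], "a2": ["a1", "p2"], "p1": ["p2", "a1"], "p2": ["p1", "a2"]}
stay_walk = just_walk(adj, domain, "a1", walk_length=6, alpha=1.0)
jump_walk = just_walk(adj, domain, "a1", walk_length=6, alpha=0.0)
```

The two extreme settings show the role of α: with `alpha=1.0` the walk never leaves the author domain, while with `alpha=0.0` it switches domain at every step.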
Results
JUST achieves state-of-the-art performance without using meta-paths.
[Figure: node classification results.]
Runtime Performance
■ End-to-end node embedding learning time for all random-walk based
methods in seconds.
Method                    DBLP    Movie  Foursquare
DeepWalk                   236      333         484
Metapath2vec (original)    965   19,200       2,248
Metapath2vec (ours)        290      408         550
Hin2vec                    904    1,301       1,801
JUST                       310      442         616
• Compared to DeepWalk and Metapath2vec (our implementation), JUST adds a minor overhead in learning time, but achieves better results in classification and clustering tasks.
• Compared to Hin2vec, JUST learns about 3x faster and achieves better results in most experiments.
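The speedup and overhead figures can be checked directly from the reported learning times. The snippet below only restates the numbers from the table above; the variable and function names are made up for this example.

```python
# Learning times in seconds, in the order (DBLP, Movie, Foursquare).
learning_time = {
    "DeepWalk": (236, 333, 484),
    "Hin2vec": (904, 1301, 1801),
    "JUST": (310, 442, 616),
}

def ratio(slower, faster):
    """Per-dataset ratio of two methods' learning times, rounded to 2 decimals."""
    return [round(s / f, 2) for s, f in zip(learning_time[slower], learning_time[faster])]

hin2vec_vs_just = ratio("Hin2vec", "JUST")    # [2.92, 2.94, 2.92]
just_vs_deepwalk = ratio("JUST", "DeepWalk")  # [1.31, 1.33, 1.27]
```

So JUST is a little under 3x faster than Hin2vec on all three datasets, and roughly 1.3x slower than DeepWalk.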
Outline
■ JUST: Embedding heterogeneous graphs without meta-paths
[CIKM’18]
■ LBSN2Vec: Embedding heterogeneous hypergraphs from LBSNs
[WWW’19]
■ NodeSketch: Highly-efficient graph embeddings via recursive
sketching [KDD’19]
Social Relationships vs. Human Mobility
How to quantify the impact of social relationships and
mobility on each other?
Location Based Social Networks
■ A hypergraph with
● Four data domains
− Spatial: POI
− Temporal: time slot
− Semantic: activity category
− Social: user
● Two types of links
− Friendships
− Check-ins (hyperedges)
Hypergraph Embedding
[Figure: graph embedding: an LBSN hypergraph is passed through neural embedding techniques (e.g., SkipGram) to produce one vector per node, e.g., (0.19, 0.32, 1.89, 1.21, 0.87).]
1. How to sample from an LBSN hypergraph?
2. How to preserve n-wise proximity from hyperedges?
1. Sample from a Hypergraph: Random Walk with Stay
■ Balancing the impact of social relationships and mobility on the learnt embeddings
Sample and learn from:
• A check-in hyperedge with probability α
• A user-user pair with probability (1-α)
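The balancing rule can be sketched as a simple sampler. This is a deliberate simplification: it draws independent samples rather than following a random walk over the social graph, and the function name, the tuple layout of a check-in, and the toy data are all assumptions.

```python
import random

def sample_training_items(checkins, friend_pairs, alpha, n, seed=0):
    """Draw n training items: a check-in hyperedge w.p. alpha, else a user pair."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        if rng.random() < alpha:
            samples.append(("checkin", rng.choice(checkins)))
        else:
            samples.append(("friendship", rng.choice(friend_pairs)))
    return samples

# Hypothetical data: a check-in links a user, a time slot, a POI and a category.
checkins = [("u1", "mon_09h", "poi_cafe", "coffee"), ("u2", "sat_20h", "poi_bar", "nightlife")]
friend_pairs = [("u1", "u2")]
mostly_social = sample_training_items(checkins, friend_pairs, alpha=0.2, n=1000)
```

Lowering `alpha` shifts the training signal towards friendships, raising it shifts the signal towards mobility.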
2. Learn from Hyperedges: Learning via Best-Fit-Line
■ Maximizing the similarity between the nodes of a hyperedge and their
best-fit-line under cosine similarity.
1. Compute the best-fit-line
2. Maximize the cosine similarity between each node
and the best-fit-line
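A small numerical sketch of the two steps. For a set of vectors, the direction that maximizes the summed cosine similarity is the normalized sum of their unit vectors; using it as the best-fit-line here is an assumption about the method, and the helper names and example vectors are hypothetical. Vectors are assumed to be non-zero.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def best_fit_line(vectors):
    """Unit direction maximizing the summed cosine similarity to `vectors`."""
    units = [normalize(v) for v in vectors]
    return normalize([sum(column) for column in zip(*units)])

# Hypothetical embeddings of the nodes in one hyperedge.
hyperedge_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
line = best_fit_line(hyperedge_vectors)                          # step 1
objective = sum(cosine(v, line) for v in hyperedge_vectors)      # step 2 maximizes this
```

During training, each node vector would be nudged towards `line` so that `objective` increases; the update rule itself is not shown here.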
Task I: Friendship Prediction
■ Comparison with other graph embedding techniques
● (S) Social network only
● (S&M) Social and mobility through clique expansion
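Clique expansion turns each hyperedge into all pairwise edges among its nodes, so that classical graph embedding techniques can be applied. A minimal sketch, with hypothetical node names:

```python
from itertools import combinations

def clique_expansion(hyperedges):
    """Replace every hyperedge by the clique over its nodes (undirected edges)."""
    edges = set()
    for hyperedge in hyperedges:
        for u, v in combinations(sorted(set(hyperedge)), 2):
            edges.add((u, v))
    return edges

# Hypothetical check-ins: (user, time slot, POI, activity category).
checkins = [
    ("u1", "mon_09h", "poi_cafe", "coffee"),
    ("u2", "sat_20h", "poi_cafe", "coffee"),
]
pairwise_edges = clique_expansion(checkins)
```

A 4-node check-in becomes 6 pairwise edges, and the information that the four nodes co-occurred in one event is no longer explicit.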
[Figure: friendship prediction results, annotated "↑ 32.95% on precision@10" and "Clique expansion".]
Task II: Location Prediction
■ Comparison with other graph embedding techniques
● (M) Mobility (Check-in) network only
● (S&M) Social and mobility through clique expansion
[Figure: location prediction results, annotated "↑ 25.32% on accuracy@10".]
Balancing the Impact of Social Relationships and Mobility Matters!
Asymmetric impact of mobility and social relationships on predicting each other:
• Friendship prediction: 80% social and 20% mobility data
• Location prediction: 60% social and 40% mobility data
Outline
■ JUST: Embedding heterogeneous graphs without meta-paths
[CIKM’18]
■ LBSN2Vec: Embedding heterogeneous hypergraphs from LBSNs
[WWW’19]
■ NodeSketch: Highly-efficient graph embeddings via recursive
sketching [KDD’19]
Graph Embeddings
■ Graph-sampling based techniques
● Sample node pairs from a graph, and preserve node proximity from the node pairs
● Examples: DeepWalk, Node2Vec, LINE, SDNE and VERSE, etc.
● Efficiency bottleneck: A large number of node pairs -> significant computation resources (CPU time)
■ Factorization based techniques
● Factorize a (transformed, e.g., high-order) proximity/adjacency matrix of a graph
● Examples: GraRep, HOPE and NetMF, etc.
● Efficiency bottleneck: Large matrix factorization -> significant computation resources (both CPU time and
RAM)
■ Node proximity preserved using cosine similarity
● Efficiency bottleneck: cosine similarity is more expensive to compute than, for example, Hamming similarity.
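To illustrate the difference in cost, the sketch below contrasts the two measures: cosine similarity needs floating-point multiplications, additions and square roots, whereas Hamming similarity only counts matching positions. Function names and example vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def hamming_similarity(a, b):
    """Fraction of positions at which two equal-length sketches agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

real_valued = cosine_similarity([0.19, 0.32, 1.89], [0.67, 0.45, 1.76])
sketch_based = hamming_similarity([2, 3, 1], [2, 3, 4])
```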
Similarity-Preserving Hashing/Sketching
■ Efficient similarity approximation of high dimensional data
● Data-dependent hashing (learning-to-hash)
−Learning dataset-specific hashing functions
−Examples: spectral hashing, iterative quantization, etc.
−Efficient in similarity computation, but requires learning hashing functions
● Data-independent hashing/sketching (locality sensitive hashing)
−Hashing without involving any learning process from data
−Examples: minhash, consistent weighted sampling, etc.
−Efficient in both similarity approximation and hashing
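As an example of data-independent sketching, here is a minimal minhash sketch for sets, where the fraction of matching sketch positions estimates the Jaccard similarity. Using salted BLAKE2 digests as the family of hash functions is an implementation choice made for this example.

```python
import hashlib

def _hash(salt, item):
    """Deterministic 64-bit hash of `item` under the hash function number `salt`."""
    digest = hashlib.blake2b(f"{salt}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash_sketch(items, num_hashes):
    """For each hash function, keep the minimum hash value over the set."""
    return [min(_hash(j, x) for x in items) for j in range(num_hashes)]

def hamming_similarity(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

def jaccard(a, b):
    return len(a & b) / len(a | b)

set_a, set_b = set(range(0, 100)), set(range(50, 150))
estimate = hamming_similarity(minhash_sketch(set_a, 512), minhash_sketch(set_b, 512))
exact = jaccard(set_a, set_b)  # 50 shared items out of 150, i.e. 1/3
```

No hash function is learned from the data: the same `num_hashes` functions are applied to every input.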
Can we sketch nodes in a graph as embeddings?
Preliminary: Consistent Weighted Sampling [1]
■ Principled techniques for highly-efficient similarity approximation
● The min-max similarity between original data can be approximated by the Hamming similarity between sketches.
[Figure: a vector V = (1.32, 2.77, 1.11, 3.29, 1.31) of dimension D = 5 is mapped by random hash functions h_j, j = 1, ..., L, to a sketch S = (S_1, ..., S_j, ..., S_L).]
[1] Dingqi Yang, Bin Li, Laura Rettig, and Philippe Cudré-Mauroux. "D2HistoSketch: Discriminative and Dynamic Similarity-Preserving Sketching of Streaming Histograms." IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018.
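The sketch below shows the min-max similarity and one simple weighted sampling rule of this family: each sketch element is the index i minimizing -log(h_j(i)) / V_i, with the same random values h_j(i) shared by all vectors. The slide does not spell out the rule, so this choice is an assumption; the Hamming similarity between such sketches tracks the weighted similarity of the inputs but need not equal the min-max similarity exactly. Names and vectors are hypothetical, and indices are 0-based.

```python
import math
import random

def min_max_similarity(a, b):
    return sum(min(x, y) for x, y in zip(a, b)) / sum(max(x, y) for x, y in zip(a, b))

def make_hashes(L, D, seed=0):
    """L shared hash functions, each assigning a value in (0, 1] to every index."""
    rng = random.Random(seed)
    return [[1.0 - rng.random() for _ in range(D)] for _ in range(L)]

def sketch(V, hashes):
    """One sampled index per hash function; larger weights are sampled more often."""
    positive = [i for i, v in enumerate(V) if v > 0]  # assumes at least one weight > 0
    return [min(positive, key=lambda i: -math.log(row[i]) / V[i]) for row in hashes]

def hamming_similarity(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

hashes = make_hashes(L=64, D=3)
a, b = [1.0, 2.0, 0.0], [2.0, 1.0, 0.0]
exact = min_max_similarity(a, b)  # (1 + 1 + 0) / (2 + 2 + 0) = 0.5
approx = hamming_similarity(sketch(a, hashes), sketch(b, hashes))
```

Because the hash values are shared, identical vectors always receive identical sketches, which is what makes the sampling "consistent".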
Sketching the Adjacency Matrix?
■ Adjacency matrix vs. Self-Loop-Augmented (SLA) adjacency matrix
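A minimal sketch of building an SLA adjacency matrix: the usual adjacency matrix with every diagonal entry set to 1, so that each node also counts as its own neighbor. The 5-node edge list is an assumption chosen to resemble the small example graph used in the slides.

```python
def sla_adjacency(num_nodes, edges):
    """Adjacency matrix of an undirected graph with a self-loop added to every node."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:  # nodes are numbered from 1
        matrix[u - 1][v - 1] = 1
        matrix[v - 1][u - 1] = 1
    for i in range(num_nodes):
        matrix[i][i] = 1  # self-loop augmentation
    return matrix

# Assumed example graph: 1-2, 2-3, 3-4, 3-5.
sla = sla_adjacency(5, [(1, 2), (2, 3), (3, 4), (3, 5)])
```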
NodeSketch: Low-Order Node Embeddings
[Figure: example graph with five nodes, labelled 1 to 5.]
NodeSketch: High-Order Node Embeddings
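[Figure: recursive sketching on the example graph (nodes 1 to 5), illustrated for node r = 1 with weight α = 0.2 and sketch length L = 3.]
1. Start from the SLA adjacency vector of node r: $\tilde{V}^{r} = (1, 1, 0, 0, 0)$.
2. For each neighbor $n \in \Gamma(r)$ (here node 2), take its (k-1)-order sketch $S^{n}(k-1) = (2, 3, 1)$ and compute the sketch element distribution $\frac{1}{L}\sum_{j=1}^{L} \mathbb{1}[S^{n}_{j}(k-1) = i]$, $i = 1, \dots, D$, which gives (0.33, 0.33, 0.33, 0, 0).
3. Weight the distribution by α = 0.2 and merge it with $\tilde{V}^{r}$ to obtain the approximate k-order SLA adjacency vector $\tilde{V}^{r}(k) = (1.066, 1.066, 0.066, 0, 0)$.
4. Sketch $\tilde{V}^{r}(k)$ using Eq. 3 to obtain the k-order embedding of node r.

SLA adjacency matrix $\tilde{A}$ (one row per node):
1 1 0 0 0
1 1 1 0 0
0 1 1 1 1
0 0 1 1 0
0 0 1 0 1

(k-1)-order node embeddings S(k-1) (one row per node):
2 1 1
2 3 1
2 3 4
4 3 4
5 3 5

k-order embeddings S(k) (one row per node):
2 1 3
2 3 4
2 3 4
2 3 4
4 3 5

Uniformity of the generated samples: the foundation of our recursive sketching process.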
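The merge step of this slide can be reproduced numerically for node 1 (α = 0.2, neighbor node 2 with (k-1)-order sketch (2, 3, 1)). The function names are hypothetical, and the final sketching step (Eq. 3 in the slide) is not reproduced here.

```python
def sketch_distribution(sketch, D):
    """Fraction of sketch positions holding each element i = 1..D."""
    L = len(sketch)
    return [sum(1 for s in sketch if s == i) / L for i in range(1, D + 1)]

def merge(sla_vector, neighbor_sketches, alpha):
    """Add the alpha-weighted sketch distributions of all neighbors to the SLA vector."""
    merged = list(sla_vector)
    for neighbor_sketch in neighbor_sketches:
        distribution = sketch_distribution(neighbor_sketch, len(sla_vector))
        merged = [m + alpha * d for m, d in zip(merged, distribution)]
    return merged

sla_vector_node1 = [1, 1, 0, 0, 0]   # node 1 is linked to itself and to node 2
neighbor_sketches = [[2, 3, 1]]      # (k-1)-order sketch of node 2
approx_k_order = merge(sla_vector_node1, neighbor_sketches, alpha=0.2)
```

The result is approximately (1.0667, 1.0667, 0.0667, 0, 0), matching the values shown on the slide up to truncation. Node 3, which is two hops away from node 1, now has a non-zero weight and can therefore appear in the k-order sketch of node 1.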
Results: Node Classification Performance using Kernel SVM
[Figure: node classification performance for three groups of methods: classical graph embedding techniques (preserving cosine similarity), learning-to-hash techniques, and sketching techniques.]
NodeSketch shows comparable performance to the best-performing state-of-the-art techniques.
Results: Runtime Performance
NodeSketch is highly efficient and significantly faster than all baselines, showing a 9x-273x speedup.
Hamming similarity is also more efficient than cosine similarity (1.19x-1.68x speedup).
Take-Away Messages
■ JUST: Meta-path-free heterogeneous graph embedding can achieve state-of-the-art performance efficiently. [CIKM’18]
■ LBSN2Vec: Social relationships and mobility have an asymmetric impact on each other. [WWW’19]
■ NodeSketch: High-quality node embeddings can be generated via highly efficient sketching techniques. [KDD’19]
[CIKM’18] Rana Hussein, Dingqi Yang, and Philippe Cudré-Mauroux. "Are Meta-Paths Necessary? Revisiting Heterogeneous Graph Embeddings." CIKM’18.
[WWW’19] Dingqi Yang, Bingqing Qu, Jie Yang, and Philippe Cudré-Mauroux. "Revisiting User Mobility and Social Relationships in LBSNs: A Hypergraph Embedding Approach." WWW’19.
[KDD’19] Dingqi Yang, Paolo Rosso, Bin Li, and Philippe Cudré-Mauroux. "NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching." KDD’19.
Future Plan for Representation Learning on Graphs
■ Attributed graph structure (e.g., property graphs)
■ Heterogeneous data structures (e.g., structured knowledge graph + unstructured text)
■ Dynamic graphs (e.g., streaming LBSN graphs)
4/29/19 Dingqi's job talk @ University of Luxembourg33

More Related Content

What's hot

Border Gateway Protocol
Border Gateway ProtocolBorder Gateway Protocol
Border Gateway Protocol
Kashif Latif
 

What's hot (20)

IPv6 in the Telco Cloud and 5G
IPv6 in the Telco Cloud and 5GIPv6 in the Telco Cloud and 5G
IPv6 in the Telco Cloud and 5G
 
Tutorial on SDN and OpenFlow
Tutorial on SDN and OpenFlowTutorial on SDN and OpenFlow
Tutorial on SDN and OpenFlow
 
Deep dive into LangChain integration with Neo4j.pptx
Deep dive into LangChain integration with Neo4j.pptxDeep dive into LangChain integration with Neo4j.pptx
Deep dive into LangChain integration with Neo4j.pptx
 
Cisco Application Centric Infrastructure
Cisco Application Centric InfrastructureCisco Application Centric Infrastructure
Cisco Application Centric Infrastructure
 
Software Defined Network (SDN)
Software Defined Network (SDN)Software Defined Network (SDN)
Software Defined Network (SDN)
 
opendayight loadBalancer
opendayight loadBalancer opendayight loadBalancer
opendayight loadBalancer
 
5G Network: Requirements, Design Principles, Architectures, and Enabling Tech...
5G Network: Requirements, Design Principles, Architectures, and Enabling Tech...5G Network: Requirements, Design Principles, Architectures, and Enabling Tech...
5G Network: Requirements, Design Principles, Architectures, and Enabling Tech...
 
LPWAN technology overview
LPWAN technology overviewLPWAN technology overview
LPWAN technology overview
 
Best practices in solving PNT threats in critical defense communications infr...
Best practices in solving PNT threats in critical defense communications infr...Best practices in solving PNT threats in critical defense communications infr...
Best practices in solving PNT threats in critical defense communications infr...
 
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
 
Campas network design overview
Campas network design overviewCampas network design overview
Campas network design overview
 
Border Gateway Protocol
Border Gateway ProtocolBorder Gateway Protocol
Border Gateway Protocol
 
3GPP SON Series: Minimization of Drive Testing (MDT)
3GPP SON Series: Minimization of Drive Testing (MDT)3GPP SON Series: Minimization of Drive Testing (MDT)
3GPP SON Series: Minimization of Drive Testing (MDT)
 
6G Training Course Part 9: Course Summary & Conclusion
6G Training Course Part 9: Course Summary & Conclusion6G Training Course Part 9: Course Summary & Conclusion
6G Training Course Part 9: Course Summary & Conclusion
 
82159587 case-study-on-corba
82159587 case-study-on-corba82159587 case-study-on-corba
82159587 case-study-on-corba
 
Graph database
Graph database Graph database
Graph database
 
Iot rpl
Iot rplIot rpl
Iot rpl
 
Beginners: Open RAN Terminology – Virtualization, Disaggregation & Decomposition
Beginners: Open RAN Terminology – Virtualization, Disaggregation & DecompositionBeginners: Open RAN Terminology – Virtualization, Disaggregation & Decomposition
Beginners: Open RAN Terminology – Virtualization, Disaggregation & Decomposition
 
Service Function Chaining with SRv6
Service Function Chaining with SRv6Service Function Chaining with SRv6
Service Function Chaining with SRv6
 
Network embedding
Network embeddingNetwork embedding
Network embedding
 

Similar to Representation Learning on Complex Graphs

Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
miyurud
 
Visual Network Narrations
Visual Network NarrationsVisual Network Narrations
Visual Network Narrations
Janna Joceli Omena
 
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTESDyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
Subhajit Sahu
 

Similar to Representation Learning on Complex Graphs (20)

Cikm 2018
Cikm 2018Cikm 2018
Cikm 2018
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processing
 
PointNet
PointNetPointNet
PointNet
 
Euro30 2019 - Benchmarking tree approaches on street data
Euro30 2019 - Benchmarking tree approaches on street dataEuro30 2019 - Benchmarking tree approaches on street data
Euro30 2019 - Benchmarking tree approaches on street data
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
 
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...
 
20191107 deeplearningapproachesfornetworks
20191107 deeplearningapproachesfornetworks20191107 deeplearningapproachesfornetworks
20191107 deeplearningapproachesfornetworks
 
The Future is Big Graphs: A Community View on Graph Processing Systems
The Future is Big Graphs: A Community View on Graph Processing SystemsThe Future is Big Graphs: A Community View on Graph Processing Systems
The Future is Big Graphs: A Community View on Graph Processing Systems
 
Portfolio
PortfolioPortfolio
Portfolio
 
MapReduce Algorithm Design
MapReduce Algorithm DesignMapReduce Algorithm Design
MapReduce Algorithm Design
 
Deep learning for 3 d point clouds presentation
Deep learning for 3 d point clouds presentationDeep learning for 3 d point clouds presentation
Deep learning for 3 d point clouds presentation
 
Graph Neural Networks for Recommendations
Graph Neural Networks for RecommendationsGraph Neural Networks for Recommendations
Graph Neural Networks for Recommendations
 
Visual Network Narrations
Visual Network NarrationsVisual Network Narrations
Visual Network Narrations
 
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTESDyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
 
[20240408_LabSeminar_Huy]PivotalSTGNN.pptx
[20240408_LabSeminar_Huy]PivotalSTGNN.pptx[20240408_LabSeminar_Huy]PivotalSTGNN.pptx
[20240408_LabSeminar_Huy]PivotalSTGNN.pptx
 
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph KernelsDDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
 
On Integrating Information Visualization Techniques into Data Mining: A Revie...
On Integrating Information Visualization Techniques into Data Mining: A Revie...On Integrating Information Visualization Techniques into Data Mining: A Revie...
On Integrating Information Visualization Techniques into Data Mining: A Revie...
 
Laplacian-regularized Graph Bandits
Laplacian-regularized Graph BanditsLaplacian-regularized Graph Bandits
Laplacian-regularized Graph Bandits
 

More from eXascale Infolab

HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
eXascale Infolab
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
eXascale Infolab
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition ranking
eXascale Infolab
 

More from eXascale Infolab (20)

Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictionBeyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
 
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
 
A force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapA force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory map
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
 
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceansDependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
 
Crowd scheduling www2016
Crowd scheduling www2016Crowd scheduling www2016
Crowd scheduling www2016
 
SANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutionSANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference Resolution
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data Management
 
SSSW 2015 Sense Making
SSSW 2015 Sense MakingSSSW 2015 Sense Making
SSSW 2015 Sense Making
 
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataLDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web Data
 
The Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingThe Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task Crowdsourcing
 
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition ranking
 
OLTP-Bench
OLTP-BenchOLTP-Bench
OLTP-Bench
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big Data
 
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
 
Hasler2014
Hasler2014Hasler2014
Hasler2014
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Recently uploaded (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Representation Learning on Complex Graphs

  • 1. Representation Learning on Graphs with Complex Structures Prof. Dr. Philippe Cudré-Mauroux eXascale Infolab, U. of Fribourg–Switzerland DL4G-SDE @ WWW2019 San Francisco, May 13, 2019
  • 2. Representation Learning on Graphs ■ Projecting nodes of a graph onto a vector space while preserving key structural properties of the graph (e.g., topological proximity of the nodes) 8/5/192 WWW2019@San Francisco Neural embedding techniques (e.g.word2vec) … 0.19 0.32 1.89 1.21 0.87 0.67 0.45 1.76 1.42 0.98 1.32 0.77 1.11 1.29 1.31 1 Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. "Deepwalk: Online learning of social representations." In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 701-710. ACM, 2014. DeepWalk1
  • 3. 8/5/193 WWW2019@San Francisco What if the graph at hand exhibits a much more complex structure?
  • 4. Outlines ■ JUST: Embedding heterogeneous graphs without meta-paths [CIKM’18] ■ LBSN2Vec: Embedding heterogeneous hypergraphs from LBSNs [WWW’19] ■ NodeSketch: Highly-efficient graph embeddings via recursive sketching [KDD’19] 8/5/194 WWW2019@San Francisco
  • 5. Heterogeneous Graphs ■ Heterogeneous Graphs contain multiple node types: ● Homogeneous edges: linking nodes from the same domain ● Heterogeneous edges: linking nodes across different domains 8/5/195 WWW2019@San Francisco
  • 6. Meta-Paths in Heterogeneous Graphs ■ A meta-path is a sequence of node types encoding key composite relations among the involved node types. ■ Meta-paths are used to guide random walks to redefine the neighborhood of a node. 8/5/196 WWW2019@San Francisco 1 Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 135–144. Metapath2vec1 Neural embedding techniques (e.g.word2vec) … 0.19 0.32 1.89 1.21 0.87 0.67 0.45 1.76 1.42 0.98 1.32 0.77 1.11 1.29 1.31
  • 7. Challenges with Meta-Paths ■ The choice of meta-paths highly affects the quality of the learnt node embeddings for a specific task. ■ How to select meta-paths ? ● Graph specific and highly depends on prior knowledge from domain experts. ● Strategies to combine a set of meta-paths can be complex and computationally expensive. 8/5/197 WWW2019@San Francisco
  • 8. Are meta-paths necessary?
  • 9. JUST: Embedding Heterogeneous Graphs without Meta-Paths ■ Random walk with JUmp and STay strategies to probabilistically control the random walk. ■ Two steps balance the random walk: ● Step I: Jump or stay? Objective: balance the number of heterogeneous and homogeneous edges traversed during random walks (stay with probability α, with exponential decay). ● Step II: If jump, where to jump? Objective: control the randomness in choosing a target domain (a memory window favors diversity). ■ Learn node embeddings with the SkipGram model.
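The jump/stay decisions on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `just_walk`, the toy graph, and the concrete rules (stay probability `alpha ** stay_run`, a memory window over the last `memory` visited domains, uniform choice among candidates) are assumptions that fill in details the slide only summarizes.

```python
import random
from collections import deque


def just_walk(graph, node_type, start, length, alpha=0.5, memory=2, rng=None):
    """Generate one walk of at most `length` nodes using jump/stay decisions."""
    rng = rng or random.Random()
    walk = [start]
    # Memory window: the most recently visited domains (assumed design).
    recent = deque([node_type[start]], maxlen=memory)
    stay_run = 1  # number of consecutive nodes visited in the current domain
    current = start
    while len(walk) < length:
        here = node_type[current]
        homo = [n for n in graph[current] if node_type[n] == here]
        hetero = [n for n in graph[current] if node_type[n] != here]
        if not homo and not hetero:
            break  # isolated node: the walk cannot continue
        # Step I: jump or stay? The stay probability decays exponentially.
        if not hetero:
            p_stay = 1.0
        elif not homo:
            p_stay = 0.0
        else:
            p_stay = alpha ** stay_run
        if rng.random() < p_stay:
            current = rng.choice(homo)
            stay_run += 1
        else:
            # Step II: prefer target domains outside the memory window.
            reachable = sorted({node_type[n] for n in hetero})
            fresh = [t for t in reachable if t not in recent]
            target = rng.choice(fresh or reachable)
            current = rng.choice([n for n in hetero if node_type[n] == target])
            recent.append(target)
            stay_run = 1
        walk.append(current)
    return walk


# Hypothetical toy graph with three domains: authors, papers, and a venue.
graph = {
    "a1": ["a2", "p1"],
    "a2": ["a1", "p2"],
    "p1": ["a1", "p2", "v1"],
    "p2": ["a2", "p1", "v1"],
    "v1": ["p1", "p2"],
}
node_type = {"a1": "author", "a2": "author", "p1": "paper", "p2": "paper", "v1": "venue"}
walk = just_walk(graph, node_type, "a1", 10, alpha=0.5, memory=2, rng=random.Random(0))
```

The resulting walks would then be fed to a SkipGram model, exactly as walks are in DeepWalk; no meta-path is needed to produce them.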
  • 10. Results: Node Classification ■ JUST achieves state-of-the-art performance without using meta-paths. [Figure: node classification results]
  • 11. Runtime Performance ■ End-to-end node embedding learning time (in seconds) for all random-walk-based methods:
      Method                     DBLP    Movie   Foursquare
      DeepWalk                    236      333          484
      Metapath2vec (original)     965   19,200        2,248
      Metapath2vec (ours)         290      408          550
      Hin2vec                     904    1,301        1,801
      JUST                        310      442          616
    ● Compared to DeepWalk and Metapath2vec, JUST adds minor overhead in learning time, but achieves better results in classification and clustering tasks.
    ● Compared to Hin2vec, JUST learns roughly 3x faster (e.g., 310 s vs. 904 s on DBLP) and achieves better results in most experiments.
  • 12. Outline ■ JUST: Embedding heterogeneous graphs without meta-paths [CIKM’18] ■ LBSN2Vec: Embedding heterogeneous hypergraphs from LBSNs [WWW’19] ■ NodeSketch: Highly efficient graph embeddings via recursive sketching [KDD’19]
  • 13. Social Relationships vs. Human Mobility
  • 14. How to quantify the impact of social relationships and mobility on each other?
  • 15. Location-Based Social Networks ■ A hypergraph with ● Four data domains: spatial (POI), temporal (time slot), semantic (activity category), and social (user) ● Two types of links: friendships and check-ins (hyperedges)
  • 16. Hypergraph Embedding [Figure: LBSN hypergraph → graph embedding with neural embedding techniques (e.g., SkipGram) → node embedding vectors] 1. How to sample from an LBSN hypergraph? 2. How to preserve n-wise proximity from hyperedges?
  • 17. 1. Sample from a Hypergraph: Random Walk with Stay ■ Balancing the impact of social relationships and mobility on the learnt embeddings. Sample and learn from: ● a check-in hyperedge with probability α ● a user-user pair with probability (1-α)
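The α-weighted choice on this slide can be sketched as below. It only illustrates the coin flip between the two link types; the random walk that reaches a user in the first place is left out, and the function name and toy data are hypothetical.

```python
import random


def sample_training_item(checkins, friendships, alpha, rng):
    """Pick a check-in hyperedge with probability alpha, else a user-user pair."""
    if rng.random() < alpha:
        return "checkin", rng.choice(checkins)
    return "friendship", rng.choice(friendships)


# Hypothetical toy data: each check-in is a hyperedge (user, time slot, POI, category).
checkins = [("u1", "Mon-09h", "poi7", "Cafe"), ("u2", "Sat-21h", "poi3", "Bar")]
friendships = [("u1", "u2"), ("u2", "u3")]

rng = random.Random(42)
samples = [sample_training_item(checkins, friendships, 0.2, rng) for _ in range(1000)]
share_checkins = sum(kind == "checkin" for kind, _ in samples) / len(samples)
```

With α = 0.2, `share_checkins` should land near 0.2, so roughly one in five training samples is a check-in hyperedge and the rest are friendship pairs.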
  • 18. 2. Learn from Hyperedges: Learning via Best-Fit-Line ■ Maximizing the similarity between the nodes of a hyperedge and their best-fit-line under cosine similarity: 1. Compute the best-fit-line. 2. Maximize the cosine similarity between each node and the best-fit-line.
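A small sketch of step 1, assuming the best-fit-line is the unit direction that maximizes the summed cosine similarity to the node vectors of a hyperedge. Under that assumption it is the normalized sum of the unit-normalized vectors; the helper names are hypothetical, and the gradient update of step 2 is not shown.

```python
import math


def unit(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def best_fit_line(vectors):
    """Direction maximizing the sum of cosine similarities to `vectors`.

    Assumes the unit vectors do not cancel out to the zero vector.
    """
    units = [unit(v) for v in vectors]
    summed = [sum(column) for column in zip(*units)]
    return unit(summed)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return sum(a * b for a, b in zip(unit(u), unit(v)))


# Toy hyperedge with two node embeddings of different lengths.
hyperedge = [[2.0, 0.0], [0.0, 5.0]]
line = best_fit_line(hyperedge)
similarities = [cosine(v, line) for v in hyperedge]
```

Because each vector is normalized first, the long vector `[0.0, 5.0]` does not dominate the direction: both nodes end up equally similar to the line, and training would then move each embedding to increase its entry in `similarities`.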
  • 19. Task I: Friendship Prediction ■ Comparison with other graph embedding techniques ● (S) Social network only ● (S&M) Social and mobility through clique expansion ■ Result: ↑ 32.95% on precision@10
  • 20. Task II: Location Prediction ■ Comparison with other graph embedding techniques ● (M) Mobility (check-in) network only ● (S&M) Social and mobility through clique expansion ■ Result: ↑ 25.32% on accuracy@10
  • 21. Balancing the Impact of Social Relationships and Mobility Matters! ■ Asymmetric impact of mobility and social relationships on predicting each other: ● Friendship prediction: 80% social and 20% mobility data ● Location prediction: 60% social and 40% mobility data
  • 22. Outline ■ JUST: Embedding heterogeneous graphs without meta-paths [CIKM’18] ■ LBSN2Vec: Embedding heterogeneous hypergraphs from LBSNs [WWW’19] ■ NodeSketch: Highly efficient graph embeddings via recursive sketching [KDD’19]
  • 23. Graph Embeddings ■ Graph-sampling-based techniques ● Sample node pairs from a graph, and preserve node proximity from the node pairs ● Examples: DeepWalk, Node2Vec, LINE, SDNE, VERSE, etc. ● Efficiency bottleneck: a large number of node pairs → significant computation resources (CPU time) ■ Factorization-based techniques ● Factorize a (transformed, e.g., high-order) proximity/adjacency matrix of a graph ● Examples: GraRep, HOPE, NetMF, etc. ● Efficiency bottleneck: large matrix factorization → significant computation resources (both CPU time and RAM) ■ Both families preserve node proximity using cosine similarity ● Efficiency bottleneck: cosine similarity is less efficient to compute than, for example, Hamming similarity.
  • 24. Similarity-Preserving Hashing/Sketching ■ Efficient similarity approximation of high-dimensional data ● Data-dependent hashing (learning-to-hash): learns dataset-specific hashing functions; examples: spectral hashing, iterative quantization, etc.; efficient in similarity computation, but requires learning the hashing functions ● Data-independent hashing/sketching (locality-sensitive hashing): hashes without any learning process from data; examples: minhash, consistent weighted sampling, etc.; efficient in both similarity approximation and hashing
  • 25. Can we sketch nodes in a graph as embeddings?
  • 26. Preliminary: Consistent Weighted Sampling [1] ■ Principled techniques for highly efficient similarity approximation: the min-max similarity between the original data can be approximated by the Hamming similarity between sketches. ■ Example: a vector V = [1.32, 2.77, 1.11, 3.29, 1.31] (D = 5) is mapped to a sketch S = [S_1, …, S_j, …, S_L] using random hash functions h_j, j = 1, …, L. [1] Dingqi Yang, Bin Li, Laura Rettig, Philippe Cudré-Mauroux, "D2HistoSketch: Discriminative and Dynamic Similarity-Preserving Sketching of Streaming Histograms", IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018.
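The idea can be sketched in a few lines. The sampling rule used here, S_j = argmin_i (−log h_j(i)) / V_i with h_j(i) drawn uniformly from (0, 1], is an assumption: the slide does not spell out the rule, and a seeded pseudo-random generator stands in for the hash functions h_j. All function names are hypothetical.

```python
import math
import random


def min_max_similarity(a, b):
    """Min-max (weighted Jaccard) similarity of two non-negative vectors."""
    return sum(min(x, y) for x, y in zip(a, b)) / sum(max(x, y) for x, y in zip(a, b))


def cws_sketch(vector, length, seed=0):
    """Return `length` sampled indices (0-based) from a non-negative vector.

    Assumes at least one positive weight. Vectors of the same dimension that
    share `seed` see the same hash values, which keeps the samples consistent.
    """
    rng = random.Random(seed)
    sketch = []
    for _ in range(length):
        # One draw per dimension stands in for the random hash h_j(i) in (0, 1].
        hashes = [1.0 - rng.random() for _ in vector]
        sketch.append(min(
            (i for i, w in enumerate(vector) if w > 0),
            key=lambda i: -math.log(hashes[i]) / vector[i],
        ))
    return sketch


def hamming_similarity(s, t):
    """Fraction of positions at which two equal-length sketches agree."""
    return sum(x == y for x, y in zip(s, t)) / len(s)


v = [1.32, 2.77, 1.11, 3.29, 1.31]   # the vector shown on the slide
w = [1.00, 2.50, 0.00, 3.00, 2.00]   # a second, made-up vector
exact = min_max_similarity(v, w)
estimate = hamming_similarity(cws_sketch(v, 512), cws_sketch(w, 512))
```

Comparing two sketches needs only integer equality checks, which is the efficiency argument made for Hamming similarity over cosine similarity. `estimate` should be read as an approximation of `exact` whose quality depends on the sketch length L and on the sampling rule assumed above.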
  • 27. Sketching the Adjacency Matrix? ■ Adjacency matrix vs. Self-Loop-Augmented (SLA) adjacency matrix
  • 28. NodeSketch: Low-Order Node Embeddings
  • 29. NodeSketch: High-Order Node Embeddings [Figure: recursive sketching for node 1 on a 5-node example graph, with weight α = 0.2] ■ The SLA adjacency vector $\tilde{V}^r$ of node $r$ (a row of the SLA adjacency matrix $\tilde{A}$) is merged with the sketch element distribution of its neighbors $n \in \Gamma(r)$, $\frac{1}{L}\sum_{j=1}^{L} \mathbb{1}[S_j^n(k-1) = i]$, $i = 1, \dots, D$, computed from their $(k-1)$-order sketches $S^n(k-1)$. ■ The merge gives the approximate $k$-order SLA adjacency vector $\tilde{V}^r(k)$ (in the example: 1.066, 1.066, 0.066); sketching it using Eq. 3 yields the $k$-order embeddings $S(k)$ from the $(k-1)$-order node embeddings $S(k-1)$. ■ Uniformity of the generated samples: the foundation of our recursive sketching process.
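The merge step in the figure can be reproduced with a few lines. The reading assumed here is that each neighbor's sketch element distribution is scaled by α and added to the node's SLA adjacency vector; the function name is hypothetical, and the inputs are the values that appear on the slide (node 1 with SLA vector [1, 1, 0, 0, 0], and its neighbor node 2 with the (k-1)-order sketch [2, 3, 1]).

```python
def k_order_vector(sla_vector, neighbor_sketches, alpha):
    """Approximate k-order SLA adjacency vector of one node.

    `neighbor_sketches` holds the (k-1)-order sketches of the node's neighbors;
    sketch elements are 1-based node ids, as on the slide.
    """
    merged = list(sla_vector)
    for sketch in neighbor_sketches:
        for element in sketch:
            # Each occurrence contributes alpha / L to the sampled dimension.
            merged[element - 1] += alpha / len(sketch)
    return merged


sla_node_1 = [1.0, 1.0, 0.0, 0.0, 0.0]   # node 1: self-loop plus an edge to node 2
sketch_node_2 = [2, 3, 1]                # (k-1)-order sketch of node 2, L = 3
merged = k_order_vector(sla_node_1, [sketch_node_2], alpha=0.2)
```

Each of the three sketch elements adds 0.2 / 3 ≈ 0.0667 to its dimension, which matches the 1.066, 1.066, 0.066 entries shown in the figure. Sketching `merged` again would give the k-order embedding of node 1, and repeating the step pulls in information from nodes further away.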
  • 30. Results: Node Classification Performance using Kernel SVM ■ Baselines: classical graph embedding techniques (preserving cosine similarity), learning-to-hash techniques, and sketching techniques. ■ NodeSketch shows comparable performance to the best-performing state-of-the-art techniques.
  • 31. Results: Runtime Performance ■ NodeSketch is highly efficient and significantly outperforms all baselines, showing a 9x-273x speedup. ■ Hamming similarity also shows improved efficiency (1.19x-1.68x speedup) over cosine similarity.
  • 32. Take-Away Messages ■ JUST: Meta-path-free heterogeneous graph embedding can achieve state-of-the-art performance efficiently. [CIKM’18] ■ LBSN2Vec: Social relationships and mobility have an asymmetric impact on each other. [WWW’19] ■ NodeSketch: High-quality node embeddings can be generated via highly efficient sketching techniques. [KDD’19] [CIKM’18] Rana Hussein, Dingqi Yang, and Philippe Cudré-Mauroux, "Are Meta-Paths Necessary?: Revisiting Heterogeneous Graph Embeddings." CIKM’18. [WWW’19] Dingqi Yang, Bingqing Qu, Jie Yang, and Philippe Cudré-Mauroux, "Revisiting User Mobility and Social Relationships in LBSNs: A Hypergraph Embedding Approach." WWW’19. [KDD’19] Dingqi Yang, Paolo Rosso, Bin Li, and Philippe Cudré-Mauroux, "NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching." KDD’19.
  • 33. Future Plan for Representation Learning on Graphs ■ Attributed graph structure (e.g., property graphs) ■ Heterogeneous data structures (e.g., structured knowledge graph + unstructured text) ■ Dynamic graphs (e.g., streaming LBSN graphs) 4/29/19 Dingqi's job talk @ University of Luxembourg