GRAPH ANALYTICS AND
MACHINE LEARNING
STANLEY WANG
SOLUTION ARCHITECT, TECH LEAD
@SWANG68
http://www.linkedin.com/in/stanley-wang-a2b143b
Mathematics on Graph
• An abstract representation of a set of entities where some pairs are connected by links;
o Entity (Vertex, Node)
o Link (Edge, Relationship)
What is a Graph?
Constructing a Graph
Graph Affinity Matrix
Graph Laplacian Matrix
Update Function on Graph
Magic Properties of the Laplacian Matrix
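The affinity and Laplacian slides above carry only titles, so the following is a minimal sketch under stated assumptions rather than the author's construction. It assumes the unnormalized Laplacian L = D - W, where W is a symmetric affinity matrix and D is the diagonal degree matrix. The weights in W and the test vector x are made up for illustration. It checks two standard properties: every row of L sums to zero, and the quadratic form x^T L x equals half the sum of w_ij (x_i - x_j)^2 over all i, j, so it is never negative.

```python
# Symmetric affinity (weight) matrix W for a 4-node graph; the values are made up.
W = [
    [0.0, 1.0, 0.5, 0.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.5, 0.0, 0.0, 1.0],
    [0.0, 2.0, 1.0, 0.0],
]
n = len(W)

# Degree of node i is the sum of the affinities on its edges.
degree = [sum(row) for row in W]

# Unnormalized Laplacian (assumed form): L = D - W.
L = [[(degree[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def quadratic_form(matrix, x):
    """Return x^T * matrix * x."""
    size = len(x)
    return sum(x[i] * matrix[i][j] * x[j] for i in range(size) for j in range(size))


def edge_smoothness(weights, x):
    """Return 0.5 * sum over i, j of w_ij * (x_i - x_j)^2."""
    size = len(x)
    return 0.5 * sum(
        weights[i][j] * (x[i] - x[j]) ** 2 for i in range(size) for j in range(size)
    )


x = [1.0, -2.0, 0.5, 3.0]  # arbitrary values attached to the nodes
row_sums = [sum(row) for row in L]
print(row_sums)
print(quadratic_form(L, x), edge_smoothness(W, x))
```

The second property is why the Laplacian is useful for smoothing or label propagation: the quadratic form is small exactly when strongly connected nodes carry similar values.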
What is a Graph Database?
• A Database with an Explicit Graph
Structure;
• Each Node Knows its Adjacent Nodes;
• As the Number of Nodes Increases, the Cost of a Local Step Remains the Same, O(1);
• An Index for Lookups;
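The bullets above can be illustrated with a toy in-memory structure. This is a hypothetical sketch, not the storage layout of any real graph database: the `Node` class, the `index` dictionary, and the sample names are all assumptions. Each node stores direct references to its adjacent nodes, so a local step only touches that node's own neighbor list, and the index is used once to find the starting node.

```python
class Node:
    """Hypothetical node that keeps direct references to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # adjacent nodes are stored on the node itself

    def connect(self, other):
        # Undirected relationship: each endpoint records the other.
        self.neighbors.append(other)
        other.neighbors.append(self)


index = {}  # name -> Node, used only to locate a starting point


def get_or_create(name):
    if name not in index:
        index[name] = Node(name)
    return index[name]


for a, b in [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]:
    get_or_create(a).connect(get_or_create(b))

start = index["bob"]  # index lookup
hop = sorted(n.name for n in start.neighbors)  # one local step
print(hop)

# Adding unrelated nodes elsewhere does not change the work done by the hop.
get_or_create("erin").connect(get_or_create("frank"))
hop_after = sorted(n.name for n in start.neighbors)
```

The hop from `bob` reads only `bob`'s own neighbor list, which is why its cost depends on the node's degree and not on the total number of nodes in the graph.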
Relational Model vs Graph Model
• Relational Model: Optimized for Aggregation
• Graph Model: Optimized for Connections
RDBMS (SQL) vs NoSQL
[Figure: data size vs. data complexity. Stores shown: Key-Value Stores, Big Table / Column Family, Document Databases, Graph Databases, Relational Databases. Annotation: "90% of Use Cases".]
Performance Comparison
Why Graph Databases?
[Figure: axis "Value in Relationships" (Low to High). Data models shown: Key-Value (K V), BigTable (K V V V V), Document, Relational, Graph.]
NoSQL and Big Data
• Traditional databases handle big data sets too, but they focus on structured data;
• NoSQL databases have poor analytics support;
• HDFS and MapReduce often work from text files;
• NoSQL is geared toward high throughput: in CAP-theorem terms, basically AP instead of CP;
• In practice, Big Data is likely to be a mix of text files, NoSQL, and SQL RDBMS;
Graph Terminology
• Graph Computation (Analytics):
o The whole graph is processed, typically for several iterations → vertex-centric computation;
o Examples: Belief Propagation, PageRank, Community Detection, Triangle Counting, Matrix Factorization, Machine Learning…
• Graph Database (Queries):
o Selective graph queries (compare to SQL queries);
o Traversals: shortest-path, friends-of-friends, …
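The two traversal queries named above can be sketched on a small adjacency dictionary. This is a minimal illustration, assuming an unweighted, undirected graph. The sample names and the helper functions `friends_of_friends` and `shortest_path` are hypothetical and are not taken from any graph database API.

```python
from collections import deque

# Hypothetical undirected friendship graph as an adjacency dictionary.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}


def friends_of_friends(g, person):
    """People exactly two hops away: friends of friends who are not already friends."""
    direct = g[person]
    two_hops = set()
    for friend in direct:
        two_hops |= g[friend]
    return two_hops - direct - {person}


def shortest_path(g, source, target):
    """Breadth-first search; returns one shortest path as a list, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(g[node]):  # sorted only to make the result deterministic
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


print(friends_of_friends(graph, "ann"))
print(shortest_path(graph, "ann", "eve"))
```

Both queries start from one node and visit only part of the graph, which is the contrast with graph computation, where every vertex is processed on every iteration.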
GRAPH ANALYTICS
What Graph Can Model?
Graphs are Essential to ML
• Identify influential people and information;
• Discover communities;
• Understand people’s shared interests;
• Model complex real-life data dependencies;
It’s all about GRAPH: The Value of Data is Proportional
to the Number of Meaningful Relationships!
Complex Big Data Graph ML Algorithms
Graph Social Network Model
The model can be readily applied in real-life applications such as customer classification, profiling, segmentation, and product recommendations.
Identifying Key People
Social Network Tie Recommendation
Full Stack Graph ML Algorithms
Typical Graph Analytics
Graph Analytics - Page Rank
• PageRank (link analysis) measures the importance of nodes in a graph. A node's score is defined as the probability of landing on that node, which depends on:
o The probability of landing on one of the node’s neighbors;
o The probability of crossing the link from that neighbor to the node;
• Use case: identify influential leaders;
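The description above corresponds to a simple power iteration, sketched here under stated assumptions. The damping factor of 0.85, the iteration count, and the four-node sample graph are illustrative choices that do not come from the slides, and `pagerank` is a hypothetical helper rather than a library call. A node's new score sums, over each neighbor linking to it, the neighbor's current score times the probability of following that particular link.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict mapping node -> list of linked nodes."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Random-jump part: with probability (1 - damping) land anywhere.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                # Probability of being on v times probability of crossing each link.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its score evenly over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a, b and d all point to c; c points back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 4))
```

In this sample, `c` collects score from three nodes and ends up ranked first, with `a` second because it receives all of `c`'s outgoing probability.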
Graph Analytics - Triangle Count
• Clustering coefficient (CC) is a
measure of the degree to which
nodes in a graph tend to cluster
together;
• Calculating CC can be reduced to counting the number of triangles around one particular node in the graph;
• CC indicates the degree to which a
node’s neighbors are themselves
neighbors;
• CC of a graph is closely related to the
transitivity of a graph;
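The link between triangle counting and the clustering coefficient can be made concrete with a small sketch. It assumes an undirected graph stored as an adjacency dictionary; the sample graph and the helper names `triangles_at` and `local_clustering` are hypothetical. For a node with k neighbors, the local coefficient is the number of triangles through the node divided by the k(k-1)/2 neighbor pairs that could form one.

```python
from itertools import combinations

# Hypothetical undirected graph: a, b, c form a triangle; d hangs off a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}


def triangles_at(g, v):
    """Count triangles through v: pairs of v's neighbors that are themselves neighbors."""
    return sum(1 for u, w in combinations(sorted(g[v]), 2) if w in g[u])


def local_clustering(g, v):
    """Fraction of v's neighbor pairs that are connected to each other."""
    k = len(g[v])
    if k < 2:
        return 0.0  # no neighbor pairs, so no triangles are possible
    return 2.0 * triangles_at(g, v) / (k * (k - 1))


# Each triangle is seen once from each of its three corners.
total_triangles = sum(triangles_at(graph, v) for v in graph) // 3

for v in sorted(graph):
    print(v, triangles_at(graph, v), round(local_clustering(graph, v), 3))
print("triangles in graph:", total_triangles)
```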
Graph Analytics - Connected Components
• A connected component of an undirected graph is a subgraph in which any two vertices are connected to each other by a path, and which is connected to no additional vertices in the supergraph;
• A directed graph is strongly connected if every vertex is reachable from every other vertex. The strongly connected components of a directed graph form a partition into subgraphs that are themselves strongly connected;
• A spanning tree of a connected graph is a subgraph that is a tree and connects all of the graph's vertices;
• A minimum spanning tree (MST) is a spanning tree whose total edge weight is not greater than that of any other spanning tree;
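Two of the definitions above can be sketched in a few lines. This is an illustration under stated assumptions: it covers only the undirected case (strongly connected components are not shown), it uses Kruskal's algorithm with a simple union-find as one common way to build a minimum spanning tree, and the sample graphs and helper names are hypothetical.

```python
def connected_components(g):
    """Return the connected components of an undirected adjacency dict as sets."""
    seen, components = set(), []
    for start in g:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:  # depth-first search from the unvisited start node
            v = stack.pop()
            component.add(v)
            for u in g[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        components.append(component)
    return components


def minimum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm; weighted_edges holds (weight, node_a, node_b) tuples."""
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent pointers to the set representative, halving the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for weight, a, b in sorted(weighted_edges):  # cheapest edges first
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate trees, so keep it
            parent[root_a] = root_b
            tree.append((weight, a, b))
    return tree


undirected = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "x": {"y"}, "y": {"x"}}
components = connected_components(undirected)
print(components)

edges = [(1, "a", "b"), (2, "b", "c"), (3, "a", "c")]
mst = minimum_spanning_tree(["a", "b", "c"], edges)
print(mst, "total weight:", sum(w for w, _, _ in mst))
```

In the weighted triangle, the heaviest edge (a, c) is skipped because a and c are already joined through b by the time it is considered.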
Graph Analytics - Betweenness centrality
• Betweenness centrality is an
indicator of a node's centrality in
a network, which is equal to the
number of shortest paths from
all vertices to all others that pass
through that node;
• A node with high betweenness
centrality has a large influence
on the transfer of items through
the network;
• Betweenness centrality is related
to a network's connectivity;
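As a rough sketch of how betweenness centrality can be computed, the block below uses Brandes' algorithm for unweighted, undirected graphs. The choice of algorithm is an assumption, since the slides do not name one, and the five-node path graph and the `betweenness` helper are hypothetical. When several shortest paths connect a pair of nodes, each path contributes a fraction of that pair, which is the usual refinement of the plain path count.

```python
from collections import deque


def betweenness(g):
    """Brandes' algorithm for an unweighted, undirected adjacency dict."""
    score = {v: 0.0 for v in g}
    for s in g:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in g}
        sigma = {v: 0 for v in g}
        dist = {v: -1 for v in g}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in g[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in g}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in score.items()}


# Hypothetical path graph: a - b - c - d - e
path_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
centrality = betweenness(path_graph)
print(centrality)
```

The middle node `c` sits on the shortest paths of four pairs (a-d, a-e, b-d, b-e), so it scores highest, while the end nodes score zero.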
Graph Social Media Recommendation
Graph Computing Opportunity
Combined with leading tools such as graph databases, machine learning, high-performance computing, clustering, and streaming, graph computing technology is ready to take off in the Big Data era!
Distributed Graph Analytics System
How to Construct a Graph?
Graph ETL Data Flow
Graph ETL Example
Graph ETL Architecture
