O slideshow foi denunciado.
Utilizamos seu perfil e dados de atividades no LinkedIn para personalizar e exibir anúncios mais relevantes. Altere suas preferências de anúncios quando desejar.

Graph Algorithms for Developers

Speakers: Amy Hodler, Graph Analytics and AI Program Manager, Neo4j + Will Lyon, Developer Relations Engineer, Neo4j

  • Seja o primeiro a comentar

Graph Algorithms for Developers

  1. 1. Graph Algorithms For Developers Amy Hodler & William Lyon, @Neo4j Seattle Graph Tour July 2019 bit.ly/seattlealgos
  2. 2. Relationships The Strongest Predictors of Behavior! “Increasingly we're learning that you can make better predictions about people by getting all the information from their friends and their friends’ friends than you can from the information you have about the person themselves” James Fowler David Burkus Albert-László Barabási 3
  3. 3. William Lyon @lyonwj 4 Neo4j Labs Engineer Amy Hodler @amyhodler Analytics & AI Program Manager
  4. 4. Graph Platform ● Database management system (DBMS) ● Property Graph data model ● Cypher query language ● Graph analytics ● Data visualization ● Developer tool for building applications What is Neo4j? neo4j.com/
  5. 5. Labeled Property Graph
  6. 6. Query (e.g. Cypher/Python) Real-time, local decisioning and pattern matching Graph Algorithms Libraries Global analysis and iterations You know what you’re looking for and making a decision You’re learning the overall structure of a network, updating data, and predicting Local Patterns Global Computation
  7. 7. What Do People Do with Graph Algorithms?
  8. 8. Understand & Predict Complex Behavior Requires Understanding Relationships and Structures Flow & Dynamics Interactions & Resiliency Propagation Pathways
  9. 9. Using Graph Algorithms Explore, Plan, Measure Find significant patterns and plan for optimal structures Score outcomes and set a threshold value for a prediction Feature Engineering for Machine Learning The measures as features to train 1st Node 2nd Node Common Neighbors Preferential Attachment label 1 2 4 15 1 3 4 7 12 1 5 6 1 1 0
  10. 10. • Current data science models ignore network structure & complex relationships • Graphs add highly predictive features to existing ML models • Otherwise unattainable predictions based on relationships More Accurate Predictions with the Data You Already Have Machine Learning Pipeline
  11. 11. Financial Crimes Recommendations Cybersecurity Predictive Maintenance Customer Segmentation Churn Prediction Search & MDM Graph Data Science Applications EXAMPLES Drug Discovery 12
  12. 12. Why Not Use Other Types of Analytics?
  13. 13. Structure is Hard to Unfold
  14. 14. 15 NumberofNODES Number of RELATIONSHIPS per Node Average Distribution Most nodes have the same number of relationships “No Network in Nature that we know of that would be described by the Random network model.” –Albert-László Barabási Random Network
  15. 15. 16 NumberofNODES Number of RELATIONSHIPS per Node Power-Law Distribution Some nodes have a lot of relationships but MOST have very few Structured Network
  16. 16. 17 NumberofNODES Number of RELATIONSHIPS per Node Many approaches erroneously focus on the average population where few entities actually exist Graphs help us invest in populous areas Find strategic entities Uncover structural information
  17. 17. Graph Algorithms Extract Structure and Infer Behavior Source: “Communities, modules and large-scale structure in networks“ - Mark Newman Source: “Hierarchical structure and the prediction of missing links in networks”; ”Structure and inference in annotated networks” - A. Clauset, C. Moore, and M.E.J. Newman.
  18. 18. Neo4j Graph Algorithms
  19. 19. Graph Algorithm Categories in Neo4j neo4j.com/ graph-algorithms- book/ Book Signing 3:15 Pathfinding & Search Centrality / Importance Community Detection Link Prediction Finds optimal paths or evaluates route availability and quality Determines the importance of distinct nodes in the network Detects group clustering or partition options Evaluates how alike nodes are Estimates the likelihood of nodes forming a future relationship Similarity
  20. 20. Graph Algorithms in Neo4j • Parallel Breadth First Search & DFS • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • Minimum Spanning Tree • A* Shortest Path • Yen’s K Shortest Path • K-Spanning Tree (MST) • Random Walk • Degree Centrality • Closeness Centrality • CC Variations: Harmonic, Dangalchev, Wasserman & Faust • Betweenness Centrality • Approximate Betweenness Centrality • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality • Triangle Count • Clustering Coefficients • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity – 1 Step & Multi- Step • Balanced Triad (identification) • Euclidean Distance • Cosine Similarity • Jaccard Similarity • Overlap Similarity • Pearson Similarity Pathfinding & Search Centrality / Importance Community Detection Similarity neo4j.com/docs/ graph-algorithms/current/ Updated June 2019 Link Prediction • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocations • Same Community • Total Neighbors +35
  21. 21. Demo: Business Reviews Application
  22. 22. Yelp Public Dataset (as a graph) https://yelp.com/dataset
  23. 23. 24 How To Use Graph Algos To Enhance An Application? ● Category hierarchy - knowledge graph ○ Overlap similarity algorithm ● Showing relevant reviews ○ Personalized PageRank ● Personalized recommendations ○ Jaccard similarity ○ Label propagation
  24. 24. 25 Business Reviews Application Neo4j JavaScript Driver
  25. 25. 26 Cypher Query Language
  26. 26. 27 Cypher Query Language average(5,4,2) = 3.67
  27. 27. Neo4j Graph Algorithms Plugin One-click install in Neo4j Desktop … or grab plugin directly https://neo4j.com/download-center/
  28. 28. 1. Call as Cypher procedure 1. Pass in specification (Label, Prop, Query) and configuration 1. stream variant returns (a lot) of results CALL algo.<name>.stream('Label','TYPE',{conf}) YIELD nodeId, score 2. non-stream variant writes results to graph returns statistics CALL algo.<name>('Label','TYPE',{conf}) How To… Pathfinding & Search Centrality / Importance Community Detection Link Prediction Similarity
  29. 29. 30
  30. 30. 31 Find All Categories
  31. 31. Overlap Similarity Algorithm Ideal choice for finding hierarchy in data and developing super and sub- categories Overlap similarity coefficient represents the co-occurrence of items between groups A B A B
  32. 32. Jaccard Similarity Algorithm Often used to find recommendations of similar items as well as part of link prediction Jaccard similarity measures the similarity between sets A B A B
  33. 33. 35 Category Co-Occurence What categories show up together?
  34. 34. 36 Running Overlap Similarity In Neo4j 1) Find category co- occurences 1) Pass to overlap similarity procedure
  35. 35. 37 Find Category Hierarchy
  36. 36. 38 Find Top Level Categories Only return categories without an outgoiong NARROWER_THAN relationship
  37. 37. 39
  38. 38. 40
  39. 39. PageRank & Personalized PageRank Measures the transitive (directional) influence of nodes and considers the influence of neighbors and their neighbors Personalized PageRank CALL algo.pageRank('Page', 'LINKS', {iterations:20, dampingFactor:0.85, sourceNodes: [siteA]})
  40. 40. 44 Using Personalized PageRank To Surface Relevant Reviews ● Find influential reviewers in my network ● Find users who have reviewed the same businesses as me
  41. 41. 45 Inferred Relationships
  42. 42. 46 Projected User-User Graph Leverage inferred relationships. We can run graph algorithms on these projected graphs.
  43. 43. 47 Running Personalized PageRank
  44. 44. 48 Running Personalized PageRank
  45. 45. 49 Running Personalized PageRank
  46. 46. 50 Running Personalized PageRank
  47. 47. 51 Query Using TRUSTS Relationships Order reviews by PageRank score on TRUSTS relationship
  48. 48. 53 Personalized Recommendations Content based vs collaborative filtering ● Photo based recommendations: 1) Similar photos using Jaccard similarity 2) Cluster similar photos using Label Propagation 3) Recommend businesses connected to photos in the same community
  49. 49. Label Propagation Nodes adopt labels based on neighbors to infer clusters IterateIterate Community Detection
  50. 50. 55 Business Photos
  51. 51. 56 Photo Labels Labeled using Google’s Vision API https://cloud.google.com/vision/ https://github.com/moxious/vision-api
  52. 52. 57 Photo Labels
  53. 53. A B A BB Jaccard Similarity
  54. 54. 59 How Similar Are These Photos?
  55. 55. 60 Jaccard Similarity
  56. 56. Label Propagation IterateIterate Community Detection Adds partition property to group photos by community Group by community and count photos per community
  57. 57. Label Propagation Community Detection
  58. 58. Photo Based Recommendations What photos do you like?
  59. 59. 64
  60. 60. A Visual Evaluation of GoT
  61. 61. Enter the NEuler (Algorithms Playground)
  62. 62. install.graphapp.io neo4j.com/developer/ neo4j-desktop
  63. 63. 69 Game of Thrones (TV-Series) • Based on Andrew Beveridge's script to graph work • 400 Nodes (people) • 3,550 Relationships (interactions)
  64. 64. Triangles and Clustering Coefficient Communities Measures can be counted/normalized globally u Triangles = 2 CC= 0.2 Triangle Count determines the number of triangles passing through a node in the graph Clustering Coefficient is the probability that neighbors of a particular node are connected to each other u Triangles = 2 CC= 0.33
  65. 65. Triangles and Clustering Coefficient Communities Use When Basic network analysis, e.g. does the network exhibit small-world structures? Estimating stability Finding structural holes Scoring for machine learning Spam Classification Semi-streaming web page analysis (local triangle and CC) I don’t like spam!
  66. 66. NEuler Demo
  67. 67. A C D E B F G Strongly Connected Components All nodes can reach each other following direction, but not necessarily directly Find cycles, collapse tight communities, estimate similarity in-group Connected Components All nodes can each other when disregarding direction Find disconnected subgraphs or nodes in common, preprocess data Communities
  68. 68. NEuler Demo
  69. 69. Betweenness Centrality Influence Tip / Caution Computationally intensive: use RA Brandes approximation on large graphs. Assumes all communication between nodes happens along the shortest path and with the same frequency (not always the case in real life) The sum of the % shortest paths that pass through a node, calculated by pairs
  70. 70. Betweenness Centrality - Uses Influence Use When Identify bridges Uncover control points Find bottlenecks and vulnerabilities Network Resilience Key points of cascading failure
  71. 71. NEuler Demo
  72. 72. Resources
  73. 73. 80 neo4jsandbox.com
  74. 74. 81 Online Training! Applied Graph Algorithms Data Science With Neo4j https://neo4j.com/graphacademy/online-training/data-science https://neo4j.com/graphacademy/online-training/applied-graph-algorithms
  75. 75. Free Digital Book neo4j.com/ graph-algorithms-book • Spark & Neo4j Examples • Machine Learning Chapter
  76. 76. neo4j.com/online-summit/ 30+ Sessions. Online. October 10.
  77. 77. Questions? Amy Hodler amy.hodler@neo4j.com Will Lyon will@neo4j.com bit.ly/seattlealgos

×