
3. Relationships Matter: Using Connected Data for Better Machine Learning

Dr. Alicia Frame, Director – Graph Data Science, Neo4j


  1. 1. Relationships Matter: Using Connected Data for Better Machine Learning Spring 2021 DR. ALICIA FRAME Director of Data Science, Neo4j @ODSC #Neo4j @AliciaFrame1
  2. 2. It’s Not What You Know
  3. 3. It’s Who You Know And Where They Are
  4. 4. Photo by Helena Lopes on Unsplash Network Structure is highly predictive of pay and promotions • People Near Structural Holes • Organizational Misfits “Organizational Misfits and the Origins of Brokerage in Intrafirm Networks” A. Kleinbaum “Structural Holes and Good Ideas” R. Burt
  5. 5. But You Can’t Analyse What You Can’t See • Most data science ignores relationships • Graphs are built using relationships • You don’t have to guess at correlations; with graphs the relationships are inherent. Relationships Are the Strongest Predictors of Behavior (James Fowler; David Burkus)
  6. 6. Top 10 Tech Trends in Data and Analytics, 16 Feb 2021. According to Gartner, “Graphs form the foundation of modern D&A, with capabilities to enhance and improve user collaboration, ML models and explainable AI. The recent Gartner AI in Organizations Survey demonstrates that graph techniques are increasingly prevalent as AI maturity grows, going from 13% adoption when AI maturity is lowest to 48% when maturity is highest.” AI Research Papers Featuring Graph (Source: Dimensions Knowledge System). Analytics & Data Science Interest Exploding in Neo4j Community: +168k downloads since 2019 • 4x increase in traffic to Neo4j GDS page in 2H-2020 • 3x more data scientists in Neo4j database in 2H-2020
  7. 7. 20 of the top 25 financial firms 7 of the top 10 retailers 7 of the top 10 software vendors Neo4j: The Graph Company Neo4j is the creator of: • The world’s leading graph database • The first graph data science platform • The most flexible graph data model • The easiest-to-use graph query language Thousands of Organizations Use Neo4j 7 Silicon Valley London Munich Paris Malmö
  8. 8. Connections in Data are as Valuable as the Data Itself. Networks of People (relationships: Knows). E.g., Employees, Customers, Suppliers, Partners, Influencers. Transaction Networks (relationships: Bought, Viewed, Returned). E.g., Risk management, Supply chain, Payments. Knowledge Networks (relationships: Plays, Lives_in, In_sport, Likes, Fan_of, Plays_for). E.g., Enterprise content, Domain specific content, eCommerce content
  9. 9. 9 What’s a graph? Node ● Represent an entity in the graph ● Can have labels Relationship ● Connect nodes to each other ● Has one type Property ● Describes a node/relationship: e.g. name, age, weight etc ● Key-value pair: String key; typed value (string, number, list, ...) Labeled Property Graph
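The labeled property graph model on this slide can be sketched as two small record types. This is only an illustration of the data model, not how Neo4j stores data; the class names, field names, and the example people are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Node:
    """An entity in the graph; it can carry zero or more labels."""
    node_id: int
    labels: Set[str] = field(default_factory=set)
    properties: Dict[str, Any] = field(default_factory=dict)  # string key, typed value


@dataclass
class Relationship:
    """A connection between two nodes; it has exactly one type."""
    start: int
    end: int
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical example data: two Person nodes joined by a KNOWS relationship.
alice = Node(1, {"Person"}, {"name": "Alice", "age": 34})
bob = Node(2, {"Person"}, {"name": "Bob"})
knows = Relationship(alice.node_id, bob.node_id, "KNOWS", {"since": 2019})
```

Properties live on both nodes and relationships, which is what separates this model from a plain edge list.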
  10. 10. What is Graph Data Science? Rather than just crunching numbers like traditional analytics, Graph Data Science analyzes data relationships and structures... ... to produce answers, insights, and predictions 10
  11. 11. Knowledge Graphs Graph Feature Engineering and Graph ML Graph Analytics, Investigations and Counterfactuals Integrations and Knowledge Graphs for Heuristic AI Capitalize Analysis Data Modeling 11 Graphs Enhance All Phases of Data Science & AI
  12. 12. Query (e.g. Cypher/Python) Real-time, local decisioning and pattern matching Graph Algorithms Global analysis and iterations You know what you’re looking for and making a decision You’re learning the overall structure of a network, updating data, and predicting Local Patterns Global Computation
  13. 13. Better Predictions with Graphs Using the Data You Already Have • Current data science models ignore network structure • Graphs add highly predictive features to ML models, increasing accuracy • Otherwise unattainable predictions based on relationships 13
  14. 14. Neo4j Graph Data Science Library Neo4j Database Neo4j Bloom Scalable Graph Algorithms & Analytics Workspace Native Graph Creation & Persistence Visual Graph Exploration & Prototyping Neo4j Graph Data Science Framework
  15. 15. From Simple Queries to Advanced ML. Queries (human-crafted query, human-readable result): MATCH (p1:Person)-[:ENEMY]->(:Person)<-[:ENEMY]-(p2:Person) MERGE (p1)-[:FRIEND]->(p2). Algorithms (predefined formula, human-readable result): PageRank(Emil) = 13.25, PageRank(Amy) = 4.83, PageRank(Alicia) = 4.75. Embeddings (AI-learned formula, machine-readable result): Node2Vec(Emil) = [5.4 5.1 2.4 4.5 3.1], Node2Vec(Amy) = [2.8 1.8 7.2 0.9 3.0], Node2Vec(Alicia) = [1.4 5.2 4.4 3.9 3.2]. Machine Learning Workflows: train ML models based on results
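To make the "predefined formula" idea concrete, here is a minimal PageRank sketch using plain power iteration. It is not the Neo4j GDS implementation. The toy edges between Emil, Amy, and Alicia are invented for illustration, and the scores are normalized to sum to 1, so they will not match the values on the slide.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of target nodes."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the same small "teleport" share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v] or nodes  # dangling nodes spread rank to everyone
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: who points at whom.
toy_graph = {
    "Emil": ["Amy"],
    "Amy": ["Emil"],
    "Alicia": ["Emil"],
}
scores = pagerank(toy_graph)
ranking = sorted(scores, key=scores.get, reverse=True)
```

The formula is fixed in advance and the result is one readable number per node, which is the contrast the slide draws with learned embeddings.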
  16. 16. The Neo4j Graph Data Science Library 50+ robust algorithms in one flexible analytics workspace with supervised ML workflows Pathfinding & Search • Deep path analytics • Optimal routing Centrality & Importance • Identifies importance of distinct nodes • Influencer & risk identification Community Detection • Detects group clustering • Partition options Similarity • Evaluates how alike graph nodes are Graph Embeddings • Learns from structural information • Reduces dimensionality for ML Link Prediction • Estimates likelihood of forming relationship • Estimate missing information 16
  17. 17. A graph embedding is a way of representing each node in your graph as a fixed-length vector. • Preserves key features • Reduces dimensionality • Can be decoded Different techniques may represent different aspects of a graph, and may use different approaches to learn that representation What Are Graph Embeddings? 17
  18. 18. Node2Vec FastRP Random walk through the graph to sample nodes and their properties • Easy to understand • Lots of examples • Interpretable parameters Just tell it how far to walk Project a similarity matrix into lower dimensional space with matrix math • Up to 75,000 x faster than Node2Vec • Equivalent accuracy when tuned • Flexible parameters for tuning Both produce output fixed-length embedding vectors Both must be rerun when new data is added 18
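The random-walk sampling step behind Node2Vec can be sketched as follows. This version walks uniformly at random, which corresponds to leaving Node2Vec's return and in-out biases neutral, and it stops before the embedding training step that turns walks into vectors. The function name, parameters, and toy graph are assumptions made for this sketch.

```python
import random


def random_walks(neighbors, walk_length, walks_per_node, seed=42):
    """Sample fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for start in neighbors:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                options = neighbors[walk[-1]]
                if not options:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(options))
            walks.append(walk)
    return walks


# Hypothetical undirected toy graph as adjacency lists.
toy_neighbors = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = random_walks(toy_neighbors, walk_length=5, walks_per_node=3)
```

Each walk is then treated like a sentence of node ids, so nodes that co-occur in walks end up with similar vectors. Because the walks depend on the current graph, they have to be re-sampled when new data arrives, as the slide notes.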
  19. 19. GraphSAGE • Assumes nodes in the same neighborhood have similar representations • Uses node properties in addition to relationships • Inductive approach that learns a function to calculate an embedding Aggregate Sample Predict 19
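The sample and aggregate steps can be sketched as a single layer with a mean aggregator. This is a simplified illustration, not the GDS or reference GraphSAGE implementation: the weight matrix is hand-written here, where a real model would learn it, and all names and values are assumptions.

```python
import random


def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sage_layer(neighbors, features, weights, k=2, seed=0):
    """One GraphSAGE-style layer: sample neighbors, aggregate, transform."""
    rng = random.Random(seed)
    dim = len(next(iter(features.values())))
    embeddings = {}
    for node, own in features.items():
        options = neighbors[node]
        # Sample: keep at most k neighbors per node.
        sampled = list(options) if len(options) <= k else rng.sample(options, k)
        # Aggregate: average the sampled neighbors' property vectors.
        aggregated = mean_vector([features[u] for u in sampled], dim)
        # Combine own and neighborhood features, then apply weights and ReLU.
        combined = own + aggregated
        embeddings[node] = [
            max(0.0, sum(w * x for w, x in zip(row, combined))) for row in weights
        ]
    return embeddings


# Hypothetical toy inputs: node properties are 2-dimensional vectors.
toy_neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
toy_features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 3.0]}
toy_weights = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]  # stands in for learned weights
embeddings = sage_layer(toy_neighbors, toy_features, toy_weights)
```

Because the layer is a function of node properties and neighborhoods rather than a lookup table, the same weights can be applied to nodes that were not present at training time, which is what makes the approach inductive.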
  20. 20. Graph Embeddings for Feature Engineering 20 Rather than running multiple algorithms to describe specific aspects of your graph topology, embeddings learn a unique representation of what’s important for your graph and your problem, letting you use graph structure as a predictor. Financial Transaction Data
  21. 21. • Neo4j automates data transformations • Fast iterations & layering • Production ready features, parallelization & enterprise support Our Secret Sauce: The Graph Catalog A graph-specific analytics workspace that’s mutable – integrated with a native-graph database Mutable In-Memory Workspace Computational Graph Native Graph Store
  22. 22. 22 Unsupervised ML • Unlabelled data • Data driven • Pattern identification Machine Learning Unsupervised Clustering Dimension Reduction (generalization) Association Data is not labeled at all Data is labeled or categorized Divide by similarity Identify sequences Find hidden dependencies Stack similar clothing Find clothes often worn together Make best outfit from given clothes Supervised Classification Regression Predict a number Predict a category Predict the length of a sock Predict if an outfit is fancy or casual Supervised ML • Labeled data • Task driven • Value predicting
  23. 23. 23 Unsupervised ML • Unlabelled data • Data driven • Pattern identification • Graph Algorithms Machine Learning Unsupervised Clustering Dimension Reduction (generalization) Association Data is not labeled at all Data is labeled or categorized Divide by similarity Identify sequences Find hidden dependencies Which parts of my graph are more connected? Which nodes are most similar? How important is each node? Supervised Classification Regression Predict a number Predict a category Predict the length of a sock Predict if an outfit is fancy or casual Supervised ML • Labeled data • Task driven • Value predicting Community Detection Centrality Embeddings Similarity Pathfinding
  24. 24. Supervised Machine Learning. Uses labeled data to learn a function to map input data onto outputs. That model can then make predictions on new data. How do you measure if it’s any good? Hold back some labeled data and measure accuracy. Diagram: Labeled Data ([cat], [dog]) → Model Training → Prediction on New Data (?) → Output (“It’s a Cat!”)
  25. 25. 25 Types of Supervised ML in Neo4j Node classification: “What kind of node is this?” Link prediction: “Should there be a relationship between these nodes?” Labeled data: Pairs of nodes that are either linked or not Features: Pre-existing attributes, algorithms (pageRank), embedding
  26. 26. 26 Load your in- memory graph with labels & features Use nodeClassification.train Specify the property you want to predict and the features for making that prediction Train a Node Classification Model in Neo4j Node classification: Predicting a node label or (categorical) property Neo4j Automates the Tricky Parts: 1. Splits data for train & test 2. Builds logistic regression models using the training data & specified parameters to predict the correct label 3. Evaluates the accuracy of the models using the test data 4. Returns the best performing model The predictive model appears in the model catalog, ready to apply to new data
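The four automated steps listed on this slide can be sketched in plain Python. This is an illustration of the workflow only, not the GDS procedure or its parameters: the tiny logistic regression, the candidate penalty values, and the synthetic two-feature data (imagine a centrality score and one embedding dimension) are all assumptions.

```python
import math
import random


def predict_proba(model, x):
    """Logistic function of the weighted feature sum, clamped to avoid overflow."""
    weights, bias = model
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def train_logreg(rows, labels, penalty, epochs=200, lr=0.1):
    """Fit logistic regression with stochastic gradient descent and an L2 penalty."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = predict_proba((weights, bias), x) - y
            weights = [w - lr * (err * xi + penalty * w) for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def accuracy(model, rows, labels):
    hits = sum(int(predict_proba(model, x) >= 0.5) == y for x, y in zip(rows, labels))
    return hits / len(rows)


def fit_best_model(rows, labels, penalties=(0.0, 0.01, 0.1), test_fraction=0.3, seed=7):
    # 1. Split the labeled nodes into train and test sets.
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    n_test = max(1, round(len(order) * test_fraction))
    test, train = order[:n_test], order[n_test:]
    best = None
    for penalty in penalties:
        # 2. Build one logistic regression model per candidate parameter.
        model = train_logreg([rows[i] for i in train], [labels[i] for i in train], penalty)
        # 3. Evaluate it on the held-out test data.
        score = accuracy(model, [rows[i] for i in test], [labels[i] for i in test])
        # 4. Keep the best-performing model.
        if best is None or score > best["accuracy"]:
            best = {"accuracy": score, "penalty": penalty, "model": model}
    return best


# Synthetic features per node; the label depends on the sign of the first feature.
rows = [[(i - 19.5) / 19.5, ((i * 7) % 10) / 10 - 0.45] for i in range(40)]
labels = [int(x[0] > 0) for x in rows]
best = fit_best_model(rows, labels)
```

The returned dictionary plays the role of a model-catalog entry: the winning model together with the parameter and score that selected it.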
  27. 27. 27 Load your in- memory graph with labels & features Use linkPrediction.train Split your graph into train & test splitRelationships.mutate Train a Link Prediction Model in Neo4j Link Prediction: Predicting unobserved edges or relationships that will form in the future Neo4j Automates the Tricky Parts: 1. Builds logistic regression models using the training data & specified parameters to predict the correct label 2. Evaluates the accuracy of the models using the test data 3. Returns the best performing model The predictive model appears in the model catalog, ready to apply to new data
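The extra step for link prediction, splitting relationships and building labeled node pairs, can be sketched like this. It is a hedged illustration, not the behavior of the GDS procedures named on the slide. The common-neighbors feature, the function names, and the toy edges are assumptions chosen for brevity.

```python
import random
from itertools import combinations


def split_relationships(edges, test_fraction, seed=1):
    """Hold out a fraction of relationships; return (train_edges, test_edges)."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * test_fraction)
    return shuffled[cut:], shuffled[:cut]


def adjacency(edges):
    """Undirected adjacency sets built from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def common_neighbors(adj, a, b):
    """A simple link feature: how many neighbors the two nodes share."""
    return len(adj.get(a, set()) & adj.get(b, set()))


def labeled_pairs(nodes, all_edges, held_out):
    """Positive examples are held-out edges; negatives are pairs with no edge at all."""
    existing = {frozenset(e) for e in all_edges}
    positives = [(a, b, 1) for a, b in held_out]
    negatives = [
        (a, b, 0)
        for a, b in combinations(sorted(nodes), 2)
        if frozenset((a, b)) not in existing
    ]
    return positives + negatives


# Hypothetical toy graph: two triangles joined by the edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
train_edges, test_edges = split_relationships(edges, test_fraction=0.3)
train_adj = adjacency(train_edges)
examples = [
    (common_neighbors(train_adj, a, b), label)
    for a, b, label in labeled_pairs("abcdef", edges, test_edges)
]
```

Features are computed on the training graph only, so the held-out relationships do not leak into the inputs of the model that is supposed to predict them.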
  28. 28. Machine Learning Models in Neo4j Train a model to make predictions on unseen parts of the graph, or for new data Not a data model — a predictive model Models live in the Neo4j analytics workspace in a model catalog • Contains versioning information What data was this model trained on? • Time stamps • Model names • As of GDS 1.5 models can be published, stored, and loaded from disk. ML Models in the Analytics Workspace 28
  29. 29. Graph-Native Feature Engineering Train Predictive Model Queries Algorithms Embeddings 1. Model Type 2. Property Selection 3. Train & Test 4. Model Selection Graph-Native ML Workflows inside Neo4j Apply Model to Existing / New Data Use Predictions for Decisions Use Predictions to Enhance the Graph Publish & Share Store Model in Database
  30. 30. Graphs Accelerate Innovation: From Simple to Highly Sophisticated Data Science. Analytics: improve reliability by predicting problems in a supply-chain knowledge graph. R&D: better health outcomes through machine learning on patient journeys. Data Science (FinServ customers): fraud detection with graph feature engineering + AutoML. Chart axes: Analysis Repeatability (Simple, Ad Hoc to Full Production) and Analysis Complexity (up to High)
  31. 31. Graph Analytics: Improving Reliability Medical device manufacturer with 10.74B annual revenue Manufacture products like pacemakers, stents and heart valves, all the way through diagnostic tests. Integrated development, design, manufacture, and sales. 31 Neo4j GDS for supply chain & issues prediction Simple data model: parts, finished product, and failures • Knowledge Graph to support robust queries • Centrality algorithms to rank nodes based on their proximity to failures, similarity to find vulnerable components • Creating new data from connections in Neo4j Challenge: Predicting and preventing failures • Integrated supply chain: from raw materials to complex devices • Inconsistent analysis, unable to pinpoint cause of failures
  32. 32. Graph R&D: Improving Patient Outcomes Global pharmaceutical with $22.1Billion revenue Focus on oncology, cardiovascular, renal, metabolism, & respiratory 32 Neo4j GDS to map & predict patient journeys • 3 yrs of visits, tests & diagnosis with 10’s of Bn of records • Knowledge Graph, graph queries, algorithms and traditional ML approaches • Extracted paths to train embeddings to predict successful interventions Challenge: Better intervention for complex diseases • Complex diseases develop over years with many touch points • How can we intervene faster & improve outcomes?
  33. 33. Production Graph ML: Build Better Models 33 Neo4j GDS for Feature Engineering + AutoML • Data science platforms help commoditize data science: build scalable, repeatable, and deployable data science tools • Embedding tuning is a major focus & challenge • Feature generation as input for autoML priceline Challenge: Adding predictive relationships to production ML • Every percentage point of model accuracy matters • Graphs are powerful in R&D & PoC models - but putting into production is challenging Several FinServ customers
  34. 34. Neo4j Graph Data Science Library Neo4j Database Neo4j Bloom Scalable Graph Algorithms & Analytics Workspace Native Graph Creation & Persistence Visual Graph Exploration & Prototyping 50+ Graph Algorithms Graph-Native ML Data Scientist Friendly Neo4j Graph Data Science Framework
  35. 35. Neo4j Graph Data Science. 50+ Graph Algorithms: more supported algorithms than any other vendor. Graph-Native ML: only commercial offering with full graph ML workflows. Humane Experience: automatic transformation from storage to analytics and visualization. Scalable Data Science: algorithms running over 10’s of billions of nodes in production. Extensible: integrate with other data sources and ML platforms. Strongest Community: 220K+ practitioners, 72K+ meetups
  36. 36. 36 Get Started: - Sandbox: https://neo4j.com/sandbox/ - Guides: neo4j.com/developer/graph-data-science/ - GitHub: github.com/neo4j/graph-data-science Books - O’Reilly Book on Graph Algorithms neo4j.com/graph-algorithms-book/ - Graph Data Science For Dummies: neo4j.com/graph-data-science-for-dummies
