Dynamics in Graph Analysis
Adding Time as Structure for Visual
and Statistical Insight
Benjamin Bengfort
@bbengfort
District Data Labs
Are graphs effective for analytics?
Or why use graphs at all?
Algorithm Performance
More understandable implementations
and native parallelism provide benefits
particularly to machine learning.
Visual Analytics
Humans can understand and interpret
interconnection structures, leading to
immediate insights.
“Graph technologies ease the modeling of
your domain and improve the simplicity
and speed of your queries.”
- Marko A. Rodriguez
http://bit.ly/2cthd2L
Construction
Given a set of [paths,
vertices] is a [constraint]
graph construction
possible?
Existence
Does there exist a [path,
vertex, set] within
[constraints]?
Optimization
Given several [paths,
subgraphs, vertices, sets] is
one the best?
Enumeration
How many [vertices, edges]
exist with [constraints], is it
possible to list them?
Traversals
Property Graphs
How do you model time?
Relational Database
Time Properties
Time Modifies Traversal
Example of Time Filtered Traversal: Data Model
Name: Emails Sent Network
Number of nodes: 6,174
Number of edges: 343,702
Average degree: 111.339
def sent_range(g, before=None, after=None):
    # Create filtering function based on date range.
    # Both bounds are optional; an edge must satisfy every bound given.
    def inner(edge):
        sent = g.ep.sent[edge]
        if before is not None and not sent < before:
            return False
        if after is not None and not sent > after:
            return False
        return True
    return inner

def degree_filter(degree=0):
    # Create filtering function based on min degree.
    def inner(vertex):
        return vertex.out_degree() > degree
    return inner
Example of Time Filtered Traversal
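import graph_tool.all as gt
# Assumption: dateparse is dateutil's parser; the slide does not show the import.
from dateutil.parser import parse as dateparse

print("{} vertices and {} edges".format(
    g.num_vertices(), g.num_edges()
))
# 6174 vertices and 343702 edges

# Keep only edges sent after the given date.
aug = sent_range(g,
    after=dateparse("Aug 1, 2016 09:00:00 EST")
)
view = gt.GraphView(g, efilt=aug)
# Then drop vertices left with no outgoing edges.
view = gt.GraphView(view, vfilt=degree_filter())
print("{} vertices and {} edges".format(
    view.num_vertices(), view.num_edges()
))
# 853 vertices and 24813 edges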
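The same two-step idea (filter edges by time, then filter vertices by degree) can be sketched without graph-tool. This is a minimal illustration on a hypothetical toy edge list, not the talk's email data; `EMAILS`, `sent_between`, and `active_senders` are names invented here. Unlike a `GraphView`, it only reports which senders remain and does not build a filtered graph object.

```python
from datetime import datetime

# Hypothetical toy data: (sender, recipient, sent timestamp) triples.
EMAILS = [
    ("ann", "bob", datetime(2016, 7, 30, 14, 0)),
    ("ann", "cat", datetime(2016, 8, 2, 9, 30)),
    ("bob", "cat", datetime(2016, 8, 3, 11, 15)),
    ("cat", "ann", datetime(2016, 8, 5, 16, 45)),
    ("dan", "ann", datetime(2016, 7, 1, 8, 0)),
]


def sent_between(edges, after=None, before=None):
    """Keep edges sent strictly after `after` and strictly before `before`."""
    return [
        (src, dst, sent)
        for src, dst, sent in edges
        if (after is None or sent > after)
        and (before is None or sent < before)
    ]


def active_senders(edges, min_degree=0):
    """Return the vertices whose out-degree in `edges` exceeds `min_degree`."""
    out_degree = {}
    for src, _dst, _sent in edges:
        out_degree[src] = out_degree.get(src, 0) + 1
    return {v for v, d in out_degree.items() if d > min_degree}


# Edge filter first, then vertex filter on the surviving edges.
august = sent_between(EMAILS, after=datetime(2016, 8, 1, 9, 0))
senders = active_senders(august)
print(len(august), "edges;", sorted(senders))
```

The order matters: degree is computed on the time-filtered edges, so a vertex that only sent mail before the cutoff drops out.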
Example of Time Filtered Traversal
What makes a graph dynamic?
Time Structures
Perform static analysis on dynamic
components with time as a structure.
Dynamic Graphs
Multiple subgraphs representing the
graph state at a discrete timestep.
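One way to picture the "multiple subgraphs" view is to bucket timestamped edges into one edge list per timestep. The sketch below assumes ISO weeks as the timestep and uses a hypothetical edge list; `EDGES` and `snapshots_by_week` are illustrative names, not part of the talk's code.

```python
from collections import defaultdict
from datetime import date

# Hypothetical timestamped edges: (source, target, day observed).
EDGES = [
    ("a", "b", date(2016, 8, 1)),
    ("b", "c", date(2016, 8, 3)),
    ("a", "c", date(2016, 8, 9)),
    ("c", "d", date(2016, 8, 17)),
]


def snapshots_by_week(edges):
    """Group edges into one edge list per (ISO year, ISO week) timestep."""
    buckets = defaultdict(list)
    for src, dst, day in edges:
        year, week, _weekday = day.isocalendar()
        buckets[(year, week)].append((src, dst))
    # Sort by (year, week) so the snapshots come back in time order.
    return sorted(buckets.items())


snapshots = snapshots_by_week(EDGES)
for (year, week), edge_list in snapshots:
    print(year, week, edge_list)
```

Each snapshot can then be analyzed as an ordinary static graph, which is what makes the discrete-timestep view convenient.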
Keyphrases over Time
Natural Language Graph Analysis: Data Ingestion
Natural Language Graph Analysis: Data Modeling
Name: Baleen Keyphrase Graph
Number of nodes: 2,682,624
Number of edges: 46,958,599
Average degree: 35.0095
Name: Sampled Keyphrase Graph
Number of nodes: 139,227
Number of edges: 257,316
Average degree: 3.6964
def degree_filter(degree=0):
    def inner(vertex):
        return vertex.out_degree() > degree
    return inner

g = gt.GraphView(g, vfilt=degree_filter(3))
Name: High Degree Phrase Graph
Number of nodes: 8,520
Number of edges: 112,320
Average degree: 26.366
Natural Language Graph Analysis: Data Wrangling
Basic Keyphrase Graph Information
Vertex Type Analysis
Primarily keyphrases and documents.
Degree Distribution
Power-law distribution of degree.
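A degree distribution is simply a count of how many vertices have each degree. The sketch below computes it for a small hypothetical undirected edge list (`EDGES` and `degree_distribution` are names made up for illustration). Whether a real distribution follows a power law is usually judged on a log-log plot or with a statistical fit, which this sketch does not attempt.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree value to the number of vertices with that degree."""
    degree = Counter()
    for u, v in edges:
        # Undirected edge: both endpoints gain one connection.
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


# Hypothetical graph: one hub plus several low-degree vertices.
EDGES = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]

dist = degree_distribution(EDGES)
for k in sorted(dist):
    print("degree", k, "->", dist[k], "vertices")
```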
Natural Language Graph Analysis: Data Wrangling
import random

def ego_filter(g, ego, hops=2):
    # Keep vertices within `hops` steps of the ego vertex.
    def inner(v):
        dist = gt.shortest_distance(g, ego, v)
        return dist <= hops
    return inner

# Get a random document
v = random.choice([
    v for v in g.vertices()
    if g.vp.type[v] == 'document'
])

ego = gt.GraphView(
    g, vfilt=ego_filter(g, v, 1)
)
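The ego network idea can also be sketched with a single breadth-first search that stops at the hop limit. This version uses a plain adjacency dictionary rather than graph-tool; `ADJ` and `ego_network` are hypothetical names and the document and phrase labels are invented.

```python
from collections import deque


def ego_network(adjacency, ego, hops=2):
    """Return the set of vertices within `hops` steps of `ego` using BFS."""
    distance = {ego: 0}
    queue = deque([ego])
    while queue:
        current = queue.popleft()
        if distance[current] == hops:
            continue  # Do not expand beyond the hop limit.
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return set(distance)


# Hypothetical document-phrase graph as an undirected adjacency mapping.
ADJ = {
    "doc1": ["phrase_a", "phrase_b"],
    "phrase_a": ["doc1", "doc2"],
    "phrase_b": ["doc1"],
    "doc2": ["phrase_a", "phrase_c"],
    "phrase_c": ["doc2"],
}

one_hop = ego_network(ADJ, "doc1", hops=1)
two_hop = ego_network(ADJ, "doc1", hops=2)
print(sorted(one_hop))
print(sorted(two_hop))
```

A single search from the ego visits each reachable vertex once, whereas a per-vertex distance query repeats work for every vertex tested; on large graphs that difference may be noticeable.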
The Centrality of Time
Extract Week of the Year as Time Structure
# Construct Time Structures to Keyphrase
h = gt.Graph(directed=False)
h.gp.name = h.new_graph_property('string')
h.gp.name = "Phrases by Week"

# Add vertex properties
h.vp.label = h.new_vertex_property('string')
h.vp.vtype = h.new_vertex_property('string')

# Assumption: the undefined `vidmap` on the slide mapped source vertices to
# vertices in h. Here two dicts ensure each week and phrase is added once.
weeks, phrases = {}, {}

# Create graph from the keyphrase graph
for vertex in g.vertices():
    if g.vp.type[vertex] == 'document':
        dt = g.vp.pubdate[vertex]
        weekno = dt.isocalendar()[1]
        if weekno not in weeks:
            week = h.add_vertex()
            h.vp.label[week] = "Week %d" % weekno
            h.vp.vtype[week] = 'week'
            weeks[weekno] = week
        week = weeks[weekno]

        for neighbor in vertex.out_neighbours():
            if g.vp.type[neighbor] == 'phrase':
                key = int(neighbor)
                if key not in phrases:
                    phrase = h.add_vertex()
                    h.vp.vtype[phrase] = 'phrase'
                    phrases[key] = phrase
                h.add_edge(week, phrases[key])
PageRank Centrality
A variant of Eigenvector
centrality that has a scaling
factor and prioritizes
incoming links.
Eigenvector Centrality
A measure of relative
influence where closeness
to important nodes matters
as much as other metrics.
Degree Centrality
A vertex is more important
the more connections it
has. E.g. “celebrity”.
Betweenness Centrality
How many shortest paths
pass through the given
vertex. E.g. how often does
information flow through it?
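To make the PageRank description concrete, here is a minimal power-iteration sketch. It assumes a directed graph given as a dictionary that lists every vertex as a key, a damping factor of 0.85 as the "scaling factor", and a fixed number of iterations; `GRAPH` and `pagerank` are hypothetical names and this is not the implementation used in the talk.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Power iteration over a directed graph {vertex: [out-neighbors]}.

    Every vertex must appear as a key, even if it has no out-links.
    """
    n = len(adjacency)
    rank = {v: 1.0 / n for v in adjacency}
    for _ in range(iterations):
        # Rank held by vertices with no out-links is spread evenly.
        dangling = sum(rank[v] for v, out in adjacency.items() if not out)
        base = (1.0 - damping) / n + damping * dangling / n
        next_rank = {v: base for v in adjacency}
        # Each vertex passes its rank in equal shares along its out-links.
        for v, out in adjacency.items():
            for w in out:
                next_rank[w] += damping * rank[v] / len(out)
        rank = next_rank
    return rank


# Hypothetical graph in which "c" receives the most incoming links.
GRAPH = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

ranks = pagerank(GRAPH)
for vertex in sorted(ranks, key=ranks.get, reverse=True):
    print(vertex, round(ranks[vertex], 3))
```

The vertex with the most incoming links ends up ranked highest, and the vertex it links to inherits much of that rank, which is the sense in which incoming links are prioritized.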
What are the central weeks and phrases?
Betweenness Centrality Katz Centrality
Keyphrase Dynamics
Create Sequences of Time Ordered Subgraphs
Animating Dynamics
Network Visualization
Layout: Edge and Vertex Positioning
Fruchterman-Reingold
SFDP (Yifan Hu) Force Directed
Radial Tree Layout by MST
ARF Spring Block
Visual Properties of Vertices
Lane Harrison, The Links that Bind Us: Network Visualizations
http://blog.visual.ly/network-visualizations
Visual Properties of Edges
Lane Harrison, The Links that Bind Us: Network Visualizations
http://blog.visual.ly/network-visualizations
Visual Analysis
The Visual Analytics Mantra
Overview First, Zoom and Filter, Details on Demand
Questions?

Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
Β 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
Β 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Β 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Β 
Call Girls in Sarai Kale Khan Delhi πŸ’― Call Us πŸ”9205541914 πŸ”( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi πŸ’― Call Us πŸ”9205541914 πŸ”( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi πŸ’― Call Us πŸ”9205541914 πŸ”( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi πŸ’― Call Us πŸ”9205541914 πŸ”( Delhi) Escorts S...
Β 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Β 

Dynamics in graph analysis (PyData Carolinas 2016)

  • 1. Dynamics in Graph Analysis: Adding Time as Structure for Visual and Statistical Insight. Benjamin Bengfort (@bbengfort), District Data Labs
  • 2. Are graphs effective for analytics? Or why use graphs at all?
  • 3. Algorithm Performance: More understandable implementations and native parallelism provide benefits, particularly to machine learning. Visual Analytics: Humans can understand and interpret interconnection structures, leading to immediate insights.
  • 4. “Graph technologies ease the modeling of your domain and improve the simplicity and speed of your queries.” (Marko A. Rodriguez, http://bit.ly/2cthd2L)
  • 5. Construction: Given a set of [paths, vertices] is a [constraint] graph construction possible? Existence: Does there exist a [path, vertex, set] within [constraints]? Optimization: Given several [paths, subgraphs, vertices, sets] is one the best? Enumeration: How many [vertices, edges] exist with [constraints], is it possible to list them?
  • 8. How do you model time?
  • 11.
  • 13. Example of Time Filtered Traversal: Data Model
        Name: Emails Sent Network
        Number of nodes: 6,174
        Number of edges: 343,702
        Average degree: 111.339
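The average degree on this slide is consistent with counting each edge once for both of its endpoints, i.e. 2E/N. A quick check, assuming that is how the summary was computed (the helper name `average_degree` is hypothetical and not from the slides):

```python
def average_degree(num_nodes, num_edges):
    """Mean degree when every edge contributes to the degree of two vertices."""
    return 2 * num_edges / num_nodes

# Figures taken from the "Emails Sent Network" summary on the slide.
emails_avg = average_degree(6174, 343702)
print(round(emails_avg, 3))
```

The same formula reproduces the averages reported for the keyphrase graphs later in the deck.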
  • 14. Example of Time Filtered Traversal

        def sent_range(g, before=None, after=None):
            # Create an edge filter based on a date range.
            # Both bounds are applied when both are given.
            def inner(edge):
                sent = g.ep.sent[edge]
                if before is not None and not sent < before:
                    return False
                if after is not None and not sent > after:
                    return False
                return True
            return inner

        def degree_filter(degree=0):
            # Create a vertex filter based on minimum out-degree.
            def inner(vertex):
                return vertex.out_degree() > degree
            return inner
  • 15. Example of Time Filtered Traversal

        # Assumed imports; the slide does not show them.
        import graph_tool.all as gt
        from dateutil.parser import parse as dateparse

        print("{} vertices and {} edges".format(
            g.num_vertices(), g.num_edges()
        ))
        # 6174 vertices and 343702 edges

        aug = sent_range(g,
            after=dateparse("Aug 1, 2016 09:00:00 EST")
        )
        view = gt.GraphView(g, efilt=aug)
        view = gt.GraphView(view, vfilt=degree_filter())

        print("{} vertices and {} edges".format(
            view.num_vertices(), view.num_edges()
        ))
        # 853 vertices and 24813 edges
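The two stacked views on this slide first hide edges outside the time window and then hide vertices that have no remaining outgoing edges. The following is a minimal pure-Python sketch of that composition on made-up data. It does not use graph-tool, and `filter_edges`, `filter_vertices`, and the sample names are hypothetical stand-ins for the `efilt` and `vfilt` steps:

```python
from datetime import datetime

def filter_edges(edges, keep):
    """Keep only the (source, target, sent) edges accepted by the predicate."""
    return [edge for edge in edges if keep(edge)]

def filter_vertices(vertices, edges, min_out_degree=0):
    """Drop vertices whose out-degree, counted on the given edges, is too low.

    Edges that lose an endpoint are dropped as well.
    """
    out_degree = {v: 0 for v in vertices}
    for source, _target, _sent in edges:
        out_degree[source] += 1
    kept = {v for v in vertices if out_degree[v] > min_out_degree}
    kept_edges = [e for e in edges if e[0] in kept and e[1] in kept]
    return kept, kept_edges

vertices = {"alice", "bob", "carol", "dave"}
edges = [
    ("alice", "bob", datetime(2016, 7, 15, 12, 0)),
    ("bob", "carol", datetime(2016, 8, 2, 9, 30)),
    ("carol", "bob", datetime(2016, 8, 5, 17, 45)),
    ("dave", "alice", datetime(2016, 6, 1, 8, 0)),
]

# Step 1: time filter on edges. Step 2: degree filter on what is left.
recent = filter_edges(edges, lambda e: e[2] > datetime(2016, 8, 1, 9, 0))
kept_vertices, kept_edges = filter_vertices(vertices, recent)
print(sorted(kept_vertices), len(kept_edges))
```

Because the degree filter is evaluated after the time filter, a vertex that only sent mail before the cutoff drops out, which is why the filtered view is so much smaller than the full graph.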
  • 16. What makes a graph dynamic?
  • 17. Time Structures: Perform static analysis on dynamic components with time as a structure. Dynamic Graphs: Multiple subgraphs representing the graph state at a discrete timestep.
  • 18.
  • 19.
  • 21. Natural Language Graph Analysis: Data Ingestion
  • 22. Natural Language Graph Analysis: Data Modeling
        Name: Baleen Keyphrase Graph
        Number of nodes: 2,682,624
        Number of edges: 46,958,599
        Average degree: 35.0095

        Name: Sampled Keyphrase Graph
        Number of nodes: 139,227
        Number of edges: 257,316
        Average degree: 3.6964
  • 23. Natural Language Graph Analysis: Data Wrangling

        def degree_filter(degree=0):
            def inner(vertex):
                return vertex.out_degree() > degree
            return inner

        g = gt.GraphView(g, vfilt=degree_filter(3))

        Name: High Degree Phrase Graph
        Number of nodes: 8,520
        Number of edges: 112,320
        Average degree: 26.366
  • 24. Basic Keyphrase Graph Information. Vertex Type Analysis: Primarily keyphrases and documents. Degree Distribution: Power-law distribution of degree.
  • 25. Natural Language Graph Analysis: Data Wrangling

        import random

        def ego_filter(g, ego, hops=2):
            # Keep vertices within `hops` of the ego vertex.
            def inner(v):
                dist = gt.shortest_distance(g, ego, v)
                return dist <= hops
            return inner

        # Get a random document
        v = random.choice([
            v for v in g.vertices()
            if g.vp.type[v] == 'document'
        ])

        ego = gt.GraphView(
            g, vfilt=ego_filter(g, v, 1)
        )
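The ego filter on this slide asks for a shortest distance once per vertex. An alternative, sketched here in plain Python rather than graph-tool, is a single breadth-first search from the ego that stops expanding at the hop limit. The adjacency dictionary and the name `ego_vertices` are hypothetical illustrations, not part of the talk:

```python
from collections import deque

def ego_vertices(adjacency, ego, hops=2):
    """Return the set of vertices within `hops` edges of `ego` (BFS)."""
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        vertex = queue.popleft()
        if dist[vertex] == hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[vertex] + 1
                queue.append(neighbor)
    return set(dist)

# Tiny undirected document/phrase graph as an adjacency mapping.
adjacency = {
    "doc1": ["phraseA", "phraseB"],
    "phraseA": ["doc1", "doc2"],
    "phraseB": ["doc1"],
    "doc2": ["phraseA", "phraseC"],
    "phraseC": ["doc2"],
}

one_hop = ego_vertices(adjacency, "doc1", hops=1)
two_hop = ego_vertices(adjacency, "doc1", hops=2)
print(sorted(one_hop), sorted(two_hop))
```

With one hop the ego network of a document contains only its own phrases; a second hop pulls in other documents that share those phrases.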
  • 26.
  • 28. Extract Week of the Year as Time Structure

        # Construct Time Structures to Keyphrase
        h = gt.Graph(directed=False)
        h.gp.name = h.new_graph_property('string')
        h.gp.name = "Phrases by Week"

        # Add vertex properties
        h.vp.label = h.new_vertex_property('string')
        h.vp.vtype = h.new_vertex_property('string')

        # Create graph from the keyphrase graph
        for vertex in g.vertices():
            if g.vp.type[vertex] == 'document':
                dt = g.vp.pubdate[vertex]
                weekno = dt.isocalendar()[1]
                week = h.add_vertex()
                h.vp.label[week] = "Week %d" % weekno
                h.vp.vtype[week] = 'week'

                for neighbor in vertex.out_neighbours():
                    if g.vp.type[neighbor] == 'phrase':
                        phrase = h.add_vertex()
                        h.vp.vtype[phrase] = 'phrase'
                        h.add_edge(week, phrase)
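If each week and each phrase should appear as a single vertex, so that phrases shared across documents connect weeks to one another, the vertices need to be deduplicated while the time structure is built. A minimal sketch of that idea in plain Python, assuming documents arrive as `(pubdate, phrases)` pairs; the function name `phrases_by_week` and the sample data are hypothetical:

```python
from datetime import date

def phrases_by_week(documents):
    """Map each ISO week label to the set of phrases published in that week.

    The dictionary keys play the role of 'week' vertices and the set
    members the role of 'phrase' vertices; each (week, phrase) pair is
    one edge of the bipartite time structure.
    """
    weeks = {}
    for pubdate, phrases in documents:
        label = "Week %d" % pubdate.isocalendar()[1]
        weeks.setdefault(label, set()).update(phrases)
    return weeks

documents = [
    (date(2016, 9, 12), ["graph analysis", "time"]),
    (date(2016, 9, 14), ["time", "centrality"]),
    (date(2016, 9, 19), ["centrality"]),
]

structure = phrases_by_week(documents)
for label in sorted(structure):
    print(label, sorted(structure[label]))
```

Here the two documents from the same week collapse into one week vertex, and "centrality" links that week to the following one.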
  • 29. PageRank Centrality: A variant of eigenvector centrality that has a scaling factor and prioritizes incoming links.
        Eigenvector Centrality: A measure of relative influence where closeness to important nodes matters as much as other metrics.
        Degree Centrality: A vertex is more important the more connections it has, e.g. a “celebrity”.
        Betweenness Centrality: How many shortest paths pass through the given vertex, e.g. how often does information flow through it?
  • 30. What are the central weeks and phrases? Betweenness Centrality and Katz Centrality.
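Two of the measures above are short enough to sketch directly. The code below is an illustrative pure-Python version of degree centrality and of PageRank by power iteration. It is not the graph-tool implementation, and the damping factor of 0.85, the iteration count, and the sample graphs are assumptions chosen for the example:

```python
def degree_centrality(adjacency):
    """Fraction of the other vertices that each vertex is connected to."""
    n = len(adjacency)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adjacency.items()}

def pagerank(out_links, damping=0.85, iterations=50):
    """PageRank by power iteration; every link target must be a key."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A vertex with no outgoing links spreads its rank evenly.
                share = damping * rank[v] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

undirected = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
degrees = degree_centrality(undirected)

directed = {"a": ["c"], "b": ["c"], "c": ["a"], "d": []}
ranks = pagerank(directed)
top = max(ranks, key=ranks.get)
print(degrees, top)
```

In the directed example, the vertex with the most incoming links ends up with the highest rank, which matches the description of PageRank as prioritizing incoming links.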
  • 32. Create Sequences of Time Ordered Subgraphs
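One way to build such a sequence is to bucket timestamped edges by a discrete timestep and sort the buckets. A minimal sketch, assuming edges are `(source, target, date)` tuples and that the ISO week is the timestep; the function name `weekly_snapshots` and the sample edges are hypothetical:

```python
from collections import defaultdict
from datetime import date

def weekly_snapshots(timed_edges):
    """Group edges by ISO (year, week) and return the groups in time order."""
    buckets = defaultdict(list)
    for source, target, when in timed_edges:
        iso = when.isocalendar()
        buckets[(iso[0], iso[1])].append((source, target))
    return sorted(buckets.items())

timed_edges = [
    ("carol", "alice", date(2016, 9, 19)),
    ("alice", "bob", date(2016, 9, 12)),
    ("bob", "carol", date(2016, 9, 14)),
]

snapshots = weekly_snapshots(timed_edges)
for (year, week), edges in snapshots:
    print(year, week, edges)
```

Each entry is the edge list of one subgraph, so any static analysis or layout can be run per timestep and compared across the sequence.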
  • 35. Layout: Edge and Vertex Positioning. Fruchterman Reingold; SFDP (Yifan-Hu) Force Directed; Radial Tree Layout by MST; ARF Spring Block.
  • 36. Visual Properties of Vertices. Lane Harrison, The Links that Bind Us: Network Visualizations, http://blog.visual.ly/network-visualizations
  • 37. Visual Properties of Edges. Lane Harrison, The Links that Bind Us: Network Visualizations, http://blog.visual.ly/network-visualizations
  • 38.
  • 40. The Visual Analytics Mantra: Overview First, Zoom and Filter, Details on Demand